jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai

Quick question regarding approach #762

Closed sjcotto closed 2 years ago

sjcotto commented 2 years ago

Hi, I guess this is a common case: if we have structured data like a product name, a category, and the image URL, would the recommendation be to generate one vector per field? Like:

- 1 vector for the product name
- 1 vector for the category
- 1 vector for the image URL

and store all 3?

Or is it better to do some kind of concatenation?

I was looking into the example https://github.com/jina-ai/example-multimodal-fashion-search but it's not clear to me how this is handled there.

Thanks in advance.

ZiniuYu commented 2 years ago

Hello @sjcotto, glad to see you here! You can use the capabilities of DocArray to store this kind of information. For example, you can store the product name and category in different tags of a Document and the image URL in the uri of that Document (see the DocArray docs). You can also build a more complex structure by putting things in the chunks or matches (see the DocArray docs). Just do some experiments and feel the power of DocArray!
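
For illustration, here is a minimal sketch of that layout, assuming DocArray v1's `Document`/`DocumentArray`; the field names and values are hypothetical, not a fixed schema:

    from docarray import Document, DocumentArray

    # hypothetical product record; the field names are only for illustration
    product = {
        'product_name': 'red running shoes',
        'category': 'footwear',
        'image_url': 'https://example.com/shoes.jpg',
    }

    # structured fields go into `tags`, the image URL into `uri`
    doc = Document(
        uri=product['image_url'],
        tags={'product_name': product['product_name'],
              'category': product['category']},
    )

    da = DocumentArray([doc])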

sjcotto commented 2 years ago

thanks a lot!

Just to confirm: doing that, will it index the text (the JSON fields in tags) plus the image into the vector, and will the search improve as a result?

ZiniuYu commented 2 years ago

Hi @sjcotto ! There are different ways to generate vector embeddings:

  1. You can manually assign the vector embedding yourself.
  2. You can use CLIP-as-service to generate the vector embedding with CLIP models (a minimal sketch follows below).
  3. You can use different encoders from Jina Hub.

You may then follow the example you found to create a search application. 🚀 Remember to take a look at our latest product Jina Now to build a neural search application in just one command 👀👀👀
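
As a rough sketch of options 1 and 2 above: the server address and the random placeholder vector below are assumptions for illustration only, and `clip_client.Client.encode` is used as in the snippets later in this thread:

    import numpy as np
    from clip_client import Client
    from docarray import Document, DocumentArray

    da = DocumentArray([Document(uri='https://example.com/shoes.jpg')])

    # option 1: assign a precomputed vector yourself (placeholder values here)
    da[0].embedding = np.random.rand(512)

    # option 2: let a CLIP-as-service server compute it (address is a placeholder)
    c = Client('grpc://0.0.0.0:51000')
    da = c.encode(da)
    print(da[0].embedding.shape)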

sjcotto commented 2 years ago

Thanks.

I was trying to use clip-as-service on its own, just to understand it.

For some reason the image-to-image search works perfectly, but text-to-image doesn't; I need to do more research for sure.

The save to Milvus:

    # clip_client talks to the CLIP-as-service server, docarray holds the inputs,
    # pymilvus writes the embedding into the Milvus collection
    from clip_client import Client
    from docarray import Document, DocumentArray
    from pymilvus import Collection

    c = Client(_CLIP_AS_SERVICE)

    # one Document per product: the image goes into `uri`, structured fields into `tags`
    d = Document(uri=item.uri, tags=item.data)
    da = DocumentArray()
    da.append(d)

    # encode with CLIP; `embeddings` is an (n, dim) numpy array
    r = await c.aencode(da)
    vector = r.embeddings

    # Milvus expects column-wise lists: one column per field in the collection schema
    data = [
        [item.id],
        [vector.tolist()[0]],
    ]

    collection = Collection(item.index)
    collection.load()
    collection.insert(data)

The search:

    c = Client(_CLIP_AS_SERVICE)

    # the query can be either text or an image URI; CLIP embeds both into the same space
    d = Document()
    if item_input.text is not None:
        d.text = item_input.text
    if item_input.uri is not None:
        d.uri = item_input.uri

    da = DocumentArray()
    da.append(d)

    # encode the query with CLIP
    r = await c.aencode(da)
    vector = r.embeddings
    print(vector)

    collection = Collection(item_input.index)
    collection.load()

    # nearest-neighbour search with L2 distance
    search_params = {"metric_type": "L2"}
    results = collection.search(vector, anns_field=_VECTOR_FIELD_NAME, param=search_params, limit=5)
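
For reference, a hedged sketch of reading the hits back, assuming pymilvus' standard `SearchResult` object:

    # `results` is iterable: one entry per query vector
    for hits in results:
        for hit in hits:
            # primary key and distance of each neighbour
            print(hit.id, hit.distance)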

Tests:
![image](https://user-images.githubusercontent.com/3880470/177336179-deb993f5-41dd-42c8-883b-6daa79990c84.png)

I did many tests putting in text that literally describes what is in the image, and the results look random. I'm using the same dataset as the Jina fashion example.

![image](https://user-images.githubusercontent.com/3880470/177336319-45145aed-ba75-438e-a11a-f38f44f0c497.png)

ZiniuYu commented 2 years ago

Hi @sjcotto, could you provide more details, such as error messages or the unexpected behavior?

sjcotto commented 2 years ago

Hi, thanks for the quick reply.

I'm not receiving an error, just random search results. I'm trying the same queries as in the Jina fashion example, with no luck.

Do you know if there is something odd in the code above?

sjcotto commented 2 years ago

Maybe the problem is related to ndarray vs list? I noticed that in order to store the vector in Milvus I need to transform it to a list (`[vector.tolist()[0]]`).

numb3r3 commented 2 years ago

@sjcotto maybe the metric chosen here, `search_params = {"metric_type": "L2"}`, does not fit the CLIP model. Usually we use cosine as the distance metric, because CLIP is trained with a cosine-similarity objective.
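
A hedged sketch of that suggestion: if the embeddings are L2-normalized, Milvus' inner-product metric (`IP`) ranks results the same way cosine similarity does. Variable and field names below match the snippets earlier in this thread; the collection's index would also need to be built with `IP`, and the same normalization applied to the vectors at insert time:

    import numpy as np

    # normalize so that inner product == cosine similarity
    vector = r.embeddings
    vector = vector / np.linalg.norm(vector, axis=1, keepdims=True)

    # note: vectors inserted into the collection must be normalized the same way,
    # and the collection's index must use metric_type="IP"
    search_params = {"metric_type": "IP"}
    results = collection.search(
        vector.tolist(), anns_field=_VECTOR_FIELD_NAME,
        param=search_params, limit=5,
    )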

numb3r3 commented 2 years ago

We will close this issue for now. If you have some findings, you are welcome to share them with the community. Thanks!