Closed: saimunikoti closed this issue 7 months ago
🤖
Hello @saimunikoti 👋
I'm Dosu, a friendly bot here to assist you with your LlamaIndex issues while we wait for a human maintainer. I can help you troubleshoot bugs, answer your queries, and guide you on becoming a contributor.
Thank you for bringing this bug to our attention. I am currently reviewing your issue regarding the OpenCLIPEmbeddingFunction() function and will provide a comprehensive answer shortly. Sit tight!
@saimunikoti actually, with multimodal, clip sucks at text-to-text retrieval.
What is happening here is, images are embedded and retrieved with clip, and text is embedded and retrieved with whatever other embedding model (openai in this case)
@hatianzhang is our multimodal guy, he can maybe confirm this
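If it helps to see that split concretely, here is a rough sketch (not taken from the demo notebook) of how the two embedding paths can be configured in 0.9.x, assuming the MultiModalVectorStoreIndex API where image_embed_model defaults to CLIP and text embeddings come from the service_context; documents and storage_context are assumed to be set up as in the demo:

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex

# Text nodes and text queries are embedded with the service_context's embed_model...
service_context = ServiceContext.from_defaults(
    llm=None,  # disables the default OpenAI LLM (falls back to a mock LLM), so no API key is needed here
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"),
)

# ...while image nodes and image retrieval go through image_embed_model (CLIP by default).
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    image_embed_model="clip",  # the default, shown explicitly
)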
🤖
Hey there, it sounds like you're discussing some interesting topics related to multimodal retrieval and the performance of different models for text-to-text retrieval. If you have any specific questions or need assistance with anything related to this, feel free to ask! I'm here to help.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Thanks for responding. But even when we try to use different embedding models for text (i.e., BAAI/bge-large-en) and image (OpenCLIPEmbedding), it doesn't seem to work. It's tightly coupled with the OpenAI API functions. Is there any workaround to skip the OpenAI functions and use open-source models like BGE and OpenCLIPEmbedding?
@saimunikoti Yea, just define the service context with the embedding model you want right?
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
service_context = ServiceContext.from_defaults(llm=llm, embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"))
index = VectorStoreIndex.from_documents(..., service_context=service_context)
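Once the index is built that way, query embeddings come from the same embed_model, so retrieval also skips OpenAI. A minimal sketch of the query side (assuming the index defined above):

# The query string is embedded with the service_context's embed_model (BGE here),
# not with the default OpenAI embedding.
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("text related to the stored images")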
Hi, @saimunikoti,
I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue "OpenCLIPEmbeddingFunction() not working without openai api key" was reported by you. It seems that the problem arises from the default service_context being used when defining the index, conflicting with the intended use of OpenCLIPEmbeddingFunction(). Initial responses were provided by dosubot and logan-markewich, with logan-markewich suggesting a workaround by defining the service context with the desired embedding model. Further discussion is ongoing to address the issue.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.
Bug Description
This is about the demo at https://docs.llamaindex.ai/en/stable/examples/multi_modal/ChromaMultiModalDemo.html. OpenCLIPEmbeddingFunction() is an open-source model from open_clip which doesn't require an OpenAI API key, and it is used to create the vector database:

chroma_collection = chroma_client.create_collection("multimodal_collection", embedding_function=OpenCLIPEmbeddingFunction(), data_loader=image_loader)

But the default service_context is used when the index is defined:

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

which means OpenAI models are invoked for query embedding. Ideally the embedding model used to create the vector database and the one used for retrieval (i.e., for query embedding) should be the same, i.e. OpenCLIPEmbeddingFunction(). The service_context in the index should take OpenCLIPEmbeddingFunction() as embed_model, which is currently not possible.
Version
0.9.27
Steps to Reproduce
Just follow the steps on the LlamaIndex tutorial page: https://docs.llamaindex.ai/en/stable/examples/multi_modal/ChromaMultiModalDemo.html. A condensed version of the relevant steps is sketched below.
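This is a condensed sketch of the two steps from the notebook that produce the mismatch described above (chroma_client and documents are set up roughly as in the demo, e.g. with chromadb.EphemeralClient() and SimpleDirectoryReader; exact names may differ):

import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import ChromaVectorStore

# Step 1: the Chroma collection uses the open-source OpenCLIP embedding
# function, which needs no OpenAI API key.
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection(
    "multimodal_collection",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

# Step 2: the index is built with the default service_context, so query
# embeddings are produced by OpenAI models rather than OpenCLIP.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)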
Relevant Logs/Tracebacks
No response