Closed: saimunikoti closed this issue 7 months ago
🤖
Hello @saimunikoti 👋
I'm Dosu, a friendly bot here to assist you with your LlamaIndex issues while we wait for a human maintainer. I can help you troubleshoot bugs, answer your queries, and guide you on becoming a contributor.
Thank you for bringing this bug to our attention. I am currently reviewing your issue regarding the OpenCLIPEmbeddingFunction() function and will provide a comprehensive answer shortly. Sit tight!
@saimunikoti actually, with multimodal, clip sucks at text-to-text retrieval.
What is happening here is, images are embedded and retrieved with clip, and text is embedded and retrieved with whatever other embedding model (openai in this case)
@hatianzhang is our multimodal guy, he can maybe confirm this
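If it helps to see that split concretely, here is a rough sketch (not taken from the demo notebook) of how the two embedding paths can be configured in 0.9.x, assuming the MultiModalVectorStoreIndex API where image_embed_model defaults to CLIP and text embeddings come from the service_context; documents and storage_context are assumed to be set up as in the demo:

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex

# Text nodes and text queries are embedded with the service_context's embed_model...
service_context = ServiceContext.from_defaults(
    llm=None,  # disables the default OpenAI LLM (falls back to a mock LLM), so no API key is needed here
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"),
)

# ...while image nodes and image retrieval go through image_embed_model (CLIP by default).
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    image_embed_model="clip",  # the default, shown explicitly
)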
🤖
Hey there, it sounds like you're discussing some interesting topics related to multimodal retrieval and the performance of different models for text-to-text retrieval. If you have any specific questions or need assistance with anything related to this, feel free to ask! I'm here to help.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Thanks for responding. But even when we try to use different embedding models for text (i.e., BAAI/bge-large-en) and image (OpenCLIPEmbedding), it doesn't seem to work. It's tightly coupled with the OpenAI API functions. Is there any workaround to skip the OpenAI functions and use open-source models like BGE and OpenCLIPEmbedding?
@saimunikoti Yea, just define the service context with the embedding model you want right?
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
service_context = ServiceContext.from_defaults(llm=llm, embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"))
index = VectorStoreIndex.from_documents(..., service_context=service_context)
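Once the index is built that way, query embeddings come from the same embed_model, so retrieval also skips OpenAI. A minimal sketch of the query side (assuming the index defined above):

# The query string is embedded with the service_context's embed_model (BGE here),
# not with the default OpenAI embedding.
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("text related to the stored images")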
Hi, @saimunikoti,
I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue "OpenCLIPEmbeddingFunction() not working without openai api key" was reported by you. It seems that the problem arises from the default service_context being used when defining the index, conflicting with the intended use of OpenCLIPEmbeddingFunction(). Initial responses were provided by dosubot and logan-markewich, with logan-markewich suggesting a workaround by defining the service context with the desired embedding model. Further discussion is ongoing to address the issue.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.
Bug Description
This is about the demo at https://docs.llamaindex.ai/en/stable/examples/multi_modal/ChromaMultiModalDemo.html. OpenCLIPEmbeddingFunction() is an open-source model from open_clip which doesn't require an OpenAI API key, and it is used to create the vector database:

chroma_collection = chroma_client.create_collection("multimodal_collection", embedding_function=OpenCLIPEmbeddingFunction(), data_loader=image_loader)

But the default service_context is used when the index is defined:

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

which means OpenAI models are invoked for query embedding. Ideally the embedding model used to create the vector database and the one used for retrieval (i.e., for query embedding) should be the same, i.e. OpenCLIPEmbeddingFunction(). The service_context in the index should take OpenCLIPEmbeddingFunction() as embed_model, which is currently not possible.
Version
0.9.27
Steps to Reproduce
Just follow the steps on the LlamaIndex tutorial page: https://docs.llamaindex.ai/en/stable/examples/multi_modal/ChromaMultiModalDemo.html. A condensed version of the relevant steps is sketched below.
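This is a condensed sketch of the two steps from the notebook that produce the mismatch described above (chroma_client and documents are set up roughly as in the demo, e.g. with chromadb.EphemeralClient() and SimpleDirectoryReader; exact names may differ):

import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import ChromaVectorStore

# Step 1: the Chroma collection uses the open-source OpenCLIP embedding
# function, which needs no OpenAI API key.
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection(
    "multimodal_collection",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

# Step 2: the index is built with the default service_context, so query
# embeddings are produced by OpenAI models rather than OpenCLIP.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)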
Relevant Logs/Tracebacks
No response