run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Can the llama-index Milvus vector store use Milvus's built-in embedding models instead of OpenAI for embedding? #13421

Open mballav opened 1 month ago

mballav commented 1 month ago

Question

We are considering using Milvus on the cloud (zilliz.com) as a managed solution for storing and querying our vectors. Right now, if we use OpenAI to vectorize our documents, it takes a long time when there are more than, say, 10 documents.

Also, we are using llama-index (rather than LangChain) for our RAG application.

All the examples I found that showcase the Milvus vector store use OpenAI for embedding. Is there a way to use Milvus's built-in local models for embedding, which would likely speed up document uploads?

Any pointers or alternative solutions would be greatly appreciated.

dosubot[bot] commented 1 month ago

Hey there, @mballav! 🎉 I'm Dosu, your friendly neighborhood bot, here to help you tackle bugs, answer questions, and guide you on your journey to becoming a super contributor while we wait for a human maintainer to chime in. Excited to dive into your issue about using Milvus' built-in embedding models with LlamaIndex. Give me a sec to gather some info and I'll get back to you with what I find!

logan-markewich commented 1 month ago

You can't use Milvus's built-in embeddings, but you can
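One standard route, shown here as a minimal sketch rather than as logan's exact suggestion, is to swap the default OpenAI embedding for a local HuggingFace model while keeping Milvus as the vector store. This assumes the `llama-index-embeddings-huggingface` and `llama-index-vector-stores-milvus` packages are installed; the model name, endpoint, and dimension below are placeholders:

```python
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore

# Embed locally instead of calling the OpenAI API.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Point at a Zilliz Cloud / Milvus instance; dim must match the
# embedding model's output size (384 for bge-small-en-v1.5).
vector_store = MilvusVectorStore(
    uri="https://<your-zilliz-endpoint>",  # placeholder endpoint
    token="<your-zilliz-api-key>",         # placeholder credential
    dim=384,
    overwrite=True,
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
)
```

With this setup the embeddings are computed on your own hardware, so ingestion speed is bounded by local compute rather than API rate limits.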

mballav commented 1 month ago

@logan-markewich, thanks a lot for the prompt reply.

Looking at the options you mentioned, option #3 is appealing, though even the Zilliz managed index requires an OpenAI key. I am assuming that Zilliz, under the hood, vectorizes documents using OpenAI's text-ada model. If we go that route, do you think their pipeline would be any faster than if we vectorized with OpenAI ourselves?

dosubot[bot] commented 1 month ago

The Zilliz managed index pipeline, as integrated with LlamaIndex, does require an OpenAI key, which indicates it relies on OpenAI's services for generating embeddings. Whether it uses OpenAI's text-ada model specifically is not confirmed, but the managed nature of the service suggests optimizations (managed service integration, pipeline optimization, automatic management features) that could make document vectorization faster than calling OpenAI directly.

To accurately assess whether Zilliz's pipeline offers a speed advantage over direct OpenAI vectorization, benchmark it against your specific use case. The LlamaIndex repository provides tools that could serve as a basis for such tests.
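As one way to run such a comparison (not from the original thread), here is a minimal sketch that times batch embedding with OpenAI against a local BGE model; the corpus size, model names, and batch size are placeholder assumptions, and the OpenAI path needs `OPENAI_API_KEY` set in the environment:

```python
import time

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding

# Placeholder corpus standing in for real document chunks.
texts = ["Milvus is a vector database built for scalable similarity search."] * 200

for embed_model in (
    OpenAIEmbedding(model="text-embedding-ada-002", embed_batch_size=100),
    HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
):
    start = time.perf_counter()
    embed_model.get_text_embedding_batch(texts)
    elapsed = time.perf_counter() - start
    print(f"{type(embed_model).__name__}: {len(texts)} chunks in {elapsed:.1f}s")
```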

For detailed insights and potential performance benchmarks, you might explore the resources listed under **Sources** below.

Benchmarking in this way will help you determine whether the Zilliz managed index pipeline offers a speed improvement over a direct, self-managed implementation with OpenAI.

#### Details

**Sources**

- [llama-index-core/llama_index/core/indices/vector_store/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/vector_store/base.py)
- [llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/base.py)
- [llama-index-integrations/indices/llama-index-indices-managed-zilliz/llama_index/indices/managed/zilliz/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-zilliz/llama_index/indices/managed/zilliz/base.py)
- [llama-index-integrations/indices/llama-index-indices-managed-zilliz/llama_index/indices/managed/zilliz/retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/indices/llama-index-indices-managed-zilliz/llama_index/indices/managed/zilliz/retriever.py)
- [llama-index-legacy/llama_index/legacy/indices/managed/zilliz/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/managed/zilliz/base.py)
- [llama-index-legacy/llama_index/legacy/vector_stores/__init__.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/__init__.py)
- [docs/docs/module_guides/indexing/vector_store_index.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/indexing/vector_store_index.md)
- [llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/utils.py)
- [llama-datasets/docugami_kg_rag/sec_10_q/llamaindex_baseline.py](https://github.com/run-llama/llama_index/blob/main/llama-datasets/docugami_kg_rag/sec_10_q/llamaindex_baseline.py)
- [llama-datasets/paul_graham_essay/llamaindex_baseline.py](https://github.com/run-llama/llama_index/blob/main/llama-datasets/paul_graham_essay/llamaindex_baseline.py)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

wxywb commented 1 month ago

> Looking at the options you mentioned, option #3 is appealing, though even the Zilliz managed index requires an OpenAI key. I am assuming that Zilliz, under the hood, vectorizes documents using OpenAI's text-ada model. If we go that route, do you think their pipeline would be any faster than if we vectorized with OpenAI ourselves?

@mballav The Zilliz managed index can use a hosted version of the BGE models; check out this for all supported models.
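For completeness, a hedged sketch of that managed path. The exact `ZillizCloudPipelineIndex` signature has varied across llama-index versions, so the parameters below are assumptions, and the hosted embedding model (e.g. a BGE variant) is typically selected when the Zilliz Cloud ingestion pipeline is created rather than through a LlamaIndex argument:

```python
from llama_index.indices.managed.zilliz import ZillizCloudPipelineIndex

# Sketch only: parameter names follow the llama-index-indices-managed-zilliz
# integration circa v0.10 and may differ in other releases.
zcp_index = ZillizCloudPipelineIndex.from_document_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",  # any reachable document URL
    cluster_id="<your-zilliz-cluster-id>",  # placeholder credential
    token="<your-zilliz-api-key>",          # placeholder credential
    metadata={"version": "2.3"},            # optional, usable as a retrieval filter
)

# Query through the standard LlamaIndex interface.
print(zcp_index.as_query_engine().query("How do I delete entities in Milvus?"))
```

Since the documents are embedded server-side by the hosted model, this avoids both local compute and OpenAI rate limits during ingestion.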