run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Remove model dependency from ServiceContext when vector store exists #10197

Closed · Morriz closed this 5 months ago

Morriz commented 5 months ago

Feature Description

Hi, creating stores is expensive, and a model is only needed while generating the store. So why do the docs say we need to include the same model that was used for creation when loading an existing db?

I would like to rewrite the following working code (snippet):

...
    # The embedding model is needed both to build the index and, per the docs,
    # to load it again later
    embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-base-en-v1.5", max_length=512
    )
    service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)

    if exists:
        # Load the previously persisted Faiss index from disk
        vector_store = FaissVectorStore.from_persist_dir(persist_dir)
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store,
            persist_dir=persist_dir,
        )
        index = load_index_from_storage(
            storage_context=storage_context, service_context=service_context
        )
    else:
        # Build a fresh index; 768 is the output dimension of bge-base-en-v1.5
        dim = 768
        faiss_index = faiss.IndexFlatL2(dim)
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = _get_documents()
        index = VectorStoreIndex.from_documents(
            documents, storage_context=storage_context, service_context=service_context
        )
        index.storage_context.persist(persist_dir=persist_dir)

to something like this, so that I do NOT have to retrieve large models once the db has already been created (I will package the DB with git or prep it some other way, negating the need for any embedding model):

    dim = 768
    if exists:
        # Desired path: load the persisted index without any embedding model,
        # passing only the vector dimension
        vector_store = FaissVectorStore.from_persist_dir(persist_dir)
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store,
            persist_dir=persist_dir,
        )
        index = load_index_from_storage(storage_context=storage_context, dim=dim)
    else:
        # Only the initial build still needs the embedding model
        embed_model = HuggingFaceEmbedding(
            model_name="BAAI/bge-base-en-v1.5", max_length=512
        )
        service_context = ServiceContext.from_defaults(
            embed_model=embed_model, llm=None
        )
        faiss_index = faiss.IndexFlatL2(dim)
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = _get_documents()
        index = VectorStoreIndex.from_documents(
            documents, storage_context=storage_context, service_context=service_context
        )
        index.storage_context.persist(persist_dir=persist_dir)

I assume the only relevant fact to know up front is the dimension of the vectors, right?

Reason

No unnecessary model loading.

Value of Feature

It avoids downloading large models.

dosubot[bot] commented 5 months ago

πŸ€–

Hey @Morriz! πŸ‘‹ Good to see you around here again. I'm diving into your feature request right now. I'll get back to you with a comprehensive response shortly. Hang tight! πŸ™

logan-markewich commented 5 months ago

If you don't load the same embedding model for both creation and querying, how will querying work? You need to embed the query string in order to retrieve.

Morriz commented 5 months ago

What do you mean by "query string"? This is not about a string but about loading a model that is not used during retrieval. My intuition says the data is in the db and the retriever does not need a model to get it out.

logan-markewich commented 5 months ago

@Morriz vector dbs work mainly by semantic search. You embed all your data.

Then you embed a query string and retrieve the top-k most similar entries.
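
As a concrete illustration of that flow, here is a minimal sketch reusing the same pieces as the snippet above (faiss.IndexFlatL2 and BAAI/bge-base-en-v1.5). It is illustrative only: the import paths may differ between LlamaIndex versions, and the documents and query are made up for the example.

    import faiss
    import numpy as np
    from llama_index.embeddings import HuggingFaceEmbedding

    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

    # 1. Embed the data and store the vectors in the Faiss index
    docs = ["LlamaIndex is a data framework.", "Faiss does similarity search."]
    doc_vectors = np.array(
        [embed_model.get_text_embedding(d) for d in docs], dtype="float32"
    )
    faiss_index = faiss.IndexFlatL2(768)  # bge-base-en-v1.5 produces 768-dim vectors
    faiss_index.add(doc_vectors)

    # 2. At query time, embed the query string with the SAME model,
    #    then retrieve the top-k nearest stored vectors
    query_vector = np.array(
        [embed_model.get_query_embedding("What is LlamaIndex?")], dtype="float32"
    )
    distances, ids = faiss_index.search(query_vector, 2)  # top-2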

Morriz commented 5 months ago

Indeed. But you forgot the important part: all the vectors are in the db as well. Everything you need is in there. That is my point. I've built many RAG apps, btw. Telling me I need a search string for RAG is not very helpful here...

Morriz commented 5 months ago

I hope you understand that I only gave the relevant part of the code here, as I mentioned in the issue.

dosubot[bot] commented 5 months ago

πŸ€–

I understand your concern. It's important to be mindful of sharing only the relevant parts of the code to maintain security and privacy. If you have any specific questions or need assistance with the code snippet you provided, feel free to ask and I'll be happy to help!



logan-markewich commented 5 months ago

@Morriz just trying to understand the issue πŸ˜… How will you use your vector db when performing semantic search? Can you explain how it should work then? I apologize if I'm missing something obvious here.

In my mind, once you have your vector db, you still need to generate query embeddings to search and retrieve the top-k? Unless you are performing the search with data that is already in the vector db somehow?

In any case, if you truly don't need an embedding model, you can disable it (and the LLM) by using:

    service_context = ServiceContext.from_defaults(llm=None, embed_model=None)
    index = load_index_from_storage(storage_context, service_context=service_context)

This will load "fake" models. But again, you won't be able to properly retrieve from the index unless you have a pre-computed query vector to do the top-k search with.
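
For that last case, here is a rough sketch of a top-k search with a pre-computed vector run directly against the persisted vector store, below the index abstraction. It is illustrative only: `precomputed_vector` and `persist_dir` are assumed to exist already, and the VectorStoreQuery import path may vary by version.

    from llama_index.vector_stores import FaissVectorStore
    from llama_index.vector_stores.types import VectorStoreQuery

    vector_store = FaissVectorStore.from_persist_dir(persist_dir)

    # precomputed_vector is a 768-dim query embedding produced elsewhere (e.g. offline)
    result = vector_store.query(
        VectorStoreQuery(query_embedding=precomputed_vector, similarity_top_k=3)
    )
    print(result.ids, result.similarities)  # exact fields of the result may vary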

jerryjliu commented 5 months ago

@Morriz the way you use a vector db is that you specify a query embedding, and the vector db returns the top-k embeddings/documents most similar to it.

By default, we compute the query embedding for you in our vector store interface, and to do that you need an embedding model (per @logan-markewich's point). The other approach is that you provide the embedding directly as part of the user query, so you don't need to give us an embedding model to compute it for you.

Is the latter approach what you have in mind? That's also possible (I think), and I think your code snippet should work too.
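
If the latter approach is the intent, something like the following sketch could pass a pre-computed embedding along with the query so that no embedding model is needed at query time. It is a sketch only: `precomputed_vector` is assumed to be produced elsewhere, the query string is made up, and the QueryBundle import path may vary by version.

    from llama_index.schema import QueryBundle

    retriever = index.as_retriever(similarity_top_k=3)

    # Because the embedding is supplied up front, the retriever should not
    # need to call an embedding model for this query.
    nodes = retriever.retrieve(
        QueryBundle(query_str="what is in the docs?", embedding=precomputed_vector)
    )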

Morriz commented 5 months ago

Thanks for the elaborate reply. And thanks for trying to educate me @logan-markewich (I previously thought the query-to-vector conversion for the search input could be done up front without using a model, but that was obviously a gap in my knowledge, and the result of over-relying on OpenAI). I don't think it makes sense for me to offload vector creation to another step, though. I guess I have to get used to the implications of using large local models.

logan-markewich commented 5 months ago

Yea no worries, glad it makes sense πŸ‘πŸ»

If you wanted, you could host the embeddings on a dedicated server (e.g. using Text Embeddings Inference from Hugging Face, or another hosting option), so that the model is not hosted directly on your local machine.

You can see a list of all embedding integrations here; some run locally, some over an API, and you can even implement your own custom class: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#list-of-supported-embeddings
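
For instance, a rough sketch of pointing the ServiceContext at a remote Text Embeddings Inference (TEI) server instead of a locally loaded model. The base_url is a placeholder, storage_context is assumed to come from the earlier snippet, and the TextEmbeddingsInference import path may vary by LlamaIndex version.

    from llama_index import ServiceContext, load_index_from_storage
    from llama_index.embeddings import TextEmbeddingsInference

    # Embeddings are computed by the remote TEI server, so no large model has
    # to be downloaded or loaded on the querying machine.
    embed_model = TextEmbeddingsInference(
        model_name="BAAI/bge-base-en-v1.5",    # model served by TEI
        base_url="http://my-tei-server:8080",  # placeholder URL
    )
    service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)
    index = load_index_from_storage(storage_context, service_context=service_context)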