run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

How to deploy open-source embedding models in auto-merging retriever: ValueError: shapes (1024,) and (384,) not aligned: 1024 (dim 0) != 384 (dim 0) #14784

Open Sajad7010 opened 1 month ago

Sajad7010 commented 1 month ago

Question Validation

Question

I want to use open-source embedding models, such as BAAI/bge-m3, in the auto-merging retriever. I run the following code, but the embedding output is not compatible with the input size of the query engine. The error is: ValueError: shapes (1024,) and (384,) not aligned: 1024 (dim 0) != 384 (dim 0). The code works when I use a smaller model such as BAAI/bge-small-en-v1.5. Can someone explain how to adjust different embedding models so they work with the query engine?

import os
from llama_index.core import Settings, StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Document
from llama_index.llms.azure_openai import AzureOpenAI

def build_automerging_index(document, LLM, save_dir=None, chunk_sizes=None):
    if save_dir is None:
        save_dir = os.path.join(os.path.expanduser("~"), "merging_index")

    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    chunk_sizes = chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(document)
    leaf_nodes = get_leaf_nodes(nodes)

    Settings.llm = LLM
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")

    storage_context = StorageContext.from_defaults()

    # Reuse a previously persisted index if one exists in save_dir;
    # its stored embeddings must match the embedding model set above.
    docstore_path = os.path.join(save_dir, 'docstore.json')
    if os.path.exists(docstore_path):
        automerging_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=save_dir))
    else:
        storage_context.docstore.add_documents(nodes)
        automerging_index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
        automerging_index.storage_context.persist(persist_dir=save_dir)

    return automerging_index

def get_automerging_query_engine(automerging_index, similarity_top_k=10):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(base_retriever, automerging_index.storage_context, verbose=True)
    auto_merging_engine = RetrieverQueryEngine.from_args(retriever)
    return auto_merging_engine

content = "example...."
document = Document(doc_id="1", text=content)

# Configuration for Azure OpenAI
azure_openai_api_key = "...."
azure_openai_endpoint = "...."
azure_openai_api_version = "....."  

# Instantiate OpenAI with Azure configuration

llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="deployment_name",
    api_key=azure_openai_api_key,
    azure_endpoint=azure_openai_endpoint,
    api_version=azure_openai_api_version,
)

index = build_automerging_index(
    [document],
    LLM=llm,
    save_dir="./merging_index"
)

query_engine = get_automerging_query_engine(index, similarity_top_k=10)

query = "What is the example content about?"
response = query_engine.query(query)
print(response)
dosubot[bot] commented 1 month ago

Hey @Sajad7010! I'm a bot here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for human assistance.

To adjust different embedding models so they are compatible with the query engine in your auto-merging retriever setup, you need to ensure that the embedding models are integrated into the vector store and retriever components. Here is a step-by-step guide, using Cohere and OpenAI embeddings as examples:

  1. Install Required Packages:

    %pip install llama-index-llms-openai
    %pip install llama-index-embeddings-openai
    %pip install llama-index-embeddings-cohere
  2. Setup API Keys:

    import os
    
    os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"
    os.environ["COHERE_API_KEY"] = "YOUR COHEREAI API KEY"
  3. Load Data:

    from llama_index.core import SimpleDirectoryReader
    
    documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
  4. Create Nodes:

    from llama_index.core.node_parser import SentenceSplitter
    
    node_parser = SentenceSplitter(chunk_size=512)
    nodes = node_parser.get_nodes_from_documents(documents)
    
    for idx, node in enumerate(nodes):
        node.id_ = f"node_{idx}"
  5. Define Embedding Models:

    from llama_index.embeddings.cohere import CohereEmbedding
    from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType, OpenAIEmbeddingMode
    
    def cohere_embedding(model_name: str, input_type: str, embedding_type: str) -> CohereEmbedding:
        return CohereEmbedding(
            api_key=os.environ["COHERE_API_KEY"],
            model_name=model_name,
            input_type=input_type,
            embedding_type=embedding_type,
        )
    
    def openai_embedding(model_name: str, mode: str) -> OpenAIEmbedding:
        return OpenAIEmbedding(
            api_key=os.environ["OPENAI_API_KEY"],
            model=model_name,
            mode=mode,
        )
  6. Create Retrievers for Different Embedding Types:

    from llama_index.core import VectorStoreIndex
    
    def retriever(nodes, embedding_type="float", model_name="embed-english-v3.0", provider="cohere"):
        if provider == "cohere":
            embed_model = cohere_embedding(model_name, "search_document", embedding_type)
        elif provider == "openai":
            embed_model = openai_embedding(model_name, OpenAIEmbeddingMode.TEXT_SEARCH_MODE)
        else:
            raise ValueError("Unsupported provider")
    
        vector_index = VectorStoreIndex(
            nodes,
            embed_model=embed_model,
        )
        retriever = vector_index.as_retriever(
            similarity_top_k=2,
            embed_model=embed_model,
        )
        return retriever
    
    retriever_float = retriever(nodes)
    retriever_int8 = retriever(nodes, "int8")
    retriever_binary = retriever(nodes, "binary")
    retriever_ubinary = retriever(nodes, "ubinary")
    retriever_openai = retriever(nodes, model_name=OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002, provider="openai")
  7. Plug Retrievers into Query Engine:

    from llama_index.core.query_engine import RetrieverQueryEngine
    
    query_engine_float = RetrieverQueryEngine.from_args(retriever_float)
    query_engine_int8 = RetrieverQueryEngine.from_args(retriever_int8)
    query_engine_binary = RetrieverQueryEngine.from_args(retriever_binary)
    query_engine_ubinary = RetrieverQueryEngine.from_args(retriever_ubinary)
    query_engine_openai = RetrieverQueryEngine.from_args(retriever_openai)
  8. Query the Engine:

    query_str = "What did the author do growing up?"
    
    response_float = query_engine_float.query(query_str)
    response_int8 = query_engine_int8.query(query_str)
    response_binary = query_engine_binary.query(query_str)
    response_ubinary = query_engine_ubinary.query(query_str)
    response_openai = query_engine_openai.query(query_str)
    
    print("Float Embedding Response:", str(response_float))
    print("Int8 Embedding Response:", str(response_int8))
    print("Binary Embedding Response:", str(response_binary))
    print("UBinary Embedding Response:", str(response_ubinary))
    print("OpenAI Embedding Response:", str(response_openai))

Additionally, you can use the AdapterEmbeddingModel to wrap any embedding model and add an adapter layer on top of it. This is useful for fine-tuning an embedding model on a downstream task. Here is an example of how to use the AdapterEmbeddingModel:

from llama_index.legacy.embeddings.adapter import AdapterEmbeddingModel

def adapter_embedding(base_embed_model, adapter_path, transform_query=True, device=None):
    return AdapterEmbeddingModel(
        base_embed_model=base_embed_model,
        adapter_path=adapter_path,
        transform_query=transform_query,
        device=device
    )
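
A rough usage sketch of the helper above (the checkpoint path ./adapter_checkpoint and the choice of base model are placeholders; note also that the base embedding should come from the same llama_index namespace, legacy or core, as the AdapterEmbeddingModel import used above):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Placeholder: wrap a base embedding model with a previously fine-tuned adapter.
# "./adapter_checkpoint" stands in for wherever your trained adapter was saved.
base_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
Settings.embed_model = adapter_embedding(base_model, adapter_path="./adapter_checkpoint")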

By following these steps, you can adjust different embedding models to be compatible with the query engine in your auto-merging retriever setup. The key is to ensure that the embedding models are properly integrated into the vector store and retriever components. Additionally, you can evaluate the performance of different embedding types using metrics like Hit-Rate and MRR (Mean Reciprocal Rank) to choose the best model for your needs [1][2][3][4][5].
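
For the evaluation part, a minimal sketch using llama_index's built-in RetrieverEvaluator (the query string and expected node IDs below are made-up placeholders):

from llama_index.core.evaluation import RetrieverEvaluator

# Compute hit-rate and MRR for one of the retrievers built above.
evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever_float
)
# expected_ids are the node IDs that should be retrieved for this query (placeholders here).
eval_result = evaluator.evaluate(
    query="What did the author do growing up?",
    expected_ids=["node_0", "node_1"],
)
print(eval_result)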


logan-markewich commented 1 month ago

@Sajad7010 it sounds like you built your index with one embedding model, and are querying with another.

If you want to change embedding models, you need to rebuild the index. It's very important that the same embedding model is used for construction and querying.
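
A minimal sketch of that, reusing the code from this issue and assuming the stale index was persisted under ./merging_index:

import shutil

# If ./merging_index was persisted earlier with a 384-dim model (e.g. BAAI/bge-small-en-v1.5),
# its stored vectors cannot be compared against 1024-dim bge-m3 query embeddings.
# Deleting the stale index forces build_automerging_index to rebuild it with the
# embedding model it now sets (BAAI/bge-m3), so construction and querying match.
shutil.rmtree("./merging_index", ignore_errors=True)

index = build_automerging_index([document], LLM=llm, save_dir="./merging_index")
query_engine = get_automerging_query_engine(index, similarity_top_k=10)
print(query_engine.query("What is the example content about?"))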

Sajad7010 commented 1 month ago

@logan-markewich I didn't understand your point. I call the build_automerging_index function to do the embedding, and that's it. I suspect the problem is that some embedding models generate larger vectors whose dimension does not fit the query, but I don't know how to adjust for that.
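
The size mismatch in the error can be confirmed directly from the two models (a quick check, assuming both can be downloaded locally):

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# bge-small-en-v1.5 and bge-m3 produce vectors of different sizes, which is exactly
# the 384 vs 1024 mismatch reported in the ValueError above.
small = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
large = HuggingFaceEmbedding(model_name="BAAI/bge-m3")

print(len(small.get_text_embedding("test")))  # 384
print(len(large.get_text_embedding("test")))  # 1024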