run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: In a RAG pipeline made using LlamaIndex, where is "llm" used and where is "embed_model" used? #12553

Open GildeshAbhay opened 3 months ago

GildeshAbhay commented 3 months ago

Question Validation

Question

    import textwrap

    from llama_index.core import (
        Document,
        ServiceContext,
        StorageContext,
        VectorStoreIndex,
        get_response_synthesizer,
    )
    from llama_index.core.node_parser import HierarchicalNodeParser
    from llama_index.core.postprocessor import SentenceTransformerRerank
    from llama_index.core.query_engine import RetrieverQueryEngine
    from llama_index.core.retrievers import AutoMergingRetriever

    # Parse the document into a hierarchy of nodes
    content = document["content"]
    doc = Document(text=content)
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_size)
    nodes = node_parser.get_nodes_from_documents([doc])

    # Build and persist the index
    service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, node_parser=node_parser)
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)
    index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
    index.storage_context.persist(persist_dir=pathname)

    # Retrieval, reranking, and synthesis
    postproc = None
    reranker = SentenceTransformerRerank(top_n=rr_metadata)
    retriever = index.as_retriever(similarity_top_k=retrieval_metadata_similarity)
    retriever = AutoMergingRetriever(retriever, index.storage_context, verbose=True)
    response_mode = "tree_summarize"
    response_synthesizer = get_response_synthesizer(response_mode=response_mode)
    node_postprocessors = [postproc, reranker]
    node_postprocessors = [processor for processor in node_postprocessors if processor is not None]
    query_engine = RetrieverQueryEngine(retriever, node_postprocessors=node_postprocessors)
    summary_whole = query_engine.query(rag_prompt_original)
    summary_text = str(summary_whole)  # Convert the summary object to a string
    wrapped_summary = textwrap.fill(summary_text, width=80)  # Wrap the summary text

    Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

If I put these two lines at the top of the code, will that ensure that embeddings are created using text-embedding-3-small while the query engine uses the LLM? Is there anything else to take care of at this stage, for example in the service context, where the embedding model is also used separately?

dosubot[bot] commented 3 months ago

Hey @GildeshAbhay! Great to see you diving into another interesting challenge with LlamaIndex. How's everything going on your end?

Yes, your understanding is correct. By setting Settings.llm to "gpt-3.5-turbo" and Settings.embed_model to "text-embedding-3-small" at the top of your RAG pipeline code, you are ensuring that embeddings are created using "text-embedding-3-small" while the query engine uses "gpt-3.5-turbo".

LlamaIndex handles these settings through properties with lazy initialization and setter methods. This means the LLM (Large Language Model) and the embedding model are only resolved when they are first accessed, which keeps the configuration flexible and avoids unnecessary initialization.

When you set Settings.llm and Settings.embed_model, you are explicitly setting these properties, which triggers the setter methods. These methods resolve the provided LLM type and embedding model type using the resolve_llm and resolve_embed_model functions respectively, and assign them to _llm and _embed_model.
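
As a rough illustration, the lazy-property pattern described above looks something like the simplified sketch below. This is not the actual contents of settings.py, and it assumes the 0.10-style resolve_llm / resolve_embed_model helpers:

    # Simplified sketch of the lazy-resolution pattern; the real implementation
    # lives in llama-index-core/llama_index/core/settings.py.
    from llama_index.core.embeddings.utils import resolve_embed_model
    from llama_index.core.llms.utils import resolve_llm

    class _Settings:
        _llm = None
        _embed_model = None

        @property
        def llm(self):
            # Lazily resolved on first access if nothing was set explicitly.
            if self._llm is None:
                self._llm = resolve_llm("default")
            return self._llm

        @llm.setter
        def llm(self, llm):
            # Settings.llm = OpenAI(...) triggers this setter.
            self._llm = resolve_llm(llm)

        @property
        def embed_model(self):
            if self._embed_model is None:
                self._embed_model = resolve_embed_model("default")
            return self._embed_model

        @embed_model.setter
        def embed_model(self, embed_model):
            # Settings.embed_model = OpenAIEmbedding(...) triggers this setter.
            self._embed_model = resolve_embed_model(embed_model)

    Settings = _Settings()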

As for additional considerations, you should ensure that the embedding model is compatible with the inputs and outputs expected by the rest of your pipeline. Also, any necessary preprocessing or postprocessing steps should be taken into account. However, without more specific details about the service context or the rest of the pipeline, it's difficult to provide a comprehensive list of additional considerations.

You can find more details about how LlamaIndex handles these configurations in the llama-index-core/llama_index/core/settings.py file.


logan-markewich commented 3 months ago

ServiceContext is deprecated; Settings controls the global defaults.

You can also pass them into the individual interfaces as needed instead of relying on globals. For example, from your code:

index = VectorStoreIndex(nodes=nodes, storage_context=storage_context, embed_model=embed_model)

# you didn't actually use this anywhere, but this is what uses the LLM in a query engine
response_synthesizer = get_response_synthesizer(response_mode=response_mode, llm=llm)

query_engine = RetrieverQueryEngine(retriever, node_postprocessors=node_postprocessors, response_synthesizer=response_synthesizer)
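
To make the split explicit, here is a rough end-to-end sketch (reusing the nodes, node_postprocessors, and rag_prompt_original variables from the question, and assuming the 0.10-style split packages for the OpenAI integrations): the embed model is used when the nodes are embedded at index-build time and again to embed the query string during retrieval, while the LLM is only used by the response synthesizer to generate the final answer.

    from llama_index.core import VectorStoreIndex, get_response_synthesizer
    from llama_index.core.query_engine import RetrieverQueryEngine
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.llms.openai import OpenAI

    llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
    embed_model = OpenAIEmbedding(model="text-embedding-3-small")

    # embed_model: embeds the nodes here, and later embeds the query string
    # whenever the retriever runs.
    index = VectorStoreIndex(nodes=nodes, embed_model=embed_model)
    retriever = index.as_retriever(similarity_top_k=5)

    # llm: only used by the response synthesizer to write the final answer
    # from the retrieved nodes.
    response_synthesizer = get_response_synthesizer(response_mode="tree_summarize", llm=llm)
    query_engine = RetrieverQueryEngine(
        retriever,
        node_postprocessors=node_postprocessors,
        response_synthesizer=response_synthesizer,
    )

    response = query_engine.query(rag_prompt_original)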
GildeshAbhay commented 3 months ago

Can I also pass llm to query_engine?

query_engine = RetrieverQueryEngine(retriever, node_postprocessors=node_postprocessors, response_synthesizer=response_synthesizer, llm=llm)

Would that be redundant, since llm is already passed to the response synthesizer?

logan-markewich commented 3 months ago

@GildeshAbhay Exactly, the response synthesizer already has the LLM, so in the snippet above the llm arg is unused.
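
For completeness: if you would rather not construct the synthesizer yourself, RetrieverQueryEngine.from_args accepts an llm argument (at least in recent 0.10.x releases; treat this as an assumption and check your installed version) and builds the default response synthesizer with it internally, so the LLM is still only passed once:

    # Rough equivalent of the snippet above, letting from_args build the
    # default response synthesizer with the given LLM.
    query_engine = RetrieverQueryEngine.from_args(
        retriever,
        llm=llm,
        response_mode="tree_summarize",
        node_postprocessors=node_postprocessors,
    )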