run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Inaccurate Responses in RAG System Using LlamaIndex and HuggingFaceLLM #15801

Open Rumeysakeskin opened 2 months ago

Rumeysakeskin commented 2 months ago

Question

Below is my code:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

documents = SimpleDirectoryReader("./data").load_data()

# System prompt (translated from Turkish): answer consistently and
# accurately based on the provided data.
system_prompt = """<|SYSTEM|>#
You are an AI assistant. Answer the questions consistently and accurately based on the given data.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate(
    "<|USER|>Answer the following question based on the data: {query_str}<|ASSISTANT|>"
)

model_id = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=512,
    # Note: temperature is ignored when do_sample=False (greedy decoding)
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model_id,
    model_name=model_id,
    device_map="cuda:0",  # alternative: "auto"
    tokenizer_kwargs={"max_length": 4096},
)

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding  # current package, replaces llama_index.legacy
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings, StorageContext
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# embeddings = LangchainEmbedding(HuggingFaceEmbeddings(model_name="dunzhang/stella_en_1.5B_v5"))
embeddings = LangchainEmbedding(HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))

# Persist the vectors in a local Chroma collection
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
Settings.num_output = 512
Settings.context_window = 3900

index = VectorStoreIndex.from_documents(
    documents, embed_model=embeddings, storage_context=storage_context
)

query_engine = index.as_query_engine(streaming=True, llm=llm)

response_stream = query_engine.query("question...")
response_stream.print_response_stream()
```

Can you help me understand why the results are not correct?

Could it be related to the data format? The data is plain text in a .txt file.

dosubot[bot] commented 2 months ago

I found a similar open issue discussing inaccurate responses in a RAG system using LlamaIndex and open-source models [1]. The suggested solution involves several steps to diagnose the problem and improve accuracy:

  1. Generator Component: Ensure that the generator (LLM) is effectively utilizing the retrieved context. Adjusting the generation prompts and evaluation prompts can help improve the accuracy.

  2. Embedding Models: Evaluate the quality of the embeddings used in your RAG pipeline. Use metrics like Hit-Rate and Mean-Reciprocal-Rank (MRR) to determine the most effective embedding type for your dataset. Fine-tuning the embedding models can also lead to significant improvements.

  3. Vector Store Configuration: Optimize the parameters of your vector database (Qdrant, Chroma, Elasticsearch) for better retrieval performance. For example, you can raise the similarity_top_k parameter to retrieve more context or enable hybrid search features; a minimal sketch follows this list.

  4. Evaluation Tools: Use tools like UpTrain to perform evaluations on the generated responses. UpTrain provides grades for various preconfigured checks, performs root cause analysis on failure cases, and gives insights on how to resolve them.

  5. Advanced Retrieval Techniques: Consider using advanced retrieval techniques such as Activeloop's Deep Memory, which introduces a tiny neural network layer trained to match user queries with relevant data from a corpus, boosting retrieval accuracy.
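
As a concrete illustration of point 3, here is a minimal sketch, assuming the `index` and `llm` objects built in the question above (`similarity_top_k` and the retriever API are standard LlamaIndex; the query string is a placeholder). Inspecting the retrieved chunks directly helps separate retrieval failures from generation failures:

```python
# Minimal sketch, assuming `index` and `llm` from the question above.
# Retrieve more chunks per query and inspect them: if the right passages
# never show up here, the problem is retrieval, not the LLM.
retriever = index.as_retriever(similarity_top_k=5)
for node_with_score in retriever.retrieve("question..."):
    # score is the similarity; the content is the retrieved chunk
    print(node_with_score.score, node_with_score.node.get_content()[:200])

# The same parameter can be passed straight to the query engine:
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)
```

If the relevant passages do appear here but the answer is still wrong, the issue is more likely on the generation side (prompt format or context-window truncation).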

Regarding the data format: plain text in a .txt file is fine for ingestion, but the LabelledRagDataset class (used for building evaluation datasets) expects data structured in a specific way, with fields such as query, reference_contexts, reference_answer, reference_answer_by, and query_by. If a dataset file does not adhere to that format, it cannot be correctly parsed and utilized, leading to incorrect results [2].
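
For illustration only, here is a hedged sketch of that expected shape (class and field names as exported by llama_index.core.llama_dataset; the literal strings are placeholders):

```python
from llama_index.core.llama_dataset import (
    CreatedBy,
    CreatedByType,
    LabelledRagDataExample,
    LabelledRagDataset,
)

# Placeholder values; only the field structure matters here.
example = LabelledRagDataExample(
    query="question...",
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=["a relevant chunk from your .txt data"],
    reference_answer="the expected answer",
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
)
rag_dataset = LabelledRagDataset(examples=[example])
```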

Additionally, ensure that the SimpleDirectoryReader is correctly reading the text files. The SimpleDirectoryReader class treats text files as plain text by default and supports various file types based on their extensions [3].
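
A quick way to verify this is to load the files and print what was parsed; a sketch assuming the ./data directory from the question (required_exts is a standard SimpleDirectoryReader argument):

```python
from llama_index.core import SimpleDirectoryReader

# Restrict loading to .txt files and confirm they parse as expected
documents = SimpleDirectoryReader("./data", required_exts=[".txt"]).load_data()
print(len(documents), "documents loaded")
print(documents[0].metadata)     # file_name, file_path, ...
print(documents[0].text[:300])   # start of the parsed text
```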

Lastly, the HuggingFaceLLM class handles the system_prompt and query_wrapper_prompt by incorporating them into the full prompt that is sent to the model. Ensure that these prompts are correctly formatted and used during the query generation process [4].
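
Roughly, the assembly looks like the sketch below; this is a simplification, not the exact HuggingFaceLLM internals, which vary by version. One thing worth checking: the <|SYSTEM|>/<|USER|>/<|ASSISTANT|> markers used in the question come from StableLM-style prompts, while Llama 3.1 Instruct models expect the Llama 3 chat template, so a prompt-format mismatch is a plausible cause of degraded answers.

```python
# Hedged sketch of how system_prompt and query_wrapper_prompt combine;
# the real implementation lives inside HuggingFaceLLM.
def build_full_prompt(system_prompt: str, wrapper_template: str, query_str: str) -> str:
    wrapped = wrapper_template.format(query_str=query_str)  # wrap the user query
    return f"{system_prompt} {wrapped}"                     # prepend the system prompt

full_prompt = build_full_prompt(
    "<|SYSTEM|># You are an AI assistant. Answer based on the given data.",
    "<|USER|>Answer the following question based on the data: {query_str}<|ASSISTANT|>",
    "question...",
)
print(full_prompt)
```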

By addressing these aspects, you can improve the accuracy and relevance of the responses generated by your RAG system.

To continue talking to Dosu, mention @dosu.