run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Inaccurate Responses in RAG System Using LlamaIndex and HuggingFaceLLM #15801

Open Rumeysakeskin opened 2 months ago

Rumeysakeskin commented 2 months ago

Question

Below is my code:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

documents = SimpleDirectoryReader("./data").load_data()

# System prompt (translated from Turkish): answer consistently and
# accurately based on the provided data.
system_prompt = """<|SYSTEM|>#
You are an AI assistant. Answer the questions consistently and accurately based on the given data.
"""

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate(
    "<|USER|>Answer the following question based on the data: {query_str}<|ASSISTANT|>"
)

model_id = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=512,
    # Note: temperature is ignored when do_sample=False (greedy decoding)
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model_id,
    model_name=model_id,
    device_map="cuda:0",  # alternative: "auto"
    tokenizer_kwargs={"max_length": 4096},
)

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding  # current package, replaces llama_index.legacy
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings, StorageContext
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# embeddings = LangchainEmbedding(HuggingFaceEmbeddings(model_name="dunzhang/stella_en_1.5B_v5"))
embeddings = LangchainEmbedding(HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))

# Persist the vectors in a local Chroma collection
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
Settings.num_output = 512
Settings.context_window = 3900

index = VectorStoreIndex.from_documents(
    documents, embed_model=embeddings, storage_context=storage_context
)

query_engine = index.as_query_engine(streaming=True, llm=llm)

response_stream = query_engine.query("question...")
response_stream.print_response_stream()
```

Can you help me understand why the results are not correct?

Could it be related to the data format? The data is plain text in a .txt file.

dosubot[bot] commented 2 months ago

I found a similar open issue discussing inaccurate responses in a RAG system using LlamaIndex and open-source models [1]. The suggested solution involves several steps to diagnose the problem and improve accuracy:

  1. Generator Component: Ensure that the generator (LLM) is effectively utilizing the retrieved context. Adjusting the generation prompts and evaluation prompts can help improve the accuracy.

  2. Embedding Models: Evaluate the quality of the embeddings used in your RAG pipeline. Use metrics like Hit-Rate and Mean-Reciprocal-Rank (MRR) to determine the most effective embedding type for your dataset. Fine-tuning the embedding models can also lead to significant improvements.

  3. Vector Store Configuration: Optimize the parameters of your vector database (Qdrant, Chroma, Elasticsearch) for better retrieval performance. For example, you can raise the similarity_top_k parameter to retrieve more context or enable hybrid search features; a minimal sketch follows this list.

  4. Evaluation Tools: Use tools like UpTrain to perform evaluations on the generated responses. UpTrain provides grades for various preconfigured checks, performs root cause analysis on failure cases, and gives insights on how to resolve them.

  5. Advanced Retrieval Techniques: Consider using advanced retrieval techniques such as Activeloop's Deep Memory, which introduces a tiny neural network layer trained to match user queries with relevant data from a corpus, boosting retrieval accuracy.
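
As a concrete illustration of point 3, here is a minimal sketch, assuming the `index` and `llm` objects built in the question above (`similarity_top_k` and the retriever API are standard LlamaIndex; the query string is a placeholder). Inspecting the retrieved chunks directly helps separate retrieval failures from generation failures:

```python
# Minimal sketch, assuming `index` and `llm` from the question above.
# Retrieve more chunks per query and inspect them: if the right passages
# never show up here, the problem is retrieval, not the LLM.
retriever = index.as_retriever(similarity_top_k=5)
for node_with_score in retriever.retrieve("question..."):
    # score is the similarity; the content is the retrieved chunk
    print(node_with_score.score, node_with_score.node.get_content()[:200])

# The same parameter can be passed straight to the query engine:
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)
```

If the relevant passages do appear here but the answer is still wrong, the issue is more likely on the generation side (prompt format or context-window truncation).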

Regarding the data format: plain text in a .txt file is fine for ingestion, but the LabelledRagDataset class (used for building evaluation datasets) expects data structured in a specific way, with fields such as query, reference_contexts, reference_answer, reference_answer_by, and query_by. If a dataset file does not adhere to that format, it cannot be correctly parsed and utilized, leading to incorrect results [2].
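
For illustration only, here is a hedged sketch of that expected shape (class and field names as exported by llama_index.core.llama_dataset; the literal strings are placeholders):

```python
from llama_index.core.llama_dataset import (
    CreatedBy,
    CreatedByType,
    LabelledRagDataExample,
    LabelledRagDataset,
)

# Placeholder values; only the field structure matters here.
example = LabelledRagDataExample(
    query="question...",
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=["a relevant chunk from your .txt data"],
    reference_answer="the expected answer",
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
)
rag_dataset = LabelledRagDataset(examples=[example])
```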

Additionally, ensure that the SimpleDirectoryReader is correctly reading the text files. The SimpleDirectoryReader class treats text files as plain text by default and supports various file types based on their extensions [3].
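
A quick way to verify this is to load the files and print what was parsed; a sketch assuming the ./data directory from the question (required_exts is a standard SimpleDirectoryReader argument):

```python
from llama_index.core import SimpleDirectoryReader

# Restrict loading to .txt files and confirm they parse as expected
documents = SimpleDirectoryReader("./data", required_exts=[".txt"]).load_data()
print(len(documents), "documents loaded")
print(documents[0].metadata)     # file_name, file_path, ...
print(documents[0].text[:300])   # start of the parsed text
```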

Lastly, the HuggingFaceLLM class handles the system_prompt and query_wrapper_prompt by incorporating them into the full prompt that is sent to the model. Ensure that these prompts are correctly formatted and used during the query generation process [4].
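
Roughly, the assembly looks like the sketch below; this is a simplification, not the exact HuggingFaceLLM internals, which vary by version. One thing worth checking: the <|SYSTEM|>/<|USER|>/<|ASSISTANT|> markers used in the question come from StableLM-style prompts, while Llama 3.1 Instruct models expect the Llama 3 chat template, so a prompt-format mismatch is a plausible cause of degraded answers.

```python
# Hedged sketch of how system_prompt and query_wrapper_prompt combine;
# the real implementation lives inside HuggingFaceLLM.
def build_full_prompt(system_prompt: str, wrapper_template: str, query_str: str) -> str:
    wrapped = wrapper_template.format(query_str=query_str)  # wrap the user query
    return f"{system_prompt} {wrapped}"                     # prepend the system prompt

full_prompt = build_full_prompt(
    "<|SYSTEM|># You are an AI assistant. Answer based on the given data.",
    "<|USER|>Answer the following question based on the data: {query_str}<|ASSISTANT|>",
    "question...",
)
print(full_prompt)
```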

By addressing these aspects, you can improve the accuracy and relevance of the responses generated by your RAG system.

To continue talking to Dosu, mention @dosu.