lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Correct prompt for Vicuna v1.5 7b in the case of RAG #2303

Open Matthieu-Tinycoaching opened 10 months ago

Matthieu-Tinycoaching commented 10 months ago

Hi,

I want to use the Vicuna v1.5 7b model for RAG (Retrieval-Augmented Generation, i.e. Q&A based on retrieved documents or context). I have tried many prompts based on Llama 2 prompting, but I never managed to get a good answer that meets these criteria:

1) Well-formatted output: the prompt and the retrieved context should not be echoed back in the answer.
2) The model should answer "I don't know" when the answer cannot be determined from the provided context.

Does anyone have advice for using Vicuna v1.5 7b for RAG?
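For reference, one way to build such a prompt with FastChat's own Vicuna conversation template could look roughly like the sketch below. The instruction wording and the build_rag_prompt helper are illustrative assumptions, not an official recipe.

from fastchat.conversation import get_conv_template

def build_rag_prompt(context: str, question: str) -> str:
    # "vicuna_v1.1" is the conversation template used by Vicuna v1.5 models.
    conv = get_conv_template("vicuna_v1.1")
    # Hypothetical instruction wording; adjust to taste.
    user_msg = (
        "Answer the question using only the context below. "
        "If the answer cannot be determined from the context, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    conv.append_message(conv.roles[0], user_msg)
    conv.append_message(conv.roles[1], None)  # leave the assistant turn empty
    return conv.get_prompt()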

Dandelionym commented 6 months ago

Is there any progress?

Midhun-2001 commented 1 month ago

Please update if there is any progress

Minxiangliu commented 3 weeks ago

Hi, this reply might be a bit late, but assuming you have the OpenAI-compatible API server running correctly (following the examples here), you can use the example below. It is a simple implementation using LangChain. Although RetrievalQA will be deprecated in newer versions of LangChain, it demonstrates that this approach works.

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain_openai import ChatOpenAI

# Split documents into small chunks and embed them on CPU.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=5)
embedding = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})

# Load a PDF and split it into chunks.
loader = PyMuPDFLoader("Q1VEQS1YIExpYnJhcmllcyAzLzIyLzIyLnBkZg==.pdf")
PDF_data = loader.load()
all_splits = text_splitter.split_documents(PDF_data)

# Build a Chroma vector store over the chunks and expose it as a retriever.
persist_directory = 'db'
vectordb = Chroma.from_documents(documents=all_splits, embedding=embedding, persist_directory=persist_directory)
retriever = vectordb.as_retriever()

# Point the OpenAI-compatible client at the local FastChat API server.
llm = ChatOpenAI(
    openai_api_key='EMPTY',
    base_url='http://localhost:8000/v1/',
    model='vicuna-13b-v1.5-16k',
    max_tokens=100
)

# "stuff" chain type: the retrieved chunks are stuffed into the prompt as context.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=False
)

# Compare the bare LLM answer with the retrieval-augmented answer.
question = 'How many cores does the Grace superchip contain?'
respond = llm.invoke(question)
respond_qa = qa.invoke(question)

print(f"LLM Respond:\n{respond.content}\n")
print(f"LLM QA Respond:\n{respond_qa['result']}")

Results:

LLM Respond:
I'm sorry, but I'm not familiar with a technology or device called "Grace superchip." Can you please provide more context or information about what you are referring to?

LLM QA Respond:
The Grace superchip contains 136 cores.
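
Since RetrievalQA is on its way to deprecation, roughly the same pipeline can be written with LangChain's newer create_retrieval_chain / create_stuff_documents_chain helpers. A minimal sketch, reusing the llm, retriever, and question from the snippet above; the prompt wording here is an assumption, not tested output.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# The combine-documents prompt must contain a {context} placeholder,
# which receives the retrieved chunks.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Answer using only the following context. "
     "If the answer is not in the context, say \"I don't know\".\n\n{context}"),
    ("human", "{input}"),
])

combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, combine_docs_chain)

# The chain expects the question under the "input" key and returns the
# generated text under the "answer" key.
result = rag_chain.invoke({"input": question})
print(result["answer"])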