Is there any progress?
Please update if there is any progress
Hi there. This reply might be a bit late, but assuming you have already set up the OpenAI-compatible API server following the examples here, you can use the example below. It is a simple implementation using LangChain. Although RetrievalQA will be deprecated in later versions of LangChain, it demonstrates that you can implement it along these lines.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain_openai import ChatOpenAI

# Split the PDF into small, slightly overlapping chunks and embed them on CPU
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=5)
embedding = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})

loader = PyMuPDFLoader("Q1VEQS1YIExpYnJhcmllcyAzLzIyLzIyLnBkZg==.pdf")
PDF_data = loader.load()
all_splits = text_splitter.split_documents(PDF_data)

# Store the chunks in a persistent Chroma database and expose it as a retriever
persist_directory = 'db'
vectordb = Chroma.from_documents(documents=all_splits, embedding=embedding, persist_directory=persist_directory)
retriever = vectordb.as_retriever()

# Point ChatOpenAI at the local OpenAI-compatible server hosting the Vicuna model
llm = ChatOpenAI(
    openai_api_key='EMPTY',
    base_url='http://localhost:8000/v1/',
    model='vicuna-13b-v1.5-16k',
    max_tokens=100,
)

# "stuff" chain: the retrieved chunks are stuffed into the prompt as context
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=False,
)

question = 'How many cores does the Grace superchip contain?'
response = llm.invoke(question)     # plain LLM call, no retrieval
response_qa = qa.invoke(question)   # RAG: same question answered with retrieved context

print(f"LLM Response:\n{response.content}\n")
print(f"LLM QA Response:\n{response_qa['result']}")
Results:
LLM Response:
I'm sorry, but I'm not familiar with a technology or device called "Grace superchip." Can you please provide more context or information about what you are referring to?
LLM QA Response:
The Grace superchip contains 136 cores.
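Since RetrievalQA is being phased out, roughly the same pipeline can be written with the newer create_retrieval_chain API. The snippet below is only a sketch assuming langchain >= 0.1; it reuses the llm and retriever objects from above, and the prompt wording is illustrative rather than tuned:

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# The combine-documents chain stuffs the retrieved chunks into the {context} slot
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the following context:\n\n{context}"),
    ("human", "{input}"),
])
combine_docs_chain = create_stuff_documents_chain(llm, prompt)

# The retrieval chain runs the retriever first, then the combine-documents chain
rag_chain = create_retrieval_chain(retriever, combine_docs_chain)
result = rag_chain.invoke({"input": "How many cores does the Grace superchip contain?"})
print(result["answer"])  # the retrieved documents are also available under result["context"]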
Hi,
Wanting to use the Vicuna v1.5 7b model for RAG (Retrieval Augmented Generation, i.e. Q&A based on retrieved documents or context), I tried many prompts based on Llama 2 prompting but never managed to get a good answer meeting these criteria:
1) Well-formatted output (no echoing of the prompt or the retrieved context)
2) The model must answer 'I don't know' if the answer cannot be determined from the provided context
Does anyone have advice on using Vicuna v1.5 7b for RAG?
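For what it's worth, one way to encode both criteria with the RetrievalQA setup above is a custom prompt in Vicuna v1.5's conversation format, passed through chain_type_kwargs. This is only a sketch, not a tested recipe for Vicuna v1.5 7b: the preamble wording and the "I don't know" instruction are assumptions, and it reuses the llm and retriever objects defined earlier.

from langchain.prompts import PromptTemplate
from langchain.chains.retrieval_qa.base import RetrievalQA

# Vicuna v1.5-style prompt; the last sentence of the preamble encodes criterion 2
template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant answers the user's question using only the context below, "
    "without repeating the question or the context. "
    "If the answer cannot be determined from the context, the assistant answers exactly: I don't know.\n\n"
    "Context:\n{context}\n\n"
    "USER: {question}\n"
    "ASSISTANT:"
)
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},  # replace the default QA prompt with the one above
    return_source_documents=False,         # criterion 1: keep retrieved documents out of the output dict
)
print(qa.invoke("How many cores does the Grace superchip contain?")["result"])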