Closed: mattliscia closed this issue 1 year ago.
I also experience the same. I am using RetrievalQAWithSourcesChain, and Michael Jackson and the president also popped up in the response. See the screenshot.
I still see the Michael Jackson hallucination when using:
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.vectorstores import FAISS

# doc_splitter and data come from earlier in the script.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=HuggingFaceHub(
        repo_id="tiiuae/falcon-7b-instruct",
        model_kwargs={"max_new_tokens": 500},
    ),
    chain_type="map_reduce",
    retriever=FAISS.from_documents(
        doc_splitter.split_documents(data),
        HuggingFaceEmbeddings(),
    ).as_retriever(),
)
I believe this occurs in the map_reduce.py file, in this call:

result, extra_return_dict = self.reduce_documents_chain.combine_docs(
    result_docs, callbacks=callbacks, **kwargs
)
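If the defaults are the culprit, one quick way to check is to print the prompt templates that the map_reduce variant of the sources chain ships with. The module and attribute names below (map_reduce_prompt, QUESTION_PROMPT, COMBINE_PROMPT) are based on the version of LangChain I was looking at, so treat them as assumptions and adjust for your install:

from langchain.chains.qa_with_sources import map_reduce_prompt

# Per-document prompt used in the map step.
print(map_reduce_prompt.QUESTION_PROMPT.template)

# Combine prompt used in the reduce step; it contains few-shot examples,
# and a small model may simply echo parts of them back.
print(map_reduce_prompt.COMBINE_PROMPT.template)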
The solution to avoid the Michael Jackson hallucinations is to override the default prompt template used by the chain. You can do this by providing your own prompt template in the .from_chain_type function call.
Here's an example of how you can define your own prompt template:
template = """Given the following sections from various documents and a question,
generate a final answer with references ("SOURCES"). If the answer is unknown,
indicate as such without attempting to fabricate a response. Ensure to always
include a "SOURCES" section in your answer.
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""
my_prompt = PromptTemplate(
    template=template,
    input_variables=["summaries", "question"],
)
Then you can use this custom prompt when creating your QA chain like so:
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=chatgpt,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={
        "prompt": my_prompt,
    },
    reduce_k_below_max_tokens=True,
    return_source_documents=True,
)
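Once the chain is built, it is invoked with a dict keyed by question; the query string below is only an illustration:

result = qa_chain({"question": "What did Biden say about Ukraine?"})
print(result["answer"])            # final answer produced with the custom prompt
print(result["sources"])           # the "SOURCES" section requested by the template
print(result["source_documents"])  # retrieved documents, since return_source_documents=True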
For more information on this approach, you may refer to this discussion: https://github.com/hwchase17/langchain/discussions/3115#discussioncomment-5666273
Hi, @mattliscia! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding of the issue, you reported encountering unexpected responses when using the load_qa_with_sources_chain function. Other users, onglette and Hadar933, have also experienced similar issues with mentions of Michael Jackson in the responses. Hadar933 suspects that the issue might be related to the map_reduce.py file. ykim-isabel suggests overriding the default prompt template used by the model as a solution to avoid these unexpected responses and provides an example of how to define a custom prompt template.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!
After further examination, it appears that larger models do not show the same problem. This is probably the result of smaller models simply repeating some of the context they received.
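To check this, it is enough to swap a larger instruction-tuned model into the chain shown earlier; the repo_id here is only an example of a bigger model, not a specific recommendation:

from langchain.llms import HuggingFaceHub

# Same chain setup as above, only with a larger model.
llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-40b-instruct",
    model_kwargs={"max_new_tokens": 500},
)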
Thank you for your response, @Hadar933! We appreciate your further examination of the issue. Based on your findings, it seems that the problem might be specific to smaller models repeating context. We will proceed with closing the issue.
Following the tutorial for load_qa_with_sources_chain using the example state_of_the_union.txt, I run into some odd behavior. Sometimes when I ask a query such as "What did Biden say about Ukraine?" I get a response like this: "Joe Biden talked about the Ukrainian people's fearlessness, courage, and determination in the face of Russian aggression. He also announced that the United States will provide military, economic, and humanitarian assistance to Ukraine, including more than $1 billion in direct assistance. He further emphasized that the United States and its allies will defend every inch of territory of NATO countries, including Ukraine, with the full force of their collective power. However, he mentioned nothing about Michael Jackson."
I know that there are examples directly asking about Michael Jackson in the documentation: https://python.langchain.com/en/latest/use_cases/evaluation/data_augmented_question_answering.html?highlight=michael%20jackson#examples
Here is my code for reproducing the situation:
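A reproduction along the lines of that tutorial looks roughly like the sketch below; the chain_type, splitter settings, OpenAI model, and query are illustrative placeholders rather than exact values:

from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and chunk the example document.
docs = TextLoader("state_of_the_union.txt").load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)

# Embed the chunks and retrieve the ones relevant to the query.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
query = "What did Biden say about Ukraine?"
relevant_docs = store.similarity_search(query)

# Run the QA-with-sources chain over the retrieved chunks.
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_reduce")
print(chain({"input_documents": relevant_docs, "question": query}, return_only_outputs=True))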
Output:
Is it possible there is a remnant of that example code that gets called and adds the question about Michael Jackson?