Closed — portswigger-katie closed this issue 4 months ago.
Hello,
Thank you for bringing this to our attention. Based on the context provided, the RetrievalQAWithSourcesChain is designed to separate the answer from the sources. This is done by the _split_sources(text) method, which takes a text as input and returns two outputs: the answer and the sources. The sources are not included in the answer.
Here is the relevant code from the unit test test_spliting_answer_into_answer_and_sources:
def test_spliting_answer_into_answer_and_sources(
    text: str, answer: str, sources: str
) -> None:
    qa_chain = QAWithSourcesChain.from_llm(FakeLLM())
    generated_answer, generated_sources = qa_chain._split_sources(text)
    assert generated_answer == answer
    assert generated_sources == sources
In this code, qa_chain._split_sources(text) is called with a text containing an answer and sources. The method returns generated_answer and generated_sources, which are then compared with the expected answer and sources respectively. The test passes if the generated answer and sources match the expected values, indicating that the method correctly separates the answer from the sources.
You can find this test in the libs/langchain/tests/unit_tests/chains/test_qa_with_sources.py file.
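To make the splitting behavior concrete, here is a minimal sketch of what such a method could look like. This is an illustration only, not LangChain's actual implementation (the real logic lives in the chain's base module and its regex differs); the function name split_sources and the example strings are assumptions for the sketch.

```python
import re


def split_sources(text: str) -> tuple[str, str]:
    """Split an LLM answer of the form '<answer>\\nSOURCES: <refs>'.

    Simplified sketch of the idea behind _split_sources: everything
    before the SOURCES marker is the answer, everything after is the
    sources string. Returns an empty sources string if no marker exists.
    """
    match = re.search(r"SOURCES?:\s*", text, flags=re.IGNORECASE)
    if match:
        answer = text[: match.start()].rstrip()
        sources = text[match.end():].strip()
        return answer, sources
    return text, ""


answer, sources = split_sources("Paris is the capital of France.\nSOURCES: geo.pdf")
```

With this input, answer holds only the prose and sources holds "geo.pdf"; a reply with no marker comes back with an empty sources string, which is exactly the symptom reported in this issue.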
However, I wasn't able to find specific changes made to the RetrievalQAWithSourcesChain in the LangChain releases after version 0.0.238 that could have affected the return of sources.
As a workaround, you might want to try setting the return_source_documents parameter to True when creating the ConversationalRetrievalChain, as suggested in a similar issue:
chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
)
Then, you can access the source documents from the result:
print(result["source_documents"][0])
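For illustration, the snippet below sketches what the returned structure looks like and how to list the cited sources. The Document dataclass here is a stand-in for LangChain's Document, and result is a hand-built stand-in for what a chain call would return; both are assumptions for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain.schema.Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Stand-in for the dict returned by a chain with return_source_documents=True.
result = {
    "answer": "The answer text.",
    "source_documents": [
        Document("...excerpt...", {"source": "report.pdf, page 4"}),
        Document("...excerpt...", {"source": "report.pdf, page 2"}),
    ],
}

# Each entry is a Document whose metadata carries the source reference.
cited = [doc.metadata.get("source", "<unknown>") for doc in result["source_documents"]]
print(cited)
```

Even when the chain's own sources field comes back empty, the per-document metadata is still available this way.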
I hope this helps! If you have any other questions or need further clarification, please let me know.
RetrievalQAWithSourcesChain not returning sources in sources field.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
LangChain version: 0.0.297. Thank you to all the langchain developers.
Wow, the same problem I have is reported in this issue.
When I use return_source_documents=True, the metadata is returned, so for now it seems like a good idea to extract the sources from it with a comprehension.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    reduce_k_below_max_tokens=True,
    return_source_documents=True,  # the added parameter
    retriever=docsearch.as_retriever(),
    chain_type_kwargs={"prompt": self.prompt_template},
)
{'question': '晩婚化について教えて',
 'answer': '晩婚化については...います。\n\n(参考資料: 令和4年版少子化社会対策白書全体版(PDF版).pdf 1ページ、4ページ)',
 'sources': '',
 'source_documents': [
     Document(page_content='晩婚化....晩', metadata={'source': '令和4年版少子化社会対策白書全体版(PDF版).pdf 4ページ'}),
     Document(page_content='1\u300....:22:22', metadata={'source': '令和4年版少子化社会対策白書全体版(PDF版).pdf 3ページ'}),
     Document(page_content='年齢(5...対策白書2', metadata={'source': '令和4年版少子化社会対策白書全体版(PDF版).pdf 5ページ'}),
     Document(page_content='未婚....22:22', metadata={'source': '令和4年版少子化社会対策白書全体版(PDF版).pdf 2ページ'})]}
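The comprehension mentioned above could look like the sketch below: since 'sources' comes back empty but each returned Document carries metadata['source'], the sources string can be rebuilt from the documents. The Document dataclass and the file names are simplified stand-ins, not the real chain output.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain.schema.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Stand-in for a chain result where 'sources' is empty but
# 'source_documents' is populated, as in the output shown above.
result = {
    "answer": "...",
    "sources": "",
    "source_documents": [
        Document("...", {"source": "whitepaper.pdf page 4"}),
        Document("...", {"source": "whitepaper.pdf page 3"}),
    ],
}

# Rebuild the missing 'sources' string from each document's metadata.
result["sources"] = ", ".join(
    doc.metadata["source"] for doc in result["source_documents"]
)
```

This fills the 'sources' field with a comma-separated list of the per-document source references until the upstream fix lands.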
I got the same issue and I'm using LangChain version 0.0.325.
If you need to, you can fix the problem in your local code base by replacing this line in libs/langchain/langchain/chains/qa_with_sources/base.py. That's what I've been doing while waiting for #12556 to be merged into a release.
You need to make sure 'source' appears in the metadata of your vector store.
Hi, @portswigger-katie,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue "RetrievalQAWithSourcesChain not returning sources as expected" was observed in langchain version 0.0.287, where the sources were missing from the output. I provided detailed explanations and suggested a workaround using the return_source_documents parameter. Additionally, duri0214 encountered the same problem and shared their approach of using return_source_documents=True. It seems that the issue has been resolved by using the return_source_documents parameter to retrieve the expected sources in the output, and a local fix was used while waiting for a specific pull request to be merged. SuperHao-Wu also advised ensuring that 'source' appears in the metadata of the vector store.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!
System Info
I have a question-and-answer-over-docs chatbot application that uses the RetrievalQAWithSourcesChain and ChatPromptTemplate. In langchain version 0.0.238 it used to return sources, but this seems to be broken in the releases since then.
Python version: 3.11.4
LangChain version: 0.0.287
Example response with missing sources:
Who can help?
No response
Information
Related Components
Reproduction
Expected behavior
expected output: