langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

RetrievalQAWithSourcesChain provides unreliable sources #5642

Closed startakovsky closed 1 year ago

startakovsky commented 1 year ago

System Info

Who can help?

@hwchase17

Summary

The sources component of the output of RetrievalQAWithSourcesChain does not reflect the documents the retriever actually returns; it is instead text that the LLM contrives.

Motivation

From my perspective, the primary advantage of visibility into sources is transparency: showing which retrieved documents assisted the language model in generating its answer. Only after being confused for quite a while and inspecting the code did I realize that the sources were simply being conjured up by the LLM.

Advice

I think it is important to ensure that people know about this. Perhaps this isn't a bug and is more of a documentation issue, but either way I think the docstring should be updated.

Notes

Document retrieval works very well.

It's worth noting that in this toy example, the combination of the FAISS vector store and the OpenAIEmbeddings embeddings model performs very reasonably and is deterministic.

Recommendation

Add caveats everywhere. Frankly, I would never trust this chain. Just the other day I had an example where it made up a source and a Wikipedia URL that had absolutely nothing to do with the documents retrieved. I could supply that example, since it is a far better illustration of how this chain hallucinates sources (they are generated by the LLM), but it is a bit more involved than the smaller example below.

Reproduction

Demonstrative Example

Here's the simplest example I could come up with:

1. Instantiate a vectorstore with 7 documents displayed below.

>>> from langchain.vectorstores import FAISS
>>> from langchain.embeddings import OpenAIEmbeddings
>>> from langchain.llms import OpenAI
>>> from langchain.chains import RetrievalQAWithSourcesChain

>>> chars = ['a', 'b', 'c', 'd', '1', '2', '3']
>>> texts = [4*c for c in chars]
>>> metadatas = [{'title': c, 'source': f'source_{c}'} for c in chars]

>>> vs = FAISS.from_texts(texts, embedding=OpenAIEmbeddings(), metadatas=metadatas)
>>> retriever = vs.as_retriever(search_kwargs=dict(k=5))
>>> vs.docstore._dict
{'0ec43ce4-6753-4dac-b72a-6cf9decb290e': Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
 '54baed0b-690a-4ffc-bb1e-707eed7da5a1': Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
 '85b834fa-14e1-4b20-9912-fa63fb7f0e50': Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'}),
 '06c0cfd0-21a2-4e0c-9c2e-dd624b5164fe': Document(page_content='dddd', metadata={'title': 'd', 'source': 'source_d'}),
 '94d6444f-96cd-4d88-8973-c3c0b9bf0c78': Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'}),
 'ec04b042-a4eb-4570-9ee9-a2a0bd66a82e': Document(page_content='2222', metadata={'title': '2', 'source': 'source_2'}),
 '0031d3fc-f291-481e-a12a-9cc6ed9761e0': Document(page_content='3333', metadata={'title': '3', 'source': 'source_3'})}

2. Instantiate a RetrievalQAWithSourcesChain

return_source_documents is set to True so that we can inspect the documents actually retrieved.

>>> qa_sources = RetrievalQAWithSourcesChain.from_chain_type(
...     OpenAI(),
...     retriever=retriever,
...     return_source_documents=True,
... )

3. Example Question

Things look sort of fine, meaning 5 documents are retrieved by the retriever, but the model lists only a single source.

>>> qa_sources('what is the first lower-case letter of the alphabet?')
{'question': 'what is the first lower-case letter of the alphabet?',
 'answer': ' The first lower-case letter of the alphabet is "a".\n',
 'sources': 'source_a',
 'source_documents': [Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
  Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
  Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'}),
  Document(page_content='dddd', metadata={'title': 'd', 'source': 'source_d'}),
  Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'})]}
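
Whatever the LLM writes into sources, the ground truth is recoverable from source_documents. Here is a pure-Python sketch over the output shown above, using a minimal stand-in for langchain's Document so it runs without a vector store or API key:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for langchain's Document."""
    page_content: str
    metadata: dict


# The five documents the retriever returned in the example above.
source_documents = [
    Doc(4 * c, {"title": c, "source": f"source_{c}"}) for c in "bacd1"
]

# Sources actually retrieved, independent of anything the LLM wrote.
actual_sources = [d.metadata["source"] for d in source_documents]
print(actual_sources)
# ['source_b', 'source_a', 'source_c', 'source_d', 'source_1']
```

Reading the metadata directly like this is what I mean by transparency: the LLM-generated 'source_a' in the answer happens to be in this list, but nothing in the chain guarantees that.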

4. Second Example Question containing the First Question.

This is not what I would expect: the question contains the previous question, and the vector store did supply the document with {'source': 'source_a'}, yet for some reason (i.e. the internals of the OpenAI() output) this response from the chain lists zero sources.

>>> qa_sources('what is the one and only first lower-case letter and number of the alphabet and whole number system?')
{'question': 'what is the one and only first lower-case letter and number of the alphabet and whole number system?',
 'answer': ' The one and only first lower-case letter and number of the alphabet and whole number system is "a1".\n',
 'sources': 'N/A',
 'source_documents': [Document(page_content='1111', metadata={'title': '1', 'source': 'source_1'}),
  Document(page_content='bbbb', metadata={'title': 'b', 'source': 'source_b'}),
  Document(page_content='aaaa', metadata={'title': 'a', 'source': 'source_a'}),
  Document(page_content='2222', metadata={'title': '2', 'source': 'source_2'}),
  Document(page_content='cccc', metadata={'title': 'c', 'source': 'source_c'})]}
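
Since return_source_documents=True already exposes the ground truth, a cross-check between the LLM-reported sources string and the retrieved documents' sources can flag contrived output. A minimal pure-Python sketch; the comma-separated format of the sources string and the function name are my assumptions, not langchain API:

```python
def check_sources(reported: str, retrieved_sources: set) -> dict:
    """Split the LLM's free-text 'sources' field (assumed comma-separated)
    and compare it against the sources actually retrieved."""
    names = {s.strip() for s in reported.split(",") if s.strip()}
    return {
        "verified": names & retrieved_sources,       # reported and retrieved
        "hallucinated": names - retrieved_sources,   # reported, never retrieved
        "unreported": retrieved_sources - names,     # retrieved, not credited
    }


# First example: 'source_a' was reported and really was retrieved.
print(check_sources(
    "source_a",
    {"source_b", "source_a", "source_c", "source_d", "source_1"},
))

# Second example: 'N/A' matches nothing the retriever returned.
print(check_sources(
    "N/A",
    {"source_1", "source_b", "source_a", "source_2", "source_c"},
))
```

In the first case 'source_a' lands in 'verified'; in the second, 'N/A' lands in 'hallucinated', which is exactly the failure mode this issue describes.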

Expected behavior

I am not sure. Perhaps a warning every time this chain is used, or some strongly worded documentation for developers.
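
One hedged shape such a warning could take; the wrapper below is hypothetical and not part of langchain, it simply compares the LLM-reported sources string against the retrieved documents before returning the result:

```python
import warnings
from types import SimpleNamespace as NS


def qa_with_verified_sources(chain_call, question: str) -> dict:
    """Hypothetical guard: run a RetrievalQAWithSourcesChain-style callable
    and warn when its 'sources' field names anything the retriever
    never returned."""
    result = chain_call(question)
    retrieved = {d.metadata.get("source") for d in result["source_documents"]}
    reported = {s.strip() for s in result["sources"].split(",") if s.strip()}
    if not reported <= retrieved:
        warnings.warn(
            "RetrievalQAWithSourcesChain: 'sources' is generated by the LLM "
            f"and includes {sorted(reported - retrieved)}, which the "
            "retriever never returned.",
            UserWarning,
        )
    return result


# Demo with a faked chain result mirroring the second example above.
fake_result = {
    "sources": "N/A",
    "source_documents": [NS(metadata={"source": f"source_{c}"}) for c in "1ba2c"],
}
out = qa_with_verified_sources(lambda q: fake_result, "demo question")
# emits a UserWarning because 'N/A' was never retrieved
```

A real fix would presumably live inside the chain itself, but even a bolt-on check like this would have saved me the confusion.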

dosubot[bot] commented 1 year ago

Hi, @startakovsky! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you reported is regarding the RetrievalQAWithSourcesChain component in the langchain library. The issue was about the component not accurately providing sources for retrieved documents, which caused confusion about which documents were used to generate answers. The author recommended updating the documentation and adding warnings to address this issue.

It seems that there hasn't been any further activity or updates on this issue. Therefore, I wanted to check with you if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. If you have any further questions or concerns, please let me know.