langchain-ai / langchain


`RetrievalQAWithSourcesChain` not returning sources in `sources` field. #5536

Closed eRuaro closed 1 year ago

eRuaro commented 1 year ago

System Info

System Info (Docker Dev Container):

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Python: 3.10

Pip:

absl-py                  1.4.0
aiohttp                  3.8.4
aiosignal                1.3.1
antlr4-python3-runtime   4.9.3
anyio                    3.6.2
argilla                  1.6.0
async-timeout            4.0.2
attrs                    23.1.0
backoff                  2.2.1
cachetools               5.3.0
certifi                  2022.12.7
cffi                     1.15.1
charset-normalizer       3.1.0
click                    8.1.3
cloudpickle              2.2.1
cmake                    3.26.3
coloredlogs              15.0.1
commonmark               0.9.1
contourpy                1.0.7
cryptography             40.0.2
cycler                   0.11.0
dataclasses-json         0.5.7
Deprecated               1.2.13
detectron2               0.4
effdet                   0.3.0
et-xmlfile               1.1.0
exceptiongroup           1.1.1
fastapi                  0.95.1
filelock                 3.11.0
flatbuffers              23.3.3
fonttools                4.39.3
frozenlist               1.3.3
future                   0.18.3
fvcore                   0.1.3.post20210317
google-auth              2.17.3
google-auth-oauthlib     1.0.0
gptcache                 0.1.11
greenlet                 2.0.2
grpcio                   1.53.0
h11                      0.14.0
httpcore                 0.16.3
httpx                    0.23.3
huggingface-hub          0.13.4
humanfriendly            10.0
idna                     3.4
iniconfig                2.0.0
iopath                   0.1.10
Jinja2                   3.1.2
joblib                   1.2.0
kiwisolver               1.4.4
langchain                0.0.141
layoutparser             0.3.4
lit                      16.0.1
lxml                     4.9.2
Markdown                 3.4.3
MarkupSafe               2.1.2
marshmallow              3.19.0
marshmallow-enum         1.5.1
matplotlib               3.7.1
monotonic                1.6
mpmath                   1.3.0
msg-parser               1.2.0
multidict                6.0.4
mypy-extensions          1.0.0
networkx                 3.1
nltk                     3.8.1
numpy                    1.23.5
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
oauthlib                 3.2.2
olefile                  0.46
omegaconf                2.3.0
onnxruntime              1.14.1
openai                   0.27.4
openapi-schema-pydantic  1.2.4
opencv-python            4.6.0.66
openpyxl                 3.1.2
packaging                23.1
pandas                   1.5.3
pdf2image                1.16.3
pdfminer.six             20221105
pdfplumber               0.9.0
pgvector                 0.1.6
Pillow                   9.5.0
pip                      23.1
pluggy                   1.0.0
portalocker              2.7.0
protobuf                 4.22.3
psycopg2-binary          2.9.6
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pycocotools              2.0.6
pycparser                2.21
pydantic                 1.10.7
pydot                    1.4.2
Pygments                 2.15.0
pypandoc                 1.11
pyparsing                3.0.9
pypdf                    3.9.0
pytesseract              0.3.10
pytest                   7.3.1
python-dateutil          2.8.2
python-docx              0.8.11
python-dotenv            1.0.0
python-magic             0.4.27
python-multipart         0.0.6
python-poppler           0.4.0
python-pptx              0.6.21
pytz                     2023.3
PyYAML                   6.0
regex                    2023.3.23
requests                 2.28.2
requests-oauthlib        1.3.1
rfc3986                  1.5.0
rich                     13.0.1
rsa                      4.9
scipy                    1.10.1
setuptools               65.5.1
six                      1.16.0
sniffio                  1.3.0
SQLAlchemy               1.4.47
starlette                0.26.1
sympy                    1.11.1
tabulate                 0.9.0
tenacity                 8.2.2
tensorboard              2.12.2
tensorboard-data-server  0.7.0
tensorboard-plugin-wit   1.8.1
termcolor                2.2.0
tiktoken                 0.3.3
timm                     0.6.13
tokenizers               0.13.3
tomli                    2.0.1
torch                    2.0.0
torchaudio               2.0.1
torchvision              0.15.1
tqdm                     4.65.0
transformers             4.28.1
triton                   2.0.0
typing_extensions        4.5.0
typing-inspect           0.8.0
unstructured             0.5.12
unstructured-inference   0.3.2
urllib3                  1.26.15
uvicorn                  0.21.1
Wand                     0.6.11
Werkzeug                 2.2.3
wheel                    0.40.0
wrapt                    1.14.1
XlsxWriter               3.1.0
yacs                     0.1.8
yarl                     1.8.2

Who can help?

@hwchase17

Reproduction

Write the code below:

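    from langchain.chains import RetrievalQAWithSourcesChain
    from langchain.chat_models import ChatOpenAI

    # `api_key` and `retriever` are defined elsewhere (not shown in the report).
    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=ChatOpenAI(openai_api_key=api_key),
        chain_type="map_reduce",
        retriever=retriever,
    )

    llm_call = "random llm call"

    # Return only the chain outputs ("answer" and "sources"), not the inputs.
    result = chain(
        {"question": llm_call},
        return_only_outputs=True,
    )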

Expected behavior

I expect to get a result["answer"] and a non-empty result["sources"], but here's what I get instead: (screenshot not preserved)

As you can see, result["sources"] is empty; the sources are included in result["answer"] as a string instead.

eRuaro commented 1 year ago

Found that restarting the dev container fixed it.

eRuaro commented 1 year ago

Okay, I've found that some llm_call values cause sources to come back blank, with the sources placed in the answer field instead. For me, llm_call values containing ellipses, punctuation marks, single quotes, or double quotes trigger it.
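
For anyone hitting the same symptom, one possible workaround is to post-process the result and pull the sources back out of the answer string. The sketch below is only an illustration: `split_answer_and_sources` is a hypothetical helper, not part of LangChain, and it assumes the model wrote a "SOURCES:" label (in any casing) before the source list. If the answer text uses a different label, the pattern would need adjusting.

```python
import re

# Hypothetical helper, not part of LangChain. Assumption: the model wrote a
# "SOURCES:" (or "Source:") label, in any casing, before the list of sources.
_SOURCES_MARKER = re.compile(r"\bsources?\s*:", re.IGNORECASE)


def split_answer_and_sources(result):
    """Return a copy of `result` with sources moved out of the answer if needed."""
    answer = result.get("answer", "")
    sources = result.get("sources", "")
    if sources.strip():
        # The chain already filled in "sources"; leave the result unchanged.
        return dict(result)

    # Use the last label in the text, since the source list normally comes last.
    last_match = None
    for match in _SOURCES_MARKER.finditer(answer):
        last_match = match
    if last_match is None:
        # No label found, so there is nothing to recover.
        return dict(result)

    fixed = dict(result)
    fixed["answer"] = answer[: last_match.start()].strip()
    fixed["sources"] = answer[last_match.end():].strip()
    return fixed
```

It would be applied to the chain output, for example `result = split_answer_and_sources(result)`. This only papers over the symptom and does not explain why certain questions trigger it.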

dosubot[bot] commented 1 year ago

Hi, @eRuaro! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you reported is that RetrievalQAWithSourcesChain does not return any sources in the sources field when using the map_reduce chain type. You mentioned that restarting the dev container fixed the issue, but you also discovered that certain llm_call values leave the sources field blank and put the sources in the answer field instead. It seems that you are awaiting assistance from @hwchase17.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!

levalencia commented 8 months ago

I am having the exact same issue!