langchain-ai / langchain


Improvement: MapReduce summarization chain executes a map step on a single document #1937

Closed TomTom101 closed 1 year ago

TomTom101 commented 1 year ago

When the map_reduce summarization chain is fed a single document, the doc is run through an unnecessary map step before the combine prompt is run on it. The combine prompt alone would imo be sufficient, and skipping the map step would avoid summarizing a single summary, which is very lossy. Great project, hope it thrives!

ShantanuNair commented 1 year ago

@TomTom101 when you say single document, do you mean that the entire document fits in the context? Because it could be a long document and need that map step, even though it's a single document. Correct me if I misinterpreted what you're saying.

TomTom101 commented 1 year ago

The document is split into chunks with len(chunks)==1 (so basically not split), and the single chunk easily fits into the context window. There's a map step that summarizes the chunk and then a combine step that summarizes this single summary. Hope this explains it better! I cannot rule out that I'm doing something wrong, though.
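To make the described flow concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `map_reduce_summarize`, `fake_map`, `fake_combine`, and the `skip_map_for_single` flag are hypothetical names, and the two fake functions stand in for LLM calls so that the order of the calls can be observed.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    map_fn: Callable[[str], str],
    combine_fn: Callable[[List[str]], str],
    skip_map_for_single: bool = False,  # hypothetical flag, not a LangChain option
) -> str:
    """Summarize chunks with a map step followed by a combine step."""
    if skip_map_for_single and len(chunks) == 1:
        # Shortcut proposed in this issue: pass the raw chunk to the combine step.
        return combine_fn(chunks)
    # Default flow: summarize every chunk, then summarize the summaries.
    summaries = [map_fn(chunk) for chunk in chunks]
    return combine_fn(summaries)


# Stand-ins for LLM calls that record the order in which they run.
calls: List[str] = []


def fake_map(chunk: str) -> str:
    calls.append("map")
    return f"summary({chunk})"


def fake_combine(texts: List[str]) -> str:
    calls.append("combine")
    return "combined(" + " | ".join(texts) + ")"


# Default flow on a single chunk: the chunk is summarized twice.
default_result = map_reduce_summarize(["doc"], fake_map, fake_combine)
default_calls = list(calls)

# With the shortcut, only the combine step runs.
calls.clear()
shortcut_result = map_reduce_summarize(
    ["doc"], fake_map, fake_combine, skip_map_for_single=True
)
shortcut_calls = list(calls)
```

In the default flow the single chunk passes through both `fake_map` and `fake_combine`, which corresponds to the "summary of a summary" described above; with the flag set, only the combine step sees the text.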

TomTom101 commented 1 year ago

Closed, as I can no longer observe that behavior. There was no change in LC, so it must have been my fault.

TomTom101 commented 1 year ago

I had already patched langchain, which is why the problem was gone ;) I will provide a PR soon.

dosubot[bot] commented 1 year ago

Hi, @TomTom101! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened this issue suggesting improvements to the MapReduce summarization chains by avoiding an unnecessary map step when processing a single document. There was some discussion between you and ShantanuNair about whether the entire document fits in the context, and you clarified that the document is split into chunks and the single chunk easily fits into the context window.

You later closed the issue, stating that the problem was resolved, and mentioned that you would provide a pull request soon.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution!

TomTom101 commented 1 year ago

This behavior is actually totally legit; map reduce is just not a good choice if one has a single document.
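One way to act on this is to pick the chain type before building the chain. The sketch below is an assumption-laden illustration, not part of LangChain: `choose_chain_type` and `rough_token_count` are hypothetical helpers, and the whitespace-based token count is only a crude stand-in for a real tokenizer. The returned strings match the `"stuff"` and `"map_reduce"` values that `load_summarize_chain` accepts for `chain_type`.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def choose_chain_type(
    chunks: List[str],
    count_tokens: Callable[[str], int],
    context_limit: int,
) -> str:
    """Return "stuff" when all chunks fit in one prompt, else "map_reduce"."""
    total_tokens = sum(count_tokens(chunk) for chunk in chunks)
    if total_tokens <= context_limit:
        # Everything fits: one summarization pass, no intermediate summaries.
        return "stuff"
    # Too large for a single prompt: summarize per chunk, then combine.
    return "map_reduce"


single_doc_choice = choose_chain_type(["a short document"], rough_token_count, 10)
long_doc_choice = choose_chain_type(["a short document"] * 5, rough_token_count, 10)
```

The result could then be passed as the `chain_type` argument when constructing the summarization chain. In practice `context_limit` should leave room for the prompt template and the model's output, which this sketch does not account for.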

dosubot[bot] commented 1 year ago

Thank you so much, @TomTom101, for closing this issue in LangChain! Your contribution is greatly appreciated.