Closed by TomTom101 1 year ago
@TomTom101 when you say single document, do you mean that the entire document fits in the context? Because it could be a long document and need that map step, even though it's a single document. Correct me if I misinterpreted what you're saying.
The document is split into chunks with len(chunks)==1 (so basically not split), and the single chunk easily fits into the context window. There's a map step that summarizes the chunk and a combine step that then summarizes this single summary. Hope this explains it better! Cannot rule out that I'm doing something wrong, though.
Closed, as I can no longer observe that behavior. There was no change in LC, so it must have been my fault.
I had already patched langchain, which is why the problem was gone ;) Will provide a PR soon.
Hi, @TomTom101! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you opened this issue suggesting improvements to the MapReduce summarization chains by avoiding an unnecessary map step when processing a single document. There was some discussion between you and ShantanuNair about whether the entire document fits in the context, and you clarified that the document is split into chunks and the single chunk easily fits into the context window.
You later closed the issue, stating that the problem was resolved and mentioned that you will provide a pull request soon.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution!
This behavior is actually totally legit; map reduce is just not a good choice if one has a single document.
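For anyone landing here later, a minimal sketch of picking the chain type by document count. `pick_chain_type` is a hypothetical helper, not part of LangChain; "stuff" and "map_reduce" are chain types accepted by `load_summarize_chain`, and the sketch assumes a single document fits into the context window, as in the case discussed above.

```python
def pick_chain_type(num_docs: int) -> str:
    """Hypothetical helper: choose a summarization chain type by doc count.

    Assumes a single document fits into the model's context window.
    """
    if num_docs < 1:
        raise ValueError("need at least one document")
    # One doc: summarize it directly instead of mapping and then combining.
    return "stuff" if num_docs == 1 else "map_reduce"


# Possible usage with LangChain (not executed here):
# chain = load_summarize_chain(llm, chain_type=pick_chain_type(len(docs)))
```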
Thank you so much, @TomTom101, for closing this issue in LangChain! Your contribution is greatly appreciated.
When feeding the map_reduce summarization chain a single document, the doc is run through an unnecessary map step before the combine prompt is run on it. The combine prompt alone would imo be sufficient, and it would avoid summarizing a single summary, which is very lossy. Great project, hope this thrives!
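A rough sketch of the proposed behavior, independent of LangChain's internals. `summarize_chunks`, `map_fn` and `combine_fn` are hypothetical names standing in for the map prompt and the combine prompt; this is not the actual patch.

```python
from typing import Callable, List, Sequence


def summarize_chunks(
    chunks: Sequence[str],
    map_fn: Callable[[str], str],
    combine_fn: Callable[[List[str]], str],
) -> str:
    """Map-reduce summarization that skips the map step for a single chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if len(chunks) == 1:
        # Single chunk: hand the original text straight to the combine step,
        # so we never summarize a summary.
        return combine_fn([chunks[0]])
    # Several chunks: summarize each one, then combine the partial summaries.
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    return combine_fn(partial_summaries)
```

With one chunk, `map_fn` is never called, so the combine prompt sees the full text rather than an already condensed version of it.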