Closed: ravwojdyla closed this issue 1 year ago.
is it unreasonable for the compressor to filter out all documents if they're all irrelevant? should the fix live in BaseCombineDocumentsChain or in the chain apply method instead (making them able to handle empty lists)?
agree with @dev2049 on this; the retriever should not error here. It's reasonable for a retriever not to return any documents, so we should check for this gracefully and handle it downstream
@hwchase17 @dev2049 sounds good to me.
Hi, @ravwojdyla! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you reported was about the LLMChainExtractor in the langchain library throwing an IndexError when handling empty compressed results. There was a discussion among users dev2049 and hwchase17 about whether the fix should live in BaseCombineDocumentsChain or in the chain's apply method. You agreed with the proposed solution, and the issue has been resolved by modifying the code in the LLMChainExtractor to check whether len(compressed_docs) == 0 and gracefully handle empty compressed results.
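As a rough illustration of the guard described above (a simplified sketch, not the actual LangChain source: `compress_documents` and `extract` here stand in for the real compressor and its LLM extraction step), the fix amounts to filtering out documents whose extracted content is empty and letting an empty list propagate safely:

```python
def compress_documents(docs, extract):
    """Return only documents with non-empty extracted content.

    `docs` is a list of strings and `extract` a callable standing in
    for the LLM extraction step; both are simplifications.
    """
    compressed = []
    for doc in docs:
        output = extract(doc)
        if output.strip():  # drop documents compressed to empty strings
            compressed.append(output)
    # Graceful empty-result handling: callers receive [] rather than
    # an empty batch reaching LLMChain.apply and raising IndexError.
    return compressed
```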
Before we close this issue, we would like to confirm if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you for your contribution!
Let me know if you have any questions or need further assistance.
System Info
langchain: 0.0.165 (also reproduced on 0.0.151)
python: 3.10
Who can help?
@hwchase17 @agola11
Reproduction
Use a RetrievalQAWithSourcesChain with a Retriever that returns some documents, which the LLMChainExtractor then compresses to empty strings. Since empty results are filtered out for all documents, this produces an IndexError downstream from the compression.

https://github.com/hwchase17/langchain/blob/f373883c1a5f451433e7817e5092f61e7bde3f2e/langchain/retrievers/document_compressors/chain_extract.py#L54-L61

The relevant code is linked above; maybe it should gracefully handle the case where len(compressed_docs) == 0 at the end?

Error:
```
File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py:75, in BaseCombineDocumentsChain._call(self, inputs)
     73 # Other keys are assumed to be needed for LLM prediction
     74 other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
---> 75 output, extra_return_dict = self.combine_docs(docs, **other_keys)
     76 extra_return_dict[self.output_key] = output
     77 return extra_return_dict

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py:139, in MapReduceDocumentsChain.combine_docs(self, docs, token_max, **kwargs)
    131 def combine_docs(
    132     self, docs: List[Document], token_max: int = 3000, **kwargs: Any
    133 ) -> Tuple[str, dict]:
    134     """Combine documents in a map reduce manner.
    135
    136     Combine by mapping first chain over all documents, then reducing the results.
    137     This reducing can be done recursively if needed (if there are many documents).
    138     """
--> 139     results = self.llm_chain.apply(
    140         # FYI - this is parallelized and so it is fast.
    141         [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
    142     )
    143     return self._process_results(results, docs, token_max, **kwargs)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:118, in LLMChain.apply(self, input_list)
    116 def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    117     """Utilize the LLM generate method for speed gains."""
--> 118     response = self.generate(input_list)
    119     return self.create_outputs(response)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:61, in LLMChain.generate(self, input_list)
     59 def generate(self, input_list: List[Dict[str, Any]]) -> LLMResult:
     60     """Generate LLM result from inputs."""
---> 61     prompts, stop = self.prep_prompts(input_list)
     62     return self.llm.generate_prompt(prompts, stop)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:74, in LLMChain.prep_prompts(self, input_list)
     72     """Prepare prompts from inputs."""
     73     stop = None
---> 74     if "stop" in input_list[0]:
     75         stop = input_list[0]["stop"]
     76     prompts = []
IndexError: list index out of range
```

Expected behavior
It should not fail in such a cryptic way (see the error in the reproduction); an empty compression result should be handled gracefully.
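For illustration, the traceback bottoms out at `input_list[0]` in `LLMChain.prep_prompts`, so one defensive option downstream is a truthiness guard before indexing. This is a hypothetical, simplified sketch (the names mirror the traceback, but the body is not the actual LangChain implementation):

```python
def prep_prompts(input_list):
    """Prepare prompts from a list of input dicts.

    Guards `input_list[0]` so an empty list returns ([], None)
    instead of raising IndexError.
    """
    stop = None
    if input_list and "stop" in input_list[0]:
        stop = input_list[0]["stop"]
    # Simplified prompt construction: just pull a "text" field.
    prompts = [inputs.get("text", "") for inputs in input_list]
    return prompts, stop
```

Whether the guard belongs here or earlier (in the compressor, as ultimately done) is exactly the design question discussed in the comments above.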