langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

`LLMChainExtractor` doesn't gracefully fail on empty compressed results #4496

Closed · ravwojdyla closed this issue 1 year ago

ravwojdyla commented 1 year ago

System Info

langchain: 0.0.165 (and 0.0.151)
python: 3.10

Who can help?

@hwchase17 @agola11

Reproduction

Use `RetrievalQAWithSourcesChain` with a retriever whose documents the `LLMChainExtractor` compresses to empty strings. Since empty results are filtered out, every document can be dropped, and this leads to an IndexError downstream of the compression.
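For context, here is roughly what such a setup looks like. This is an illustrative sketch, not the original reproduction: the model choice, texts, and query are placeholders, and it assumes an OpenAI API key plus `faiss` installed.

```python
# Illustrative reproduction sketch. When the extractor compresses every
# retrieved document to an empty string, the compressor returns an empty
# list and the downstream map_reduce chain raises IndexError.
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.vectorstores import FAISS

llm = OpenAI(temperature=0)

# A tiny index whose contents are unrelated to the question asked below.
store = FAISS.from_texts(
    ["the cafeteria menu lists soup on Tuesdays", "parking permits renew in March"],
    OpenAIEmbeddings(),
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=LLMChainExtractor.from_llm(llm),
    base_retriever=store.as_retriever(),
)

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="map_reduce", retriever=compression_retriever
)

# If the extractor strips every retrieved document down to nothing,
# this call fails with "IndexError: list index out of range".
chain({"question": "What is the GDP of Mars?"})
```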

https://github.com/hwchase17/langchain/blob/f373883c1a5f451433e7817e5092f61e7bde3f2e/langchain/retrievers/document_compressors/chain_extract.py#L54-L61

The code linked above appears to be the relevant spot; perhaps it should fail gracefully when `len(compressed_docs) == 0` at the end?
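To make "fail gracefully" concrete, here is one possible sketch, written as a caller-side wrapper rather than a patch to `chain_extract.py` (the wrapper name and warning text are my own, not a proposed API):

```python
# Hedged sketch: surface the "everything was filtered out" case explicitly
# instead of letting an IndexError appear far downstream.
import warnings
from typing import List, Sequence

from langchain.schema import Document


def compress_or_warn(compressor, documents: Sequence[Document], query: str) -> List[Document]:
    """Call compressor.compress_documents and warn if every document was dropped."""
    compressed = list(compressor.compress_documents(documents, query))
    if len(compressed) == 0:
        warnings.warn(
            "LLMChainExtractor compressed every document to an empty string; "
            "downstream combine-documents chains will receive no documents."
        )
    return compressed
```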

Error (traceback):

```
File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py:75, in BaseCombineDocumentsChain._call(self, inputs)
     73 # Other keys are assumed to be needed for LLM prediction
     74 other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
---> 75 output, extra_return_dict = self.combine_docs(docs, **other_keys)
     76 extra_return_dict[self.output_key] = output
     77 return extra_return_dict

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/combine_documents/map_reduce.py:139, in MapReduceDocumentsChain.combine_docs(self, docs, token_max, **kwargs)
    131 def combine_docs(
    132     self, docs: List[Document], token_max: int = 3000, **kwargs: Any
    133 ) -> Tuple[str, dict]:
    134     """Combine documents in a map reduce manner.
    135
    136     Combine by mapping first chain over all documents, then reducing the results.
    137     This reducing can be done recursively if needed (if there are many documents).
    138     """
--> 139     results = self.llm_chain.apply(
    140         # FYI - this is parallelized and so it is fast.
    141         [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
    142     )
    143     return self._process_results(results, docs, token_max, **kwargs)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:118, in LLMChain.apply(self, input_list)
    116 def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    117     """Utilize the LLM generate method for speed gains."""
--> 118     response = self.generate(input_list)
    119     return self.create_outputs(response)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:61, in LLMChain.generate(self, input_list)
     59 def generate(self, input_list: List[Dict[str, Any]]) -> LLMResult:
     60     """Generate LLM result from inputs."""
---> 61 prompts, stop = self.prep_prompts(input_list)
     62 return self.llm.generate_prompt(prompts, stop)

File ~/miniforge3/envs/foo/lib/python3.10/site-packages/langchain/chains/llm.py:74, in LLMChain.prep_prompts(self, input_list)
     72 """Prepare prompts from inputs."""
     73 stop = None
---> 74 if "stop" in input_list[0]:
     75     stop = input_list[0]["stop"]
     76 prompts = []

IndexError: list index out of range
```

Expected behavior

It should not fail in such a cryptic way (see the error in the reproduction above).

dev2049 commented 1 year ago

Is it unreasonable for the compressor to filter out all documents if they're all irrelevant? Should the fix live in BaseCombineDocumentsChain, or instead in the chain's apply method (making them able to handle empty lists)?
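For concreteness, a sketch of what the "make apply tolerate empty lists" option could look like, written as a free function rather than an actual patch to `LLMChain` (illustrative only; the real fix may look different):

```python
from typing import Any, Dict, List


def apply_or_empty(llm_chain, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Return [] for an empty input list instead of indexing input_list[0]."""
    if not input_list:
        return []
    return llm_chain.apply(input_list)
```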

hwchase17 commented 1 year ago

Agree with @dev2049 on this. The retriever should not error here; it's reasonable for a retriever not to return any documents, so we should check for this gracefully and handle it downstream.
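Until such a fix lands, one caller-side workaround sketch (reusing `compression_retriever` and `chain` from the sketch in the issue description above; this is a stopgap, not the eventual library fix):

```python
# Check the compressed retrieval results before running the QA chain and
# short-circuit when nothing survives compression.
query = "What is the GDP of Mars?"
docs = compression_retriever.get_relevant_documents(query)
if not docs:
    result = {"answer": "No relevant documents were found.", "sources": ""}
else:
    result = chain({"question": query})
```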

ravwojdyla commented 1 year ago

@hwchase17 @dev2049 sounds good to me.

dosubot[bot] commented 1 year ago

Hi, @ravwojdyla! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you reported was about `LLMChainExtractor` in the langchain library throwing an IndexError when all compressed results are empty. There was a discussion between dev2049 and hwchase17 about whether the fix should live in BaseCombineDocumentsChain or in the chain's apply method. You agreed with the proposed direction, and the issue has been resolved by modifying LLMChainExtractor to check whether `len(compressed_docs) == 0` and handle empty compressed results gracefully.

Before we close this issue, we would like to confirm if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you for your contribution!

Let me know if you have any questions or need further assistance.