langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Issue: <[chain/error] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain] [2.01s] Chain run errored with error: "IndexError('list index out of range')"> #12903

Closed debrupf2946 closed 4 months ago

debrupf2946 commented 12 months ago

Issue you'd like to raise.

I am using Google PaLM, FAISS, and HF Instruct Embeddings. Whenever I query with RetrievalQAWithSourcesChain, I get:

```
[chain/error] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain] [2.01s] Chain run errored with error:
"IndexError('list index out of range')"
[chain/error] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain] [6.45s] Chain run errored with error:
"IndexError('list index out of range')"
[chain/error] [1:chain:RetrievalQAWithSourcesChain] [7.13s] Chain run errored with error:
"IndexError('list index out of range')"
```

Here is my whole code:

```python
import os
import streamlit as st
import pickle
import time
import langchain
from langchain.llms import GooglePalm
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import SeleniumURLLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS

urls = ["https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS",
        "https://www.livemint.com/companies/news/tata-motorss-ev-subsidiary-to-sign-mou-with-jlr-to-strengthen-ev-business-tata-motors-to-pay-royalty-fee-to-jlr-11698925980903.html"
]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()
llm = GooglePalm(google_api_key="", temperature=0.9, max_output_tokens=500)

r_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", " ","\t\t"],
        chunk_size=400,
        chunk_overlap=80
    )
docs = r_splitter.split_documents(data)

embeddings = HuggingFaceInstructEmbeddings(
    query_instruction="Represent the query for retrieval: "
)
vector_index = FAISS.from_documents(docs, embeddings)

chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vector_index.as_retriever())
chain
langchain.debug = True
query = "summerise the text"
chain({"question": query}, return_only_outputs=True)
```


OUTPUT:

```
[chain/start] [1:chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "summerise the text"
}
[chain/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
{
  "input_list": [
    {
      "context": "View more \n                                                                            \n                                    \n                                        Posted by : kamal20",
      "question": "summerise the text"
    },
    {
      "context": "View more \n                                                                            \n                                    \n                                        Posted by : kamal20",
      "question": "summerise the text"
    },
    {
      "context": "- - \n                           - - \n                           - - \n                           - - \n                           - -",
      "question": "summerise the text"
    },
    {
      "context": "- - \n                           - - \n                           - - \n                           - - \n                           - -",
      "question": "summerise the text"
    }
  ]
}
[llm/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 5:llm:GooglePalm] Entering LLM run with input:
{
  "prompts": [
    "Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nView more \n                                                                            \n                                    \n                                        Posted by : kamal20\nQuestion: summerise the text\nRelevant text, if any:"
  ]
}
[llm/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 6:llm:GooglePalm] Entering LLM run with input:
{
  "prompts": [
    "Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nView more \n                                                                            \n                                    \n                                        Posted by : kamal20\nQuestion: summerise the text\nRelevant text, if any:"
  ]
}
[llm/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 7:llm:GooglePalm] Entering LLM run with input:
{
  "prompts": [
    "Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\n- - \n                           - - \n                           - - \n                           - - \n                           - -\nQuestion: summerise the text\nRelevant text, if any:"
  ]
}
[llm/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 8:llm:GooglePalm] Entering LLM run with input:
{
  "prompts": [
    "Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\n- - \n                           - - \n                           - - \n                           - - \n                           - -\nQuestion: summerise the text\nRelevant text, if any:"
  ]
}
[llm/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 5:llm:GooglePalm] [4.40s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "This article discusses the concept of summarizing documents and provides a detailed overview of the steps involved in the process. It also provides examples of summaries and discusses the different types of summaries that can be created.",
        "generation_info": null
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 6:llm:GooglePalm] [4.40s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "The main points are:",
        "generation_info": null
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 7:llm:GooglePalm] [4.40s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "- - \n\n- - ",
        "generation_info": null
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain > 8:llm:GooglePalm] [4.40s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "- - ",
        "generation_info": null
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[chain/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 4:chain:LLMChain] [4.41s] Exiting Chain run with output:
{
  "outputs": [
    {
      "text": "This article discusses the concept of summarizing documents and provides a detailed overview of the steps involved in the process. It also provides examples of summaries and discusses the different types of summaries that can be created."
    },
    {
      "text": "The main points are:"
    },
    {
      "text": "- - \n\n- - "
    },
    {
      "text": "- - "
    }
  ]
}
[chain/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain] Entering Chain run with input:
{
  "question": "summerise the text",
  "summaries": "Content: This article discusses the concept of summarizing documents and provides a detailed overview of the steps involved in the process. It also provides examples of summaries and discusses the different types of summaries that can be created.\nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): The main points are:\nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): - - \n\n- - \nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): - - \nSource: https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS"
}
[llm/start] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain > 10:llm:GooglePalm] Entering LLM run with input:
{
  "prompts": [
    "Given the following extracted parts of a long document and a question, create a final answer with references (\"SOURCES\"). \nIf you don't know the answer, just say that you don't know. Don't try to make up an answer.\nALWAYS return a \"SOURCES\" part in your answer.\n\nQUESTION: Which state/country's law governs the interpretation of the contract?\n=========\nContent: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.\nSource: 28-pl\nContent: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any  kind between the parties.\n\n11.9 No Third-Party Beneficiaries.\nSource: 30-pl\nContent: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as  defined in Clause 8.5) or that such a violation is reasonably likely to occur,\nSource: 4-pl\n=========\nFINAL ANSWER: This Agreement is governed by English law.\nSOURCES: 28-pl\n\nQUESTION: What did the president say about Michael Jackson?\n=========\nContent: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.\nSource: 0-pl\nContent: And we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. 
\n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.\nSource: 24-pl\nContent: And a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.  \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.  \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay.\nSource: 5-pl\nContent: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.  \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.\nSource: 34-pl\n=========\nFINAL ANSWER: The president did not mention Michael Jackson.\nSOURCES:\n\nQUESTION: summerise the text\n=========\nContent: This article discusses the concept of summarizing documents and provides a detailed overview of the steps involved in the process. 
It also provides examples of summaries and discusses the different types of summaries that can be created.\nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): The main points are:\nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): - - \n\n- - \nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n\nContent](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n/nContent): - - \nSource: [https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS\n=========\nFINAL](https://www.moneycontrol.com/india/stockpricequote/ironsteel/tatasteel/TIS/n=========/nFINAL) ANSWER:"
  ]
}
[llm/end] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain > 10:llm:GooglePalm] [2.01s] Exiting LLM run with output:
{
  "generations": [
    []
  ],
  "llm_output": null,
  "run": null
}
[chain/error] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain > 9:chain:LLMChain] [2.01s] Chain run errored with error:
"IndexError('list index out of range')"
[chain/error] [1:chain:RetrievalQAWithSourcesChain > 3:chain:MapReduceDocumentsChain] [6.45s] Chain run errored with error:
"IndexError('list index out of range')"
[chain/error] [1:chain:RetrievalQAWithSourcesChain] [7.13s] Chain run errored with error:
"IndexError('list index out of range')"
```

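Note that the last LLM run in the trace (10:llm:GooglePalm) exits with an empty generations list ("generations": [[]]), and the IndexError presumably comes from the chain indexing into that empty list. A quick sanity check (a minimal sketch, reusing the llm object from the code above; the prompt text is only a stand-in) is to call the model directly and inspect what comes back:

```python
# Assumes `llm` is the same GooglePalm instance as in the reproduction code above.
# If PaLM returns no candidates for a prompt (e.g. the response was filtered),
# `generations` comes back empty and anything indexing generations[0][0] fails.
result = llm.generate(["summerise the text\n\n- - \n- - "])
print(result.generations)  # e.g. [[]] when the model returns nothing
if not result.generations or not result.generations[0]:
    print("Model returned no generations for this prompt")
```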
### Suggestion:

_No response_
dosubot[bot] commented 12 months ago

🤖

Hello @debrupf2946,

Welcome to the LangChain repository! Just to introduce myself, I'm Dosu, a bot designed to assist you with your queries, help tackle bugs, and guide you on your journey to becoming a contributor.

While we wait for human maintainers, I'm here to provide initial assistance. I'm currently reviewing your issue and will get back to you with a comprehensive response in a short while.

dosubot[bot] commented 12 months ago

🤖

Based on the code you've shared and the error you're encountering, it seems like the issue is occurring in the _reduce_tokens_below_limit method of the RetrievalQAWithSourcesChain class. This method is responsible for reducing the number of documents based on the max_tokens_limit and it seems like it's trying to access an index in the tokens list that doesn't exist.

This could happen if the num_docs variable is greater than the length of the tokens list. The num_docs variable is initialized with the length of the docs list, but it's not guaranteed that the tokens list will have the same length.

The tokens list is created by calling _get_num_tokens on each document's page_content. If any document's page_content is None or doesn't exist, _get_num_tokens might not add an item to the tokens list, making it shorter than the docs list.

To fix this issue, you should ensure that the tokens list always has the same length as the docs list. You can do this by handling the case where a document's page_content is None or doesn't exist in the list comprehension that creates the tokens list.

Here's how you can modify the _reduce_tokens_below_limit method to fix the issue:

```python
def _reduce_tokens_below_limit(self, docs: List[Document]) -> List[Document]:
    num_docs = len(docs)

    if self.reduce_k_below_max_tokens and isinstance(
        self.combine_documents_chain, StuffDocumentsChain
    ):
        tokens = [
            self.combine_documents_chain.llm_chain._get_num_tokens(doc.page_content)
            if doc.page_content is not None else 0
            for doc in docs
        ]
        token_count = sum(tokens[:num_docs])
        while token_count > self.max_tokens_limit:
            num_docs -= 1
            token_count -= tokens[num_docs]

    return docs[:num_docs]
```

In this modified version, if a document's page_content is None, _get_num_tokens won't be called and a 0 will be added to the tokens list instead. This ensures that the tokens list always has the same length as the docs list, preventing the "IndexError('list index out of range')" error.

You can find the original implementation of this method in the libs/langchain/langchain/chains/qa_with_sources/retrieval.py file in the LangChain repository.

Please try this solution and let me know if it resolves your issue.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

MuhammedAjmalG commented 10 months ago

Did you resolve this issue?

kesavan22 commented 9 months ago

@MuhammedAjmalG, do you have any workarounds for this?

MuhammedAjmalG commented 9 months ago

@kesavan22

Yes. Instead of RetrievalQAWithSourcesChain, you can use RetrievalQA from LangChain!
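A minimal sketch of that swap (reusing the llm and vector_index objects from the reproduction code in the issue description; the parameter choices here are just illustrative):

```python
from langchain.chains import RetrievalQA

# Assumes `llm` and `vector_index` are the GooglePalm LLM and FAISS index
# built in the issue's reproduction code.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                       # stuff retrieved chunks into a single prompt
    retriever=vector_index.as_retriever(),
    return_source_documents=True,             # keep sources, since the "WithSources" chain is dropped
)
result = qa_chain({"query": "summerise the text"})
```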

kesavan22 commented 9 months ago

@MuhammedAjmalG, thanks for responding, but I still see the same issue:

(screenshot attached)

MuhammedAjmalG commented 9 months ago

@kesavan22 Without seeing your whole code I can't say anything definite. If you want, you can check my GitHub repo: https://github.com/MuhammedAjmalG/llm_langchain_learning/tree/main/llm_lang_url_app

kesavan22 commented 9 months ago

Thank you!!!

harshintsft commented 7 months ago

I solved it using this:

```python
from langchain.chains import RetrievalQA

retriever = vectorIndex.as_retriever()

chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    input_key="query",
    return_source_documents=True,
)

query = "Your query"
chain(query)  # Gives the correct answer
```
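With return_source_documents=True, the call should return a dict that by default includes the keys "query", "result", and "source_documents", so the answer and its supporting chunks can be read back roughly like this (illustrative, assuming the chain defined above):

```python
response = chain("Your query")
print(response["result"])               # the generated answer
for doc in response["source_documents"]:
    print(doc.metadata.get("source"))   # URL the chunk was loaded from
```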