Closed debrupf2946 closed 4 months ago
🤖
Hello @debrupf2946,
Welcome to the LangChain repository! Just to introduce myself, I'm Dosu, a bot designed to assist you with your queries, help tackle bugs, and guide you on your journey to becoming a contributor.
While we wait for human maintainers, I'm here to provide initial assistance. I'm currently reviewing your issue and will get back to you with a comprehensive response in a short while.
🤖
Based on the code you've shared and the error you're encountering, it seems like the issue is occurring in the `_reduce_tokens_below_limit` method of the `RetrievalQAWithSourcesChain` class. This method is responsible for reducing the number of documents based on the `max_tokens_limit`, and it seems like it's trying to access an index in the `tokens` list that doesn't exist.

This could happen if the `num_docs` variable is greater than the length of the `tokens` list. The `num_docs` variable is initialized with the length of the `docs` list, but it's not guaranteed that the `tokens` list will have the same length.
The `tokens` list is created by calling `_get_num_tokens` on each document's `page_content`. If any document's `page_content` is `None` or doesn't exist, `_get_num_tokens` might not add an item to the `tokens` list, making it shorter than the `docs` list.
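The failure mode described above can be reproduced in isolation. This is a minimal sketch with made-up values, not LangChain code: if the `tokens` list ends up shorter than `num_docs`, the truncation loop indexes past the end of the list.

```python
# Illustrative values only: 3 docs, but only 2 token counts.
docs = ["doc one", "doc two", "doc three"]
tokens = [10, 12]  # one entry missing relative to docs

num_docs = len(docs)                   # 3
token_count = sum(tokens[:num_docs])   # 22
max_tokens_limit = 5

try:
    while token_count > max_tokens_limit:
        num_docs -= 1
        token_count -= tokens[num_docs]  # tokens[2] does not exist
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```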
To fix this issue, you should ensure that the `tokens` list always has the same length as the `docs` list. You can do this by handling the case where a document's `page_content` is `None` or doesn't exist in the list comprehension that creates the `tokens` list.

Here's how you can modify the `_reduce_tokens_below_limit` method to fix the issue:
```python
def _reduce_tokens_below_limit(self, docs: List[Document]) -> List[Document]:
    num_docs = len(docs)

    if self.reduce_k_below_max_tokens and isinstance(
        self.combine_documents_chain, StuffDocumentsChain
    ):
        tokens = [
            self.combine_documents_chain.llm_chain._get_num_tokens(doc.page_content)
            if doc.page_content is not None
            else 0
            for doc in docs
        ]
        token_count = sum(tokens[:num_docs])
        while token_count > self.max_tokens_limit:
            num_docs -= 1
            token_count -= tokens[num_docs]

    return docs[:num_docs]
```
In this modified version, if a document's `page_content` is `None`, `_get_num_tokens` won't be called and a `0` will be added to the `tokens` list instead. This ensures that the `tokens` list always has the same length as the `docs` list, preventing the `IndexError('list index out of range')` error.
You can find the original implementation of this method in the libs/langchain/langchain/chains/qa_with_sources/retrieval.py file in the LangChain repository.
Please try this solution and let me know if it resolves your issue.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Did you resolve this issue?
@MuhammedAjmalG, did you have any workarounds for this?
@kesavan22
Yes. Instead of `RetrievalQAWithSourcesChain`, you can use `RetrievalQA` from LangChain!
@MuhammedAjmalG, thanks for responding, but I still see the same issue.
@kesavan22 Without seeing the whole code I can't say anything on that; if you want, you can check my GitHub repo: https://github.com/MuhammedAjmalG/llm_langchain_learning/tree/main/llm_lang_url_app
Thank you!!!
I solved it using this:

```python
from langchain.chains import RetrievalQA

retriever = vectorIndex.as_retriever()
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    input_key="query",
    return_source_documents=True,
)

query = "Your query"
chain(query)  # Gives correct answer
```
Issue you'd like to raise.
I am using Google PaLM, FAISS, and HF Instruct embeddings. Whenever I am querying with `RetrievalQAWithSourcesChain`, I am getting an `IndexError('list index out of range')`.

Here is my whole code: