langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
91.14k stars 14.49k forks

load_summarize_chain with map_reduce error : InvalidRequestError: This model's maximum context length is 8192 tokens. However, your messages resulted in 13516 tokens. Please reduce the length of the messages. #13230

Closed · manasatramalinga closed this issue 6 months ago

manasatramalinga commented 9 months ago

System Info


Who can help?

No response

Information

Related Components

Reproduction

from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
from fastapi.encoders import jsonable_encoder
from langchain.chains.mapreduce import MapReduceChain
from time import monotonic

gpt_4_8k_max_tokens = 8000  # https://platform.openai.com/docs/models/gpt-4
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(model_name=model_name, chunk_size=gpt_4_8k_max_tokens, chunk_overlap=0)
verbose = False

# Initialize the output dataframe with all the columns in the patient history class
column_names = list(PatientHistory.model_fields.keys())
df_AOAI_extracted_text = pd.DataFrame(columns=column_names)

# Create documents from the input text
texts = text_splitter.split_text(test_text)
docs = [Document(page_content=t) for t in texts]
print(f"Number of Documents {len(docs)}")

# Count the number of tokens in the document
num_tokens = num_tokens_from_string(test_text, model_name)
print(f"Number of Tokens {num_tokens}")

# Call the LangChain summarizer to get the output for the given prompt
summaries = []
if num_tokens < gpt_4_8k_max_tokens:
    # Stuffing is the simplest method: all the related data is stuffed into the
    # prompt as context for the language model. This is implemented in LangChain
    # as the StuffDocumentsChain and is suitable for smaller pieces of data.
    chain = load_summarize_chain(llm, chain_type="stuff", prompt=TABLE_PROMPT, verbose=verbose)
else:
    # MapReduceDocumentsChain extends the conventional MapReduce framework by
    # executing a distinct prompt to consolidate the initial map outputs,
    # producing a cohesive summary that covers the entire document.
    print('mapreduce')
    chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=TABLE_PROMPT, combine_prompt=TABLE_PROMPT, verbose=verbose, return_intermediate_steps=False)

start_time = monotonic()
summary = chain.run(docs)
print(summary)
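The num_tokens_from_string helper isn't defined in the snippet; a common definition, adapted from the OpenAI cookbook and assumed here to match what the author used, looks like this:

import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    # Count tokens using the tokenizer that matches the target model
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))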

Expected behavior

The chain should process all of the documents and return the summary.

dosubot[bot] commented 9 months ago

🤖

Hello @manasatramalinga,

I'm a bot here to assist you with your issue while we await a human maintainer. I can help you solve bugs, answer questions, and even guide you on how to become a contributor. Rest assured, I'm currently reviewing the problem you're facing with the load_summarize_chain and the map_reduce error and will get back to you with a full answer shortly.

dosubot[bot] commented 9 months ago

🤖

Based on the error message you're receiving, it seems that the total number of tokens in your documents is exceeding the maximum context length of the language model you're using. The maximum context length for a model is the maximum number of tokens that the model can handle at once. If the total number of tokens in your documents exceeds this limit, you'll receive an InvalidRequestError.

In your case, you're using the map_reduce chain to process your documents. This chain runs the language model over each document chunk and then combines the intermediate results with a second prompt. If an individual chunk, or the combined intermediate output, is too large for the language model to handle, you'll get an error about exceeding the model's maximum context length.
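To pinpoint where the overflow happens, it can help to count tokens per chunk rather than only for the whole input. A minimal sketch using tiktoken, reusing the model_name and docs variables from the code above:

# Report the token count of each chunk against the model's context window
import tiktoken

encoding = tiktoken.encoding_for_model(model_name)
for i, doc in enumerate(docs):
    print(f"Chunk {i}: {len(encoding.encode(doc.page_content))} tokens")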

To resolve this issue, reduce the chunk_size parameter when you create the CharacterTextSplitter so that it produces smaller chunks of text:

# Reduce the chunk size to a smaller value
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(model_name=model_name, chunk_size=4000, chunk_overlap=0)

In the above code, I've reduced the chunk_size to 4000. You might need to adjust this value based on the maximum context length of your language model.
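Note that the chunk size has to leave room for the prompt template and the model's completion, not just the document text; in the original code chunk_size was set to 8000 against an 8192-token window, which leaves almost no headroom. A rough sketch of that budgeting, where the overhead figures are illustrative assumptions rather than measured values:

from langchain.text_splitter import CharacterTextSplitter

# Budget the context window across chunk text, prompt template, and completion
context_window = 8192       # gpt-4 8k context length
prompt_overhead = 1000      # rough allowance for TABLE_PROMPT (assumed; measure your own)
completion_budget = 1500    # tokens reserved for the model's answer (assumed)
safe_chunk_size = context_window - prompt_overhead - completion_budget  # 5692

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    model_name=model_name, chunk_size=safe_chunk_size, chunk_overlap=0
)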

Additionally, you can specify the maximum number of tokens that the intermediate documents are grouped into with the token_max parameter in the run method of the summary chain. For example, summary_chain.run(input_documents=docs, token_max=2000). If you want to automatically use the maximum context size, you can set token_max to -1.

Here's how you can do it:

# Specify the maximum number of tokens to construct
summary = chain.run(docs, token_max=2000)

In the above code, I've set the token_max to 2000. You might need to adjust this value based on the maximum context length of your language model.
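Depending on your LangChain version, token_max may also be accepted when the map_reduce chain is constructed rather than only at run time; the call below is a sketch under that assumption, so check the signature of load_summarize_chain in your installed version:

# Assumption: this LangChain version forwards token_max to the map_reduce combine step
chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=TABLE_PROMPT,
                             combine_prompt=TABLE_PROMPT, token_max=4000)
summary = chain.run(docs)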

I hope this helps! If you have any other questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 6 months ago

Hi, @manasatramalinga! I'm helping the LangChain team manage their backlog and am marking this issue as stale.

It looks like you raised an issue regarding the load_summarize_chain function encountering an error due to exceeding the maximum context length of 8192 tokens. I provided a detailed response, explaining that the error is likely due to the total number of tokens in the documents exceeding the language model's maximum context length. I suggested adjusting the CharacterTextSplitter to create smaller text chunks and specifying the maximum number of tokens to construct in the run method of the summary chain. The response included code examples and relevant sources for further reference.

Is this issue still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution! If you have any further questions or need assistance, feel free to reach out.