
Llama3 context window is 8k but Langchain with Ollama shows "Token indices sequence length is longer than the specified maximum sequence length for this model (1916 > 1024). " #20967

Open joleeson opened 2 weeks ago

joleeson commented 2 weeks ago

Example Code

# `params` is a dict holding the model name, prompt templates, and token
# limits; it is defined elsewhere in the script.
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate
from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

llm = Ollama(model=params['model'], num_ctx=2048)  # , num_predict=1100

with open("./prompt.txt", encoding="utf-8") as fd:
    doc_text = fd.read()

result = llm.invoke(doc_text)
print(result)

''' Setup prompt chains '''

# Map
map_prompt = PromptTemplate.from_template(params['map_template'])
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce
reduce_prompt = PromptTemplate.from_template(params['reduce_template'])

# Run chain
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is the final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=params['reduce_token_max'],
)

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=True,
)

print("Running map reduce summarisation...")
result = map_reduce_chain.invoke(docs)
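
Note that docs is not constructed anywhere in the snippet above; presumably the imported text splitters produce it earlier in the script. A plausible reconstruction for context (the chunk sizes here are arbitrary placeholders, not the author's values):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical: split the raw text into Document chunks for the map step.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = splitter.create_documents([doc_text])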

Error Message and Stack Trace (if applicable)

Running map reduce summarisation...
Token indices sequence length is longer than the specified maximum sequence length for this model (1993 > 1024). Running this sequence through the model will result in indexing errors
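
This is the standard Hugging Face transformers warning for inputs longer than a tokenizer's model_max_length. LangChain's default token counter falls back to a GPT-2 tokenizer whose limit is 1024 tokens, which would match the number in the warning. A minimal check of that assumption (hypothetical, reusing the llm object from the example code):

# If the 1024 cap comes from LangChain's default token counter (a GPT-2
# tokenizer loaded via transformers) rather than from Ollama itself,
# counting tokens on a long string should reproduce the same warning.
long_text = "word " * 2000            # well over 1024 GPT-2 tokens
print(llm.get_num_tokens(long_text))  # expect the same warning here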

Description

I am running MapReduceDocumentsChain with Ollama and the "llama3" model.

Llama3 has a context window of 8k tokens, and Ollama is given the argument num_ctx=4096.

However, the warning indicates that a maximum sequence length of 1024 is being enforced somewhere, and it appears to happen inside the ReduceDocumentsChain. Why is this the case? Any solutions? Thank you.
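
One possible workaround, sketched under the assumption above (that the warning comes from the default token counter, not from Ollama itself): override get_num_tokens so that ReduceDocumentsChain's token budgeting no longer routes text through the 1024-token GPT-2 tokenizer. The 4-characters-per-token ratio below is a crude heuristic, not the real llama3 tokenizer:

from langchain_community.llms import Ollama

class OllamaWithLlama3Counter(Ollama):
    # Hypothetical subclass: replaces the default GPT-2-based token
    # counter with a rough length heuristic to avoid the warning.
    def get_num_tokens(self, text: str) -> int:
        # Approximation: about 4 characters per llama3 token.
        return max(1, len(text) // 4)

llm = OllamaWithLlama3Counter(model="llama3", num_ctx=4096)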

System Info

System Information

OS: Windows
OS Version: 10.0.19045
Python Version: 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:28:07) [MSC v.1916 64 bit (AMD64)]

Package Information

langchain_core: 0.1.46
langchain: 0.1.16
langchain_community: 0.0.34
langsmith: 0.1.43
langchain_openai: 0.1.2
langchain_text_splitters: 0.0.1