meta-llama / llama

Inference code for Llama models

ValidationError: Input validation error: `inputs` must have less than 4096 tokens. Given: 4545 #1103

Open asma-10 opened 5 months ago


Describe the bug

I was using meta-llama/Llama-2-7b-chat-hf from Hugging Face in a RAG pipeline, and it used to work perfectly, but then I suddenly received this error:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf (Request ID: gPxf6Ns0plH9zveHLZP_A)

Input validation error: `inputs` must have less than 4096 tokens. Given: 4545
Make sure 'text-generation' task is supported by the model.

This is the code I used:

# Import paths shown for llama-index 0.10+; they vary by version
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.huggingface import HuggingFaceInferenceAPI
from llama_index.retrievers.bm25 import BM25Retriever

# LLM backed by the Hugging Face Inference API
llm = HuggingFaceInferenceAPI(model_name="meta-llama/Llama-2-7b-chat-hf", api_key=hf_token)

# Rerank the retrieved nodes and keep the top 4
rerank = SentenceTransformerRerank(
    model="BAAI/bge-reranker-v2-m3", top_n=4
)

# BM25 retriever pulling the 10 most similar nodes from the index
bm25_retriever = BM25Retriever.from_defaults(index=index, similarity_top_k=10)

query_engine = RetrieverQueryEngine.from_args(
    retriever=bm25_retriever,
    llm=llm,
    node_postprocessors=[rerank]
)
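The error message itself points at the likely cause rather than a code bug: Llama 2's context window is 4096 tokens, and with `similarity_top_k=10` the retrieved chunks plus the question added up to 4545 tokens. Lowering `similarity_top_k` (the reranker only keeps `top_n=4` anyway) or trimming the prompt before sending it should avoid the 422. A minimal sketch of the trimming idea follows; the function name and the 256-token output reserve are my own choices for illustration, not part of llama-index:

```python
def fit_to_context(token_ids, context_window=4096, reserve_for_output=256):
    """Truncate a tokenized prompt so prompt plus generated output fit the window.

    token_ids: the prompt already encoded by the model's tokenizer.
    Keeps the first tokens and drops the overflow; a real RAG pipeline would
    rather drop whole low-ranked chunks to avoid cutting text mid-sentence.
    """
    budget = context_window - reserve_for_output
    return token_ids if len(token_ids) <= budget else token_ids[:budget]


# A 4545-token prompt (as in the error) is cut down to 4096 - 256 = 3840 tokens
trimmed = fit_to_context(list(range(4545)))
```

In practice the simplest fix in the snippet above is to reduce `similarity_top_k` from 10 to 4–5, so fewer chunks are stuffed into the prompt in the first place.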

Runtime Environment