openchatai / OpenChat

LLMs custom-chatbots console ⚡
https://open.cx
MIT License

This model's maximum context length is 4097 tokens, however you requested 5586 tokens (5330 in your prompt; 256 for the completion). Please reduce your prompt; or completion length. #186

Open zhongchongan opened 1 year ago

zhongchongan commented 1 year ago

Helpful Answer:

```
2023-10-24 20:43:22 web | This model's maximum context length is 4097 tokens, however you requested 5586 tokens (5330 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
2023-10-24 20:43:22 web | Traceback (most recent call last):
2023-10-24 20:43:22 web |   File "/app/api/views/views_chat.py", line 45, in chat
2023-10-24 20:43:22 web |     response_text = get_completion_response(vector_store=vector_store, initial_prompt=initial_prompt, mode=mode, sanitized_question=sanitized_question, session_id=session_id)
2023-10-24 20:43:22 web |   File "/app/api/views/views_chat.py", line 85, in get_completion_response
2023-10-24 20:43:22 web |     response = chain({"question": sanitized_question, "chat_history": chat_history}, return_only_outputs=True)
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 258, in __call__
2023-10-24 20:43:22 web |     raise e
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 252, in __call__
2023-10-24 20:43:22 web |     self._call(inputs, run_manager=run_manager)
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/conversational_retrieval/base.py", line 142, in _call
2023-10-24 20:43:22 web |     answer = self.combine_docs_chain.run(
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 456, in run
2023-10-24 20:43:22 web |     return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 258, in __call__
2023-10-24 20:43:22 web |     raise e
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 252, in __call__
2023-10-24 20:43:22 web |     self._call(inputs, run_manager=run_manager)
2023-10-24 20:43:22 web |   File "/usr/local/lib/python3.9/site-packages/langchain/chains/combine_documents/base.py", line 106, in _call
```

codebanesr commented 1 year ago

@zhongchongan We can use a different model to get around this, but I need more details on how to reproduce it.
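
For reference, here is a rough sketch of what switching to a larger-context model could look like with LangChain. The gpt-3.5-turbo-16k model name and the vector_store wiring below are assumptions for illustration, not the code currently in this repo:

```python
# Illustrative sketch only -- not the repo's actual wiring.
# Use a chat model with a 16k context window so the retrieved documents
# plus chat history fit into the prompt.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# vector_store is assumed to be the existing Qdrant-backed store built at ingestion time
llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vector_store.as_retriever(),
)
```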

zhongchongan commented 1 year ago

@codebanesr I use STORE=QDRANT. Looking at the log, the beginning of the retrieved source data is related to the question I asked, but the rest of it is unrelated to my problem. Too much is retrieved at once and passed to GPT-3.5, which pushes the input tokens over the limit. How can I configure or optimize this to reduce the amount of source data retrieved?
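
For example, is a cap on the retriever like the following possible? (The k value, max_tokens_limit, and the vector_store / llm names below are just my guesses, not settings I found in the code.)

```python
# Guess at what a retrieval cap could look like; not taken from the repo's code.
from langchain.chains import ConversationalRetrievalChain

# Ask Qdrant for only the top-k most similar chunks so the prompt stays small.
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# Optionally also cap the total tokens of retrieved context the chain will
# stuff into the prompt (works with the default "stuff" chain type).
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    max_tokens_limit=3000,
)
```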

davidsmithdevops commented 1 year ago

@zhongchongan Reducing the number of records retrieved from Qdrant, or decreasing the size of the text segments created during the ingestion process, can both solve the issue.
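
A rough sketch of the ingestion-side change (the chunk_size and chunk_overlap values and the raw_documents variable are illustrative assumptions, not values from this repo):

```python
# Illustrative sketch; values are not taken from this repo.
# Smaller chunks at ingestion time mean each retrieved record costs fewer
# tokens, leaving more of the 4097-token budget for the completion.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk; smaller chunks -> fewer tokens per retrieved record
    chunk_overlap=50,  # small overlap preserves context across chunk boundaries
)
chunks = splitter.split_documents(raw_documents)  # raw_documents: docs loaded before embedding
# chunks are then embedded and written to Qdrant as usual
```

Either knob (fewer retrieved records, or smaller chunks) trades recall for prompt size, so it is usually worth tuning both together rather than pushing one to an extreme.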