Hello @lamm-mit,
I am getting this error on the llama server. It happens after submitting and setting up the web server UI as described in other issues here:
Text Generation Model: openai/custom_model
Custom Base API: http://localhost:8888/v1
```
INFO:     127.0.0.1:55176 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (8031) exceed context window of 2048
Traceback (most recent call last):
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 297, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\app.py", line 513, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1997, in create_chat_completion
    return handler(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_chat_format.py", line 637, in chat_completion_handler
    completion_or_chunks = llama.create_completion(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1831, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1267, in _create_completion
    raise ValueError(
ValueError: Requested tokens (8031) exceed context window of 2048
INFO:     127.0.0.1:55189 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
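The 2048 limit in the error looks like llama-cpp-python's default context size, so my guess is that the server needs to be started with a larger `--n_ctx` to fit the ~8031 requested tokens. Something along these lines is what I plan to try (the model path and the 8192 value are placeholders, not from my actual setup):

```shell
# Restart the llama_cpp server with a context window large enough
# for the requested tokens; adjust the model path to your own file.
python -m llama_cpp.server --model ./models/my-model.gguf --n_ctx 8192 --port 8888
```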
Any help?
PS: I noticed that I am using Python 3.10, while the install instructions specified 3.9:

```
conda create -n pdf2audio python=3.9
```