lamm-mit / PDF2Audio

Apache License 2.0
796 stars 86 forks

Problems running locally on Windows 11 #10

Open moebiussurfing opened 2 hours ago

moebiussurfing commented 2 hours ago

Hello @lamm-mit, I am getting this error from the llama server. It happens after submitting, with the web UI set up as mentioned in other issues here:

Text Generation Model: openai/custom_model
Custom Base API: http://localhost:8888/v1

INFO:     127.0.0.1:55176 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Exception: Requested tokens (8031) exceed context window of 2048
Traceback (most recent call last):
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 297, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\fastapi\routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\server\app.py", line 513, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\starlette\concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1997, in create_chat_completion
    return handler(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_chat_format.py", line 637, in chat_completion_handler
    completion_or_chunks = llama.create_completion(
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1831, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\moebi\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 1267, in _create_completion
    raise ValueError(
ValueError: Requested tokens (8031) exceed context window of 2048
INFO:     127.0.0.1:55189 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
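
Is the fix just to raise the context window when the llama server is launched? Guessing from the error message (the server seems to load the model with the default 2048-token context while the generated prompt needs about 8031 tokens), something like the following, where the model path is a placeholder from my setup:

python -m llama_cpp.server --model .\models\model.gguf --n_ctx 8192 --port 8888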

Any help?

PS: I noticed that I am using Python 3.10, but the install instructions point to 3.9: conda create -n pdf2audio python=3.9