I installed ochat into a new venv with `pip3 install ochat`. Then I ran the server with:

```
python -m ochat.serving.openai_api_server --model openchat/openchat-3.6-8b-20240522 --model-type openchat_3.6
```

However, when I try curl:

```
curl http://localhost:18888/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "openchat_3.6", "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}] }'
```

I get an Internal Server Error. The server prints this:
```
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/vojta/ochat/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/vojta/ochat/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/vojta/ochat/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/home/vojta/ochat/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/home/vojta/ochat/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/home/vojta/ochat/lib/python3.10/site-packages/ochat/serving/openai_api_server.py", line 188, in create_chat_completion
result_generator = engine.generate(prompt=None,
TypeError: AsyncLLMEngine.generate() got an unexpected keyword argument 'prompt'
```
Any idea where the problem might be?
Running on an RTX 3090:

```
Mon Jun 10 11:41:19 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:4C:00.0 Off | N/A |
| 0% 42C P8 18W / 350W | 21469MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 10265 C python 21452MiB |
+---------------------------------------------------------------------------------------+
```
I had a similar issue. It looks like it's due to a change in the vLLM `AsyncLLMEngine.generate()` API from v0.4.2 to v0.4.3 (the `prompt` keyword argument was deprecated in favor of `inputs`). Downgrading vLLM fixed the issue for me.
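Concretely, pinning vLLM back to 0.4.2 (the last version with the old `prompt=` signature, per the change described above) in the same venv should do it:

```
pip install "vllm==0.4.2"
```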
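Alternatively, if you want to stay on vLLM >= 0.4.3, the failing call in `ochat/serving/openai_api_server.py` (line 188 in the traceback) can be adapted to the new `inputs`-based API. A minimal sketch, not ochat's actual code; `input_ids`, `sampling_params`, and `request_id` are illustrative names for values the server already has at that point:

```python
# vLLM >= 0.4.3: generate() takes a single `inputs` argument instead of
# the old `prompt=` / `prompt_token_ids=` pair. Pre-tokenized input is
# passed as a TokensPrompt-style dict.
result_generator = engine.generate(
    inputs={"prompt_token_ids": input_ids},  # was: prompt=None, prompt_token_ids=input_ids
    sampling_params=sampling_params,
    request_id=request_id,
)
```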