Your current environment
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.17

Python version: 3.11.4 (main, Jul 5 2023, 14:15:25) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.105.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
Nvidia driver version: 535.54.03
cuDNN version: Probably one of the following:
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
Stepping: 6
CPU MHz: 2899.998
BogoMIPS: 5799.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd rsb_ctxsw ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512vbmi avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq spec_ctrl intel_stibp arch_capabilities
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==2.5.1
[pip3] torch==2.3.0
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.18.0
[pip3] transformers==4.38.2
[pip3] transformers-stream-generator==0.0.4
[pip3] triton==2.3.0
[conda] blas 1.0 mkl defaults
[conda] mkl 2023.1.0 h6d00ec8_46342 defaults
[conda] mkl-service 2.4.0 py311h5eee18b_1 defaults
[conda] mkl_fft 1.3.6 py311ha02d727_1 defaults
[conda] mkl_random 1.2.2 py311ha02d727_1 defaults
[conda] numpy 1.23.5 pypi_0 pypi
[conda] numpydoc 1.5.0 py311h06a4308_0 defaults
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] sentence-transformers 2.5.1 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchvision 0.18.0 pypi_0 pypi
[conda] transformers 4.38.2 pypi_0 pypi
[conda] transformers-stream-generator 0.0.4 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-15 N/A N/A
🐛 Describe the bug
Launch command:
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-72B-Instruct-GPTQ-Int4
Request:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me something about large language models."}
],
}'
Error messages:
INFO: 10.247.197.186:61063 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/app/anaconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/fastapi/routing.py", line 269, in app
solved_result = await solve_dependencies(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 628, in solve_dependencies
) = await request_body_to_args( # body_params checked above
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 758, in request_body_to_args
v_, errors_ = field.validate(value, values, loc=loc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/fastapi/_compat.py", line 127, in validate
self._type_adapter.validate_python(value, from_attributes=True),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/pydantic/type_adapter.py", line 258, in validate_python
return self.validator.validate_python(__object, strict=strict, from_attributes=from_attributes, context=context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/anaconda3/lib/python3.11/site-packages/vllm/entrypoints/openai/protocol.py", line 256, in check_logprobs
if "top_logprobs" in data and data["top_logprobs"] is not None:
^^^^^^^^^^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'
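Note on the final frame: the message "a bytes-like object is required, not 'str'" is what Python raises when a str is tested for membership in a bytes object, so `data` in `check_logprobs` was most likely raw bytes rather than a parsed dict. The sketch below reproduces that error outside vLLM and also checks whether the request body from the curl command parses as JSON (the `messages` list is followed by a trailing comma). The names `raw_body`, `membership_error` and `is_valid_json` are hypothetical helpers for illustration, not vLLM or FastAPI code, and whether the trailing comma is what caused the server to receive unparsed bytes is an assumption, not something the traceback confirms.

```python
import json

# Request body copied from the curl command in this report.
# Note the trailing comma after the "messages" list.
raw_body = b'''{
"model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me something about large language models."}
],
}'''


def membership_error(data):
    """Run the same membership test as the failing line and return the
    TypeError message, or None if the test does not raise."""
    try:
        "top_logprobs" in data
    except TypeError as exc:
        return str(exc)
    return None


def is_valid_json(body: bytes) -> bool:
    """Return True if the body parses as JSON with the standard library."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


# bytes input reproduces the message seen in the traceback
print(membership_error(raw_body))
# dict input (a parsed JSON body) does not raise
print(membership_error({"model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4"}))
# the body as sent is rejected by json.loads
print(is_valid_json(raw_body))
```

If the third check prints False on your side as well, it may be worth retrying the request without the trailing comma to see whether the 500 goes away or turns into a normal validation error.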