vllm-project / vllm


[Bug]: TypeError: a bytes-like object is required, not 'str' #5440

Open yaoyasong opened 1 month ago

yaoyasong commented 1 month ago

Your current environment

Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.17

Python version: 3.11.4 (main, Jul  5 2023, 14:15:25) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.105.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
Nvidia driver version: 535.54.03
cuDNN version: Probably one of the following:
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 106
Model name:            Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
Stepping:              6
CPU MHz:               2899.998
BogoMIPS:              5799.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             48K
L1i cache:             32K
L2 cache:              1280K
L3 cache:              49152K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd rsb_ctxsw ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512vbmi avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq spec_ctrl intel_stibp arch_capabilities

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==2.5.1
[pip3] torch==2.3.0
[pip3] torchaudio==2.1.2
[pip3] torchvision==0.18.0
[pip3] transformers==4.38.2
[pip3] transformers-stream-generator==0.0.4
[pip3] triton==2.3.0
[conda] blas                      1.0                         mkl    defaults
[conda] mkl                       2023.1.0         h6d00ec8_46342    defaults
[conda] mkl-service               2.4.0           py311h5eee18b_1    defaults
[conda] mkl_fft                   1.3.6           py311ha02d727_1    defaults
[conda] mkl_random                1.2.2           py311ha02d727_1    defaults
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] numpydoc                  1.5.0           py311h06a4308_0    defaults
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] sentence-transformers     2.5.1                    pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] torchaudio                2.1.2                    pypi_0    pypi
[conda] torchvision               0.18.0                   pypi_0    pypi
[conda] transformers              4.38.2                   pypi_0    pypi
[conda] transformers-stream-generator 0.0.4                    pypi_0    pypi
[conda] triton                    2.3.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-15            N/A             N/A

🐛 Describe the bug

Launch command:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-72B-Instruct-GPTQ-Int4

Request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me something about large language models."}
    ]
  }'
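
For reference, a sketch of the same request through the official openai Python client, pointed at the local server started above. The base URL, model name, and messages are taken from this issue; the api_key value is just a placeholder, since no --api-key was passed to the server here.

# Sketch: equivalent of the curl call above via the openai client
# (pip install openai). "EMPTY" is a placeholder API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
)
print(response.choices[0].message.content)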

Error messages:

INFO:     10.247.197.186:61063 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/app/anaconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/anaconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/app/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/app/anaconda3/lib/python3.11/site-packages/fastapi/routing.py", line 269, in app
    solved_result = await solve_dependencies(
  File "/app/anaconda3/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 628, in solve_dependencies
    ) = await request_body_to_args(  # body_params checked above
  File "/app/anaconda3/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 758, in request_body_to_args
    v_, errors_ = field.validate(value, values, loc=loc)
  File "/app/anaconda3/lib/python3.11/site-packages/fastapi/_compat.py", line 127, in validate
    self._type_adapter.validate_python(value, from_attributes=True),
  File "/app/anaconda3/lib/python3.11/site-packages/pydantic/type_adapter.py", line 258, in validate_python
    return self.validator.validate_python(__object, strict=strict, from_attributes=from_attributes, context=context)
  File "/app/anaconda3/lib/python3.11/site-packages/vllm/entrypoints/openai/protocol.py", line 256, in check_logprobs
    if "top_logprobs" in data and data["top_logprobs"] is not None:
TypeError: a bytes-like object is required, not 'str'
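
The final TypeError is just a str-in-bytes membership test: check_logprobs received the raw request body as bytes rather than a parsed dict, and Python does not allow testing a str key against a bytes object. A minimal illustration, independent of vLLM:

# The raw body reaches the validator as bytes instead of a parsed dict.
data = b'{"model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4"}'

# This is the failing check from protocol.py, reduced to its core:
"top_logprobs" in data
# TypeError: a bytes-like object is required, not 'str'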

wz1714748313 commented 1 month ago

Has this been resolved?

yaoyasong commented 1 month ago

> Has this been resolved?

No. It seems to be some version incompatibility in the api_server that vLLM provides; I haven't looked into it closely. For now I've worked around it by switching to the xinference inference framework.

bbss commented 4 days ago

In my case this happened because my client wasn't sending a "Content-Type: application/json" header. Without it, the server doesn't parse the body as JSON, so the raw bytes end up in the validator and trigger the TypeError above.
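
A sketch of the difference using the requests library (URL and payload borrowed from this issue; the behavior that data= omits the Content-Type header while json= sets it is standard requests behavior):

import json
import requests

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
    "messages": [{"role": "user", "content": "Hi"}],
}

# Reproduces the 500 above: data= sends the body as a plain string with
# no Content-Type header, so the server-side validator sees raw bytes.
r_bad = requests.post(url, data=json.dumps(payload))

# Avoids it: json= serializes the payload and sets
# "Content-Type: application/json" automatically.
r_ok = requests.post(url, json=payload)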