Closed: navpreet-np7 closed this issue 6 months ago.
I also encountered the same error.
Me too
Maybe it overlaps with num_speculative_tokens in the dictionary?
I added num_lookahead_slots at https://github.com/vllm-project/vllm/blob/f8e7adda21810104382bdf3febe3ea02c72f7348/vllm/worker/cpu_worker.py#L257
now I am hitting this bug: https://github.com/vllm-project/vllm/issues/4229
I am able to run my model on CPU now with the fix
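For reference, the kind of change being described looks roughly like this (a sketch only, not the upstream fix; everything in the signature other than num_lookahead_slots is recalled from that vLLM version and may not match the file exactly):

```python
# Sketch of the workaround: the executor started passing num_lookahead_slots,
# but the CPU worker's execute_model() did not accept it, which raised a
# TypeError. Accepting (and ignoring) the keyword avoids the crash, since the
# CPU backend does not do speculative decoding anyway.
from typing import Dict, List, Optional


class CPUWorker:
    def execute_model(
        self,
        seq_group_metadata_list: Optional[List] = None,
        blocks_to_swap_in: Optional[Dict[int, int]] = None,
        blocks_to_swap_out: Optional[Dict[int, int]] = None,
        blocks_to_copy: Optional[Dict[int, List[int]]] = None,
        num_lookahead_slots: int = 0,  # added: accepted but unused on the CPU backend
    ) -> List:
        # ... existing CPU-worker logic stays unchanged ...
        return []
```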
@peterauyeung I am getting the following new error after applying the above fix
INFO 05-08 16:49:17 async_llm_engine.py:526] Received request cmpl-e4c495701dca4e13a678f65d19febf49-0: prompt: 'San Francisco is a', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 24661, 13175, 374, 264], lora_request: None.
INFO 05-08 16:49:17 async_llm_engine.py:154] Aborted request cmpl-e4c495701dca4e13a678f65d19febf49-0.
INFO: 127.0.0.1:54576 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/api_server.py", line 105, in create_completion
generator = await openai_serving_completion.create_completion(
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
async for i, res in result_generator:
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 239, in consumer
raise e
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 232, in consumer
raise item
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 216, in producer
async for item in iterator:
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 663, in generate
raise e
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 657, in generate
async for request_output in stream:
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
raise result
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
return fut.result()
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
request_outputs = await self.engine.step_async()
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
output = await make_async(self.driver_worker.execute_model)(
File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/worker/cpu_worker.py", line 290, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/worker/cpu_model_runner.py", line 332, in execute_model
hidden_states = model_executable(**execute_model_kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 364, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 291, in forward
hidden_states, residual = layer(
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 229, in forward
hidden_states = self.input_layernorm(hidden_states)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/layers/layernorm.py", line 60, in forward
ops.rms_norm(
File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/_custom_ops.py", line 106, in rms_norm
vllm_ops.rms_norm(out, input, weight, epsilon)
NameError: name 'vllm_ops' is not defined
Make sure you are not running the command from inside the vLLM source folder. If Python imports vllm from the source tree rather than the installed package, the compiled custom-op extension is missing and you get the vllm_ops NameError above.
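A quick way to check which copy of vllm Python is actually importing (a sketch; the vllm._C module name is an assumption based on vLLM builds of that era):

```python
# Run this from a directory OUTSIDE the vLLM checkout. If vllm resolves to the
# source tree rather than site-packages, the compiled extension that backs
# vllm_ops was never built/installed there, which is what produces the
# NameError above.
import vllm
print(vllm.__file__)  # expected: .../site-packages/vllm/__init__.py

try:
    import vllm._C  # name of the compiled kernel module is an assumption for this version
    print("compiled ops found")
except ImportError as exc:
    print("compiled ops missing:", exc)
```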
OP has been fixed by #5450.
The NameError: name 'vllm_ops' is not defined error was fixed by #5009.
Your current environment
Collecting environment information...
PyTorch version: 2.3.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.31
Python version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1054-azure-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Stepping: 6
CPU MHz: 2800.000
CPU max MHz: 2800.0000
CPU min MHz: 800.0000
BogoMIPS: 5586.87
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB
L1i cache: 512 KiB
L2 cache: 20 MiB
L3 cache: 48 MiB
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm arch_capabilities
Versions of relevant libraries:
[pip3] keras2onnx==1.7.0
[pip3] msgpack-numpy==0.4.8
[pip3] numpy==1.23.5
[pip3] onnx==1.13.1
[pip3] onnxconverter-common==1.13.0
[pip3] onnxmltools==1.11.2
[pip3] onnxruntime==1.14.1
[pip3] skl2onnx==1.14.0
[pip3] tf2onnx==1.14.0
[pip3] torch==2.3.0+cpu
[pip3] torch-tb-profiler==0.4.1
[pip3] torchaudio==2.0.0
[pip3] torchdata==0.6.0
[pip3] torchtext==0.15.0
[pip3] torchvision==0.14.1
[pip3] triton==2.3.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] msgpack-numpy 0.4.8 pypi_0 pypi
[conda] numpy 1.23.5 pypi_0 pypi
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torch 2.3.0+cpu pypi_0 pypi
[conda] torch-tb-profiler 0.4.1 pypi_0 pypi
[conda] torchaudio 2.0.0 py38_cpu pytorch
[conda] torchdata 0.6.0 py38 pytorch
[conda] torchtext 0.15.0 py38 pytorch
[conda] torchvision 0.14.1 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect
🐛 Describe the bug
After building the OpenAI-compatible API server from Dockerfile.cpu and running the example script examples/openai_completion_client.py, I am getting the following error.
The logs on the OpenAI API server show the following error.
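For context, the request that triggers this is essentially the stock completion call from examples/openai_completion_client.py; a minimal equivalent looks roughly like the following (a sketch; the model name is a placeholder for whatever the server was started with, and the port is the server default):

```python
# Minimal stand-in for examples/openai_completion_client.py.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM OpenAI-compatible server address
    api_key="EMPTY",                      # the server does not validate the key by default
)

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder; use the model the server loaded
    prompt="San Francisco is a",         # same prompt seen in the server log above
    max_tokens=16,
)
print(completion.choices[0].text)
```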