vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Query to the openapi server with cpu backend is throwing error #4568

Closed navpreet-np7 closed 6 months ago

navpreet-np7 commented 7 months ago

Your current environment

The output of `python collect_env.py`

Collecting environment information...
PyTorch version: 2.3.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.31

Python version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1054-azure-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Stepping: 6
CPU MHz: 2800.000
CPU max MHz: 2800.0000
CPU min MHz: 800.0000
BogoMIPS: 5586.87
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB
L1i cache: 512 KiB
L2 cache: 20 MiB
L3 cache: 48 MiB
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm arch_capabilities

Versions of relevant libraries:
[pip3] keras2onnx==1.7.0
[pip3] msgpack-numpy==0.4.8
[pip3] numpy==1.23.5
[pip3] onnx==1.13.1
[pip3] onnxconverter-common==1.13.0
[pip3] onnxmltools==1.11.2
[pip3] onnxruntime==1.14.1
[pip3] skl2onnx==1.14.0
[pip3] tf2onnx==1.14.0
[pip3] torch==2.3.0+cpu
[pip3] torch-tb-profiler==0.4.1
[pip3] torchaudio==2.0.0
[pip3] torchdata==0.6.0
[pip3] torchtext==0.15.0
[pip3] torchvision==0.14.1
[pip3] triton==2.3.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] msgpack-numpy 0.4.8 pypi_0 pypi
[conda] numpy 1.23.5 pypi_0 pypi
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torch 2.3.0+cpu pypi_0 pypi
[conda] torch-tb-profiler 0.4.1 pypi_0 pypi
[conda] torchaudio 2.0.0 py38_cpu pytorch
[conda] torchdata 0.6.0 py38 pytorch
[conda] torchtext 0.15.0 py38 pytorch
[conda] torchvision 0.14.1 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect

🐛 Describe the bug

After building the OpenAI-compatible API server from Dockerfile.cpu and running the example client examples/openai_completion_client.py, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/resources/completions.py", line 517, in create
    return self._post(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 921, in request
    return self._request(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1005, in _request
    return self._retry_request(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1053, in _retry_request
    return self._request(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1005, in _request
    return self._retry_request(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1053, in _retry_request
    return self._request(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/openai/_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error
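
For reference, the client side is essentially the stock example script. A minimal sketch of what it does follows; the API key, base URL, and model selection are assumptions based on the script's defaults, while the prompt matches the one that shows up in the server logs.

# Sketch of examples/openai_completion_client.py (assumed defaults, not the verbatim script).
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",                      # vLLM's server accepts any key unless --api-key is set
    base_url="http://localhost:8000/v1",  # assumed default host/port
)

model = client.models.list().data[0].id   # whichever model the server was started with
completion = client.completions.create(
    model=model,
    prompt="San Francisco is a",          # same prompt that appears in the server logs below
    max_tokens=16,
)
print(completion)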

The API server logs show the following error:

ERROR 05-02 21:11:03 async_llm_engine.py:43] Engine background task failed
ERROR 05-02 21:11:03 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 05-02 21:11:03 async_llm_engine.py:43]     task.result()
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
ERROR 05-02 21:11:03 async_llm_engine.py:43]     has_requests_in_progress = await asyncio.wait_for(
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
ERROR 05-02 21:11:03 async_llm_engine.py:43]     return fut.result()
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
ERROR 05-02 21:11:03 async_llm_engine.py:43]     request_outputs = await self.engine.step_async()
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 05-02 21:11:03 async_llm_engine.py:43]     output = await self.model_executor.execute_model_async(
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
ERROR 05-02 21:11:03 async_llm_engine.py:43]     output = await make_async(self.driver_worker.execute_model)(
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
ERROR 05-02 21:11:03 async_llm_engine.py:43]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-02 21:11:03 async_llm_engine.py:43]   File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-02 21:11:03 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-02 21:11:03 async_llm_engine.py:43] TypeError: execute_model() got an unexpected keyword argument 'num_lookahead_slots'
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f20f9055dc0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f214a4bef40>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f20f9055dc0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f214a4bef40>>)>
Traceback (most recent call last):
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
    request_outputs = await self.engine.step_async()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
    output = await make_async(self.driver_worker.execute_model)(
  File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: execute_model() got an unexpected keyword argument 'num_lookahead_slots'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 05-02 21:11:03 async_llm_engine.py:154] Aborted request cmpl-93b99ba36ec147be9185742793807457-0.
INFO:     127.0.0.1:50494 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/api_server.py", line 105, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
    async for i, res in result_generator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 239, in consumer
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 232, in consumer
    raise item
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 216, in producer
    async for item in iterator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 663, in generate
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 657, in generate
    async for request_output in stream:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
    request_outputs = await self.engine.step_async()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
    output = await make_async(self.driver_worker.execute_model)(
  File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: execute_model() got an unexpected keyword argument 'num_lookahead_slots'
INFO 05-02 21:11:04 async_llm_engine.py:526] Received request cmpl-b69f62eca3fd4aafa510370b21309fb7-0: prompt: 'San Francisco is a', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [2, 16033, 2659, 16, 10], lora_request: None.
INFO 05-02 21:11:04 async_llm_engine.py:154] Aborted request cmpl-b69f62eca3fd4aafa510370b21309fb7-0.
INFO:     127.0.0.1:50510 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/api_server.py", line 105, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
    async for i, res in result_generator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 239, in consumer
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 232, in consumer
    raise item
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 216, in producer
    async for item in iterator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 663, in generate
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 657, in generate
    async for request_output in stream:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
    request_outputs = await self.engine.step_async()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
    output = await make_async(self.driver_worker.execute_model)(
  File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: execute_model() got an unexpected keyword argument 'num_lookahead_slots'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/api_server.py", line 105, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
    async for i, res in result_generator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 239, in consumer
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 232, in consumer
    raise item
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 216, in producer
    async for item in iterator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 663, in generate
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 647, in generate
    stream = await self.add_request(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 534, in add_request
    self.start_background_loop()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 408, in start_background_loop
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
hiGiraffe commented 7 months ago

I also encountered the same error.

adelineclarissa commented 7 months ago

Me too

peterauyeung commented 7 months ago

Maybe it overlaps with num_speculative_tokens in the dictionary?

peterauyeung commented 7 months ago

I added num_lookahead_slots as a parameter at https://github.com/vllm-project/vllm/blob/f8e7adda21810104382bdf3febe3ea02c72f7348/vllm/worker/cpu_worker.py#L257
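
For anyone else hitting the TypeError above: the workaround amounts to letting the CPU worker's execute_model accept (and ignore) the extra keyword. The sketch below uses hypothetical stand-in classes rather than the real vllm.worker.cpu_worker code, purely to illustrate the failure mode and the shape of the fix.

# Hypothetical stand-ins for the classes in vllm/worker/cpu_worker.py, only to
# illustrate why the TypeError appears and what the workaround changes.

class CpuWorkerBefore:
    def execute_model(self, seq_group_metadata_list, blocks_to_swap_in,
                      blocks_to_swap_out, blocks_to_copy):
        return []


class CpuWorkerAfter:
    def execute_model(self, seq_group_metadata_list, blocks_to_swap_in,
                      blocks_to_swap_out, blocks_to_copy,
                      num_lookahead_slots: int = 0):
        # The CPU backend has no use for lookahead slots, so the value is
        # accepted and ignored.
        return []


# The async engine passes num_lookahead_slots as a keyword argument:
kwargs = dict(seq_group_metadata_list=[], blocks_to_swap_in={},
              blocks_to_swap_out={}, blocks_to_copy={}, num_lookahead_slots=0)

try:
    CpuWorkerBefore().execute_model(**kwargs)
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'num_lookahead_slots'

CpuWorkerAfter().execute_model(**kwargs)  # runs without error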

Now I am hitting this bug: https://github.com/vllm-project/vllm/issues/4229

peterauyeung commented 7 months ago

I am able to run my model on CPU now with this fix:

https://github.com/vllm-project/vllm/pull/4590

navpreet-np7 commented 6 months ago

@peterauyeung I am getting the following new error after applying the above fix:

INFO 05-08 16:49:17 async_llm_engine.py:526] Received request cmpl-e4c495701dca4e13a678f65d19febf49-0: prompt: 'San Francisco is a', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 24661, 13175, 374, 264], lora_request: None.
INFO 05-08 16:49:17 async_llm_engine.py:154] Aborted request cmpl-e4c495701dca4e13a678f65d19febf49-0.
INFO:     127.0.0.1:54576 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/api_server.py", line 105, in create_completion
    generator = await openai_serving_completion.create_completion(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
    async for i, res in result_generator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 239, in consumer
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 232, in consumer
    raise item
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/utils.py", line 216, in producer
    async for item in iterator:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 663, in generate
    raise e
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 657, in generate
    async for request_output in stream:
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 498, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/anaconda/envs/py38_default/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 472, in engine_step
    request_outputs = await self.engine.step_async()
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/executor/cpu_executor.py", line 114, in execute_model_async
    output = await make_async(self.driver_worker.execute_model)(
  File "/anaconda/envs/py38_default/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/worker/cpu_worker.py", line 290, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/worker/cpu_model_runner.py", line 332, in execute_model
    hidden_states = model_executable(**execute_model_kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 364, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 291, in forward
    hidden_states, residual = layer(
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/models/llama.py", line 229, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/model_executor/layers/layernorm.py", line 60, in forward
    ops.rms_norm(
  File "/home/navpreetsingh/Desktop/LLM/vllm/vllm/_custom_ops.py", line 106, in rms_norm
    vllm_ops.rms_norm(out, input, weight, epsilon)
NameError: name 'vllm_ops' is not defined
peterauyeung commented 6 months ago

You need to make sure the command is not being run from inside the vLLM source folder; otherwise Python picks up the local, unbuilt vllm package instead of the installed one.
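
A quick way to check which copy of the package is being imported (the paths in the comments are only examples, not your exact ones):

# Run this from the same working directory you use to launch the server.
import vllm

print(vllm.__file__)
# A path under site-packages (e.g. .../site-packages/vllm/__init__.py) means the
# installed package, with its compiled ops, is being used.
# A path inside your git checkout (e.g. .../Desktop/LLM/vllm/vllm/__init__.py)
# means Python is shadowing the installed package with the unbuilt source tree,
# which leads to "NameError: name 'vllm_ops' is not defined"; cd out of the repo
# or install the package before starting the server.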

DarkLight1337 commented 6 months ago

The OP's issue has been fixed by #5450.

The NameError: name 'vllm_ops' is not defined error was fixed by #5009.