Open murray-z opened 2 months ago
torchvision-0.19.0+cpu-cp310-cp310-linux_x86_64.whl
I have a similar issue, and reinstalling from the link you mentioned helps, but for certain models (like google/gemma-2-2b-it)
it shows this error:
ERROR 08-28 14:39:29 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3199098 died, exit code: -15
INFO 08-28 14:39:29 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Traceback (most recent call last):
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/rpc/server.py", line 230, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, rpc_path)
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/rpc/server.py", line 31, in __init__
self.engine = AsyncLLMEngine.from_engine_args(
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 740, in from_engine_args
engine = cls(
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 636, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 840, in _init_engine
return engine_class(*args, **kwargs)
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 272, in __init__
super().__init__(*args, **kwargs)
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 270, in __init__
self.model_executor = executor_class(
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/executor/executor_base.py", line 46, in __init__
self._init_executor()
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/executor/cpu_executor.py", line 116, in _init_executor
self._run_workers("load_model")
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/executor/cpu_executor.py", line 183, in _run_workers
driver_worker_output = self.driver_method_invoker(
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
return driver.execute_method(method, *args, **kwargs).get()
File "/home/mgiessing/micromamba/envs/vllm-055/lib/python3.10/site-packages/vllm-0.5.5+cpu-py3.10-linux-x86_64.egg/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
ValueError: Torch SPDA does not support logits soft cap.
I seem to get this error only on CPU systems (Intel and Power).
SPDA (scaled dot-product attention)? It would be easier to solve if it didn't work on CUDA backends as well, since SDPA is not supported for some models.
@mgiessing what kind of model are you using? And can you please also share your args?
Absolutely:
I ran my test like this:
export MODEL="google/gemma-2-2b-it"
python3 -m vllm.entrypoints.openai.api_server --dtype=bfloat16 --model=${MODEL}
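Once the server is up, requests go to the OpenAI-compatible endpoint, e.g. (a sketch; assumes the default port 8000):
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "google/gemma-2-2b-it", "messages": [{"role": "user", "content": "Hello!"}]}'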
As stated before, for other models such as TinyLlama/TinyLlama-1.1B-Chat-v1.0
I don't see any problem, and it works without issues on Intel & Power CPUs.
I installed vllm following this link https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html#build-from-source for Intel, and slightly differently for Power (using a ppc64le-optimized pytorch).
@mgiessing thank you! While I look into the installation thread, I think the model (gemma-2)
is a clue.
Can you please try relaunching with the --enforce-eager
option set?
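For example (a sketch, reusing the launch command from above):
export MODEL="google/gemma-2-2b-it"
python3 -m vllm.entrypoints.openai.api_server --dtype=bfloat16 --model=${MODEL} --enforce-eager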
Yeah - seems it's related to the gemma family; I think I also saw the logits topic mentioned somewhere...
Unfortunately, setting --enforce-eager
yields the same error.
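For reference, the gemma-2 configs do enable logit soft-capping, which matches the "logits soft cap" in the ValueError above; a quick way to check (a sketch; attribute names per the Hugging Face Gemma2 config):
python3 -c "from transformers import AutoConfig; cfg = AutoConfig.from_pretrained('google/gemma-2-2b-it'); print(cfg.attn_logit_softcapping, cfg.final_logit_softcapping)"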
- Can you please try installing the CPU wheels for torchvision from here: https://download.pytorch.org/whl/torchvision/ ? My guess is
torchvision-0.19.0+cpu-cp310-cp310-linux_x86_64.whl
is the one you need.
- Also, could you please share your installation steps?
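For example, something like this should work (a sketch; assumes Python 3.10 on x86_64, adjust to your platform):
pip install --force-reinstall torchvision==0.19.0+cpu --index-url https://download.pytorch.org/whl/cpu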
After reinstalling torchvision-0.19.0+cpu-cp310-cp310-linux_x86_64.whl,
I have another error:
INFO 08-29 09:26:38 logger.py:36] Received request chat-6d0f95d191424b55b7442ed3ec726a77: prompt: '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131060, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 882, 128007, 271, 9906, 0, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.
INFO 08-29 09:26:38 async_llm_engine.py:205] Added request chat-6d0f95d191424b55b7442ed3ec726a77.
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 _custom_ops.py:36] Error in calling custom op rms_norm: '_OpNamespace' '_C' object has no attribute 'rms_norm'
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 _custom_ops.py:36] Possibly you have built or installed an obsolete version of vllm.
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 _custom_ops.py:36] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method execute_model: '_OpNamespace' '_C' object has no attribute 'rms_norm', Traceback (most recent call last):
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/worker/worker_base.py", line 328, in execute_model
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] output = self.model_runner.execute_model(
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/worker/cpu_model_runner.py", line 373, in execute_model
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] hidden_states = model_executable(**execute_model_kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/models/llama.py", line 429, in forward
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/models/llama.py", line 329, in forward
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/models/llama.py", line 247, in forward
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] hidden_states = self.input_layernorm(hidden_states)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/custom_op.py", line 14, in forward
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self._forward_method(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/custom_op.py", line 39, in forward_cpu
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return self.forward_cuda(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/model_executor/layers/layernorm.py", line 62, in forward_cuda
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] ops.rms_norm(
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/_custom_ops.py", line 37, in wrapper
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] raise e
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/_custom_ops.py", line 28, in wrapper
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm/vllm/_custom_ops.py", line 155, in rms_norm
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] torch.ops._C.rms_norm(out, input, weight, epsilon)
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/_ops.py", line 1170, in __getattr__
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] raise AttributeError(
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226] AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
(VllmWorkerProcess pid=3860680) ERROR 08-29 09:26:38 multiproc_worker_utils.py:226]
ERROR 08-29 09:26:38 async_llm_engine.py:62] Engine background task failed
ERROR 08-29 09:26:38 async_llm_engine.py:62] Traceback (most recent call last):
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 52, in _log_task_completion
ERROR 08-29 09:26:38 async_llm_engine.py:62] return_value = task.result()
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 899, in run_engine_loop
ERROR 08-29 09:26:38 async_llm_engine.py:62] result = task.result()
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 842, in engine_step
ERROR 08-29 09:26:38 async_llm_engine.py:62] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 339, in step_async
ERROR 08-29 09:26:38 async_llm_engine.py:62] output = await self.model_executor.execute_model_async(
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 304, in execute_model_async
ERROR 08-29 09:26:38 async_llm_engine.py:62] output = await make_async(self.execute_model
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 08-29 09:26:38 async_llm_engine.py:62] result = self.fn(*self.args, **self.kwargs)
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 222, in execute_model
ERROR 08-29 09:26:38 async_llm_engine.py:62] output = self.driver_method_invoker(self.driver_worker,
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
ERROR 08-29 09:26:38 async_llm_engine.py:62] return driver.execute_method(method, *args, **kwargs).get()
ERROR 08-29 09:26:38 async_llm_engine.py:62] File "/home/test/zhangfazhan/vllm/vllm/executor/multiproc_worker_utils.py", line 58, in get
ERROR 08-29 09:26:38 async_llm_engine.py:62] raise self.result.exception
ERROR 08-29 09:26:38 async_llm_engine.py:62] AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
Exception in callback functools.partial(<function _log_task_completion at 0x7f8a51db2e60>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f8a504066e0>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f8a51db2e60>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f8a504066e0>>)>
Traceback (most recent call last):
File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 52, in _log_task_completion
return_value = task.result()
File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 899, in run_engine_loop
result = task.result()
File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 842, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 339, in step_async
output = await self.model_executor.execute_model_async(
File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 304, in execute_model_async
output = await make_async(self.execute_model
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 222, in execute_model
output = self.driver_method_invoker(self.driver_worker,
File "/home/test/zhangfazhan/vllm/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
return driver.execute_method(method, *args, **kwargs).get()
File "/home/test/zhangfazhan/vllm/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/home/test/zhangfazhan/vllm/vllm/engine/async_llm_engine.py", line 64, in _log_task_completion
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR 08-29 09:26:38 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-29 09:26:38 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-29 09:26:38 client.py:412] Traceback (most recent call last):
ERROR 08-29 09:26:38 client.py:412] File "/home/test/zhangfazhan/vllm/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-29 09:26:38 client.py:412] await self.check_health(socket=socket)
ERROR 08-29 09:26:38 client.py:412] File "/home/test/zhangfazhan/vllm/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-29 09:26:38 client.py:412] await self._send_one_way_rpc_request(
ERROR 08-29 09:26:38 client.py:412] File "/home/test/zhangfazhan/vllm/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-29 09:26:38 client.py:412] raise response
ERROR 08-29 09:26:38 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
INFO: 172.16.3.103:54972 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
@murray-z can you try a clean build (as is also indicated in the error messages)?
E.g. like this:
1.) Create a fresh/new environment (conda/mamba/venv) - it seems you're using anaconda so try this:
conda create -y -n vllm-055 -c defaults python=3.10
conda activate vllm-055
2.) Clone vllm 0.5.5
git clone -b v0.5.5 https://github.com/vllm-project/vllm vllm-055
cd vllm-055
pip install --upgrade pip
pip install wheel packaging ninja "setuptools>=49.4.0" numpy
pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
VLLM_TARGET_DEVICE=cpu python setup.py install
pip install --force-reinstall torchvision --extra-index-url https://download.pytorch.org/whl/torchvision/
3.) Run your sample inference again
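Before rerunning, a quick sanity check can confirm the compiled extension actually got built (a sketch; run it outside the source tree - if it fails with ModuleNotFoundError, the build is still broken, which is what the rms_norm errors above point to):
python -c "import vllm._C"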
Following these steps, I get this error:
INFO: Started server process [1117335]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8003 (Press CTRL+C to quit)
INFO 08-29 17:01:05 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 08-29 17:01:09 logger.py:36] Received request chat-a8ad450f9e85463991309f2dc1c3f224: prompt: '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131060, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 882, 128007, 271, 9906, 0, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.
INFO 08-29 17:01:09 async_llm_engine.py:208] Added request chat-a8ad450f9e85463991309f2dc1c3f224.
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 _custom_ops.py:36] Error in calling custom op rms_norm: '_OpNamespace' '_C' object has no attribute 'rms_norm'
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 _custom_ops.py:36] Possibly you have built or installed an obsolete version of vllm.
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 _custom_ops.py:36] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method execute_model: '_OpNamespace' '_C' object has no attribute 'rms_norm', Traceback (most recent call last):
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/worker/worker_base.py", line 322, in execute_model
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] output = self.model_runner.execute_model(
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/worker/cpu_model_runner.py", line 373, in execute_model
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] hidden_states = model_executable(**execute_model_kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/models/llama.py", line 429, in forward
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/models/llama.py", line 329, in forward
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/models/llama.py", line 247, in forward
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] hidden_states = self.input_layernorm(hidden_states)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/custom_op.py", line 14, in forward
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self._forward_method(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/custom_op.py", line 39, in forward_cpu
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return self.forward_cuda(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/model_executor/layers/layernorm.py", line 62, in forward_cuda
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] ops.rms_norm(
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/_custom_ops.py", line 37, in wrapper
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] raise e
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/_custom_ops.py", line 28, in wrapper
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/zhangfazhan/vllm-055/vllm/_custom_ops.py", line 155, in rms_norm
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] torch.ops._C.rms_norm(out, input, weight, epsilon)
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/torch/_ops.py", line 1170, in __getattr__
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] raise AttributeError(
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226] AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
(VllmWorkerProcess pid=1117794) ERROR 08-29 17:01:09 multiproc_worker_utils.py:226]
ERROR 08-29 17:01:09 async_llm_engine.py:65] Engine background task failed
ERROR 08-29 17:01:09 async_llm_engine.py:65] Traceback (most recent call last):
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
ERROR 08-29 17:01:09 async_llm_engine.py:65] return_value = task.result()
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
ERROR 08-29 17:01:09 async_llm_engine.py:65] result = task.result()
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 873, in engine_step
ERROR 08-29 17:01:09 async_llm_engine.py:65] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 337, in step_async
ERROR 08-29 17:01:09 async_llm_engine.py:65] output = await self.model_executor.execute_model_async(
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 304, in execute_model_async
ERROR 08-29 17:01:09 async_llm_engine.py:65] output = await make_async(self.execute_model
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 08-29 17:01:09 async_llm_engine.py:65] result = self.fn(*self.args, **self.kwargs)
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 222, in execute_model
ERROR 08-29 17:01:09 async_llm_engine.py:65] output = self.driver_method_invoker(self.driver_worker,
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
ERROR 08-29 17:01:09 async_llm_engine.py:65] return driver.execute_method(method, *args, **kwargs).get()
ERROR 08-29 17:01:09 async_llm_engine.py:65] File "/home/test/zhangfazhan/vllm-055/vllm/executor/multiproc_worker_utils.py", line 58, in get
ERROR 08-29 17:01:09 async_llm_engine.py:65] raise self.result.exception
ERROR 08-29 17:01:09 async_llm_engine.py:65] AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
Exception in callback functools.partial(<function _log_task_completion at 0x7f26736a25f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f265d224c10>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7f26736a25f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f265d224c10>>)>
Traceback (most recent call last):
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
return_value = task.result()
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
result = task.result()
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 873, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 337, in step_async
output = await self.model_executor.execute_model_async(
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 304, in execute_model_async
output = await make_async(self.execute_model
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 222, in execute_model
output = self.driver_method_invoker(self.driver_worker,
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
return driver.execute_method(method, *args, **kwargs).get()
File "/home/test/zhangfazhan/vllm-055/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 67, in _log_task_completion
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR 08-29 17:01:09 client.py:265] Got Unhealthy response from RPC Server
ERROR 08-29 17:01:09 client.py:412] AsyncEngineDeadError('Background loop is stopped.')
ERROR 08-29 17:01:09 client.py:412] Traceback (most recent call last):
ERROR 08-29 17:01:09 client.py:412] File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/rpc/client.py", line 409, in generate
ERROR 08-29 17:01:09 client.py:412] await self.check_health(socket=socket)
ERROR 08-29 17:01:09 client.py:412] File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/rpc/client.py", line 431, in check_health
ERROR 08-29 17:01:09 client.py:412] await self._send_one_way_rpc_request(
ERROR 08-29 17:01:09 client.py:412] File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/rpc/client.py", line 266, in _send_one_way_rpc_request
ERROR 08-29 17:01:09 client.py:412] raise response
ERROR 08-29 17:01:09 client.py:412] vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop is stopped.
INFO: 172.16.3.103:58866 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/routing.py", line 754, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/routing.py", line 774, in app
await route.handle(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/routing.py", line 295, in handle
await self.app(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
response = await f(request)
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/fastapi/routing.py", line 297, in app
raw_response = await run_endpoint_function(
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/site-packages/fastapi/routing.py", line 210, in run_endpoint_function
return await dependant.call(**values)
File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/api_server.py", line 271, in create_chat_completion
generator = await openai_serving_chat.create_chat_completion(
File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/serving_chat.py", line 188, in create_chat_completion
return await self.chat_completion_full_generator(
File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/serving_chat.py", line 438, in chat_completion_full_generator
async for res in result_generator:
File "/home/test/zhangfazhan/vllm-055/vllm/utils.py", line 430, in iterate_with_cancellation
item = await awaits[0]
File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/rpc/client.py", line 416, in generate
raise request_output
AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
INFO 08-29 17:01:10 logger.py:36] Received request chat-3759086c1dd348ff8ce3e78d81bf95f4: prompt: '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=131060, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 882, 128007, 271, 9906, 0, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.
CRITICAL 08-29 17:01:10 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 172.16.3.103:58868 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1117335]
INFO 08-29 17:01:10 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=AttributeError("'_OpNamespace' '_C' object has no attribute 'rms_norm'")>
Traceback (most recent call last):
File "/home/test/zhangfazhan/vllm-055/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
async for request_output in results_generator:
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 1064, in generate
async for output in await self.add_request(
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 113, in generator
raise result
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 55, in _log_task_completion
return_value = task.result()
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 930, in run_engine_loop
result = task.result()
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 873, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/home/test/zhangfazhan/vllm-055/vllm/engine/async_llm_engine.py", line 337, in step_async
output = await self.model_executor.execute_model_async(
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 304, in execute_model_async
output = await make_async(self.execute_model
File "/home/test/anaconda3/envs/vllm-055/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 222, in execute_model
output = self.driver_method_invoker(self.driver_worker,
File "/home/test/zhangfazhan/vllm-055/vllm/executor/cpu_executor.py", line 360, in _async_driver_method_invoker
return driver.execute_method(method, *args, **kwargs).get()
File "/home/test/zhangfazhan/vllm-055/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
INFO 08-29 17:01:12 multiproc_worker_utils.py:123] Killing local vLLM worker processes
That's interesting - can you share the details of your system? OS, CPU, etc.?
Other than that, maybe @KaunilD has an idea what's going wrong on your side :/
Linux master 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Platinum 8468V
Stepping: 8
Frequency boost: enabled
CPU MHz: 1000.004
CPU max MHz: 2401.0000
CPU min MHz: 800.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 4.5 MiB
L1i cache: 3 MiB
L2 cache: 192 MiB
L3 cache: 195 MiB
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear pconfig flush_l1d arch_capabilities
Looks similar to my environment - the only difference is I'm on RHEL.
Not really a solution to your issue, but did you try to build and run a container via Dockerfile.cpu?
That way at least the environment should be similar.
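Something like this (a sketch based on the CPU installation docs; the image tag and flags are illustrative):
docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
docker run -it --rm --network=host vllm-cpu-env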
Your current environment
Collecting environment information...
INFO 08-28 14:32:56 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:32:56 _custom_ops.py:17] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (GCC) 12.2.0
Clang version: Could not collect
CMake version: version 3.26.0
Libc version: glibc-2.31
Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-26-generic-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Platinum 8468V
Stepping: 8
Frequency boost: enabled
CPU MHz: 896.660
CPU max MHz: 2401.0000
CPU min MHz: 800.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 4.5 MiB
L1i cache: 3 MiB
L2 cache: 192 MiB
L3 cache: 195 MiB
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear pconfig flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0+cpu
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pyzmq 26.2.0 pypi_0 pypi
[conda] torch 2.4.0+cpu pypi_0 pypi
[conda] torchvision 0.19.0 pypi_0 pypi
[conda] transformers 4.44.2 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect
🐛 Describe the bug
vllm serve /home/test/LLM-Models/Llama3.1-8B-Chinese-Chat/
INFO 08-28 14:31:40 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:31:40 _custom_ops.py:17] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Traceback (most recent call last):
File "/home/test/anaconda3/envs/vllm-cpu/bin/vllm", line 33, in <module>
sys.exit(load_entry_point('vllm==0.5.5+cpu', 'console_scripts', 'vllm')())
File "/home/test/anaconda3/envs/vllm-cpu/bin/vllm", line 25, in importlib_load_entry_point
return next(matches).load()
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/importlib/metadata/init.py", line 171, in load
module = import_module(match.group('module'))
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/test/zhangfazhan/vllm/vllm/init.py", line 3, in
from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
File "/home/test/zhangfazhan/vllm/vllm/engine/arg_utils.py", line 11, in
from vllm.config import (CacheConfig, DecodingConfig, DeviceConfig,
File "/home/test/zhangfazhan/vllm/vllm/config.py", line 16, in
from vllm.transformers_utils.config import (get_config,
File "/home/test/zhangfazhan/vllm/vllm/transformers_utils/config.py", line 6, in
from transformers.models.auto.image_processing_auto import (
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 27, in
from ...image_processing_utils import BaseImageProcessor, ImageProcessingMixin
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 21, in
from .image_transforms import center_crop, normalize, rescale
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/transformers/image_transforms.py", line 22, in
from .image_utils import (
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/transformers/image_utils.py", line 58, in
from torchvision.transforms import InterpolationMode
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torchvision/init.py", line 10, in
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils # usort:skip
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in
def meta_nms(dets, scores, iou_threshold):
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/library.py", line 654, in register
use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/library.py", line 154, in _register_fake
handle = entry.abstract_impl.register(func_to_register, source)
File "/home/test/anaconda3/envs/vllm-cpu/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 31, in register
if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist