vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2 #6478

Open choco9966 opened 4 months ago

choco9966 commented 4 months ago

Your current environment

Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
[conda] flashinfer                0.0.9+cu121torch2.3          pypi_0    pypi
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] sentence-transformers     3.0.1                    pypi_0    pypi
[conda] torch                     2.3.1                    pypi_0    pypi
[conda] torchvision               0.18.1                   pypi_0    pypi
[conda] transformers              4.42.4                   pypi_0    pypi
[conda] triton                    2.3.1                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.2

🐛 Describe the bug

I encountered the following error when running Gemma-2-9b. The same error recurs even after deleting and recreating the virtual environment.

INFO 07-17 00:14:06 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:07 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:10 model_runner.py:266] Loading model weights took 17.3781 GB
ERROR 07-17 00:14:10 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-17 00:14:10 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-17 00:14:10 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]:     llm = LLM(model=args.model_path, 
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 336, in forward
[rank0]:     hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 277, in forward
[rank0]:     hidden_states, residual = layer(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 221, in forward
[rank0]:     hidden_states = self.self_attn(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 161, in forward
[rank0]:     q, k = self.rotary_emb(positions, q, k)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]:     return self._forward_method(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]:     ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]:     raise e
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]:     torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
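The error text in the log suggests stale compiled extensions left over from an older build. As a quick sanity check, you can list any compiled extension files sitting inside the installed package directory (a standard-library-only sketch; `find_compiled_exts` is my own helper name, not part of vllm):

```python
# List compiled extension files (*.so) inside an installed package,
# to spot leftovers from an older build.
import importlib.util
import pathlib

def find_compiled_exts(package: str) -> list:
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or not a regular package
    pkg_dir = pathlib.Path(list(spec.submodule_search_locations)[0])
    return sorted(str(p) for p in pkg_dir.glob("*.so"))

if __name__ == "__main__":
    print(find_compiled_exts("vllm"))
```

If this prints several `.so` files with mismatched Python-version tags, a clean reinstall of vllm (as the error message advises) is the likely fix.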
dsingal0 commented 4 months ago

Running into the same issue with vllm 0.5.2, torch 2.3.1 and flashinfer https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.9/flashinfer-0.0.9+cu121torch2.3-cp311-cp311-linux_x86_64.whl

pavanjava commented 4 months ago

Ran into the same issue with a T4 GPU and vllm==0.5.2, model==google/gemma-2b. In fact, this is not just with Gemma; I see it with every supported model of vLLM now. [screenshot: vllm error]

tlrmchlsmth commented 4 months ago

What OSes are you all on?

tlrmchlsmth commented 4 months ago

Also, @choco9966, is there more output that you could share? Ideally, copy and paste everything.

thegallier commented 4 months ago

Linux

qgallouedec commented 4 months ago

Hey, having the same issue:

from vllm import LLM
LLM("vwxyzjn/rloo_tldr")
WARNING 07-18 15:12:42 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_C.abi3.so)")
INFO 07-18 15:13:05 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='vwxyzjn/rloo_tldr', speculative_config=None, tokenizer='vwxyzjn/rloo_tldr', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=vwxyzjn/rloo_tldr, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 07-18 15:13:05 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-18 15:13:05 weight_utils.py:261] No model.safetensors.index.json found in remote.
INFO 07-18 15:13:06 model_runner.py:266] Loading model weights took 1.8848 GB
ERROR 07-18 15:13:06 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-18 15:13:06 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-18 15:13:06 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]:   File "<stdin>", line 1, in <module>
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
[rank0]:     hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
[rank0]:     hidden_states = layer(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
[rank0]:     attn_output = self.attention(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
[rank0]:     q, k = self.rotary_emb(position_ids, q, k)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]:     return self._forward_method(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]:     ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]:     raise e
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]:     torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'


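The ImportError in the log above points at the system glibc being older than what the prebuilt `_C.abi3.so` was compiled against (it asks for `GLIBC_2.32`). A quick way to check the interpreter's glibc, using only the standard library (a sketch; the 2.32 threshold is taken from that error message, not from vllm's documentation):

```python
import platform

def glibc_tuple(version: str) -> tuple:
    """Parse a glibc version string like '2.31' into (2, 31) for comparison."""
    return tuple(int(part) for part in version.split("."))

name, version = platform.libc_ver()
print(name, version)  # e.g. 'glibc 2.31' on Ubuntu 20.04
if name == "glibc" and version:
    # 2.32 is the symbol version the vllm 0.5.2 wheel demanded in the log above
    print("meets GLIBC_2.32:", glibc_tuple(version) >= glibc_tuple("2.32"))
```

If the check comes out False, the wheel was built against a newer glibc than the host provides, which matches the `version GLIBC_2.32 not found` message.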
qgallouedec commented 4 months ago

Downgrading to 0.5.1 solved the issue

pip install vllm==0.5.1
thegallier commented 4 months ago

Same with other versions? I think I tried that version, but maybe with different versions of the other packages.

tlrmchlsmth commented 4 months ago

@qgallouedec that's the same problem as https://github.com/vllm-project/vllm/issues/6462. I think people are generally having glibc versioning issues with 0.5.2.

Working on it here: https://github.com/vllm-project/vllm/pull/6517
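To confirm a glibc mismatch directly, one can dump the dynamic symbols of the compiled extension (e.g. `objdump -T vllm/_C.abi3.so`) and collect the `GLIBC_x.y` version tags it requires. A small helper along those lines (my own sketch; the file path in the usage comment is an assumption and depends on where vllm is installed):

```python
import re
import shutil
import subprocess

def parse_glibc_tags(objdump_output: str) -> set:
    """Collect the GLIBC_x.y version tags appearing in `objdump -T` output."""
    return set(re.findall(r"GLIBC_(\d+\.\d+)", objdump_output))

def required_glibc(so_path: str) -> set:
    if shutil.which("objdump") is None:
        raise RuntimeError("objdump not found; install binutils")
    out = subprocess.run(["objdump", "-T", so_path],
                         capture_output=True, text=True, check=True).stdout
    return parse_glibc_tags(out)

# Hypothetical usage (adjust the path to your site-packages):
# print(required_glibc("<site-packages>/vllm/_C.abi3.so"))
```

The highest tag printed is the minimum glibc the wheel's build host baked in; if it exceeds the host's glibc, the extension fails to load exactly as in the reports above.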

RylanSchaeffer commented 4 months ago

I have this same problem too.

Traceback (most recent call last):
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
    hidden_or_intermediate_states = model_executable(
                                    ^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
    hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
    hidden_states = layer(
                    ^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
    attn_output = self.attention(
                  ^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
    q, k = self.rotary_emb(position_ids, q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
    return self._forward_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
    ops.rotary_embedding(positions, query, key, self.head_size,
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 43, in wrapper
    raise e
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 34, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
    torch.ops._C.rotary_embedding(positions, query, key, head_size,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/_ops.py", line 921, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
thegallier commented 4 months ago

I moved aside the inner vllm subdirectory (the vllm/ folder inside the vllm checkout), then installed 0.4.2, and that resolved my issues with both the rms_norm and rotary_embedding ops.

yuchenlin commented 4 months ago

Having the same issue with 0.5.2... is there a plan to fix it? It's probably due to the torch 2.3.1 requirement.

tlrmchlsmth commented 3 months ago

Assuming that most people here are hitting glibc versioning problems, this should be fixed for most users in 0.5.3 and later, now that the wheels are built on Ubuntu 20.04. I think we can go ahead and close this one.

mahenning commented 2 months ago

For me this issue persists on Ubuntu 22.04 with vllm 0.6.0 (also with v0.4.2).