Open choco9966 opened 4 months ago
Running into the same issue with vllm 0.5.2, torch 2.3.1 and flashinfer https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.9/flashinfer-0.0.9+cu121torch2.3-cp311-cp311-linux_x86_64.whl
Ran into the same issue with a T4 GPU and vllm==0.5.2, model==google/gemma-2b. In fact, this is not just with gemma; I now see this with every model vLLM supports.
What OSes are you all on?
Also @choco9966, is there more output that you could share? Ideally, copy and paste everything.
Linux
Hey, having the same issue:
from vllm import LLM
LLM("vwxyzjn/rloo_tldr")
WARNING 07-18 15:12:42 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_C.abi3.so)")
INFO 07-18 15:13:05 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='vwxyzjn/rloo_tldr', speculative_config=None, tokenizer='vwxyzjn/rloo_tldr', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=vwxyzjn/rloo_tldr, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 07-18 15:13:05 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-18 15:13:05 weight_utils.py:261] No model.safetensors.index.json found in remote.
INFO 07-18 15:13:06 model_runner.py:266] Loading model weights took 1.8848 GB
ERROR 07-18 15:13:06 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-18 15:13:06 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-18 15:13:06 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]: File "<stdin>", line 1, in <module>
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
[rank0]: hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
[rank0]: hidden_states = layer(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
[rank0]: attn_output = self.attention(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
[rank0]: q, k = self.rotary_emb(position_ids, q, k)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]: return self._forward_method(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]: ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]: raise e
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]: torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
Downgrading to 0.5.1 solved the issue
pip install vllm==0.5.1
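In the log above, the first line is only a WARNING that `vllm._C` failed to import, and the crash comes later when `rotary_embedding` is looked up on the empty op namespace. A minimal sketch for surfacing that import failure directly, before loading a model, might look like the following. The helper name `probe_extension` is hypothetical and not part of vLLM; it only uses the standard library.

```python
import importlib


def probe_extension(module_name="vllm._C"):
    """Try to import a compiled extension module and report why it failed.

    Returns (True, None) on success, or (False, "<ErrorType>: <message>")
    when the import raises ImportError (ModuleNotFoundError included).
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, None
```

Calling `probe_extension()` in the failing environment should return the same loader message that the warning prints, which makes it easier to tell a glibc mismatch apart from a stale build.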
Is it the same with other versions? I think I tried that version, but it may have been a different one.
@qgallouedec that's the same problem as https://github.com/vllm-project/vllm/issues/6462. I think people are generally having glibc versioning issues with 0.5.2.
Working on it here: https://github.com/vllm-project/vllm/pull/6517
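To check whether a given machine is affected by the glibc mismatch described here, one option is to compare the version named in the warning (`GLIBC_2.32` in the log earlier in this thread) with the glibc the Python interpreter is running against. This is a rough sketch using only the standard library; the function names are hypothetical, and `platform.libc_ver()` may not report a version on non-glibc systems.

```python
import platform
import re


def parse_version(text):
    """Turn '2.32' into (2, 32) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(error_message):
    """Extract the GLIBC_x.y version named in a loader error, or None."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)+)", error_message)
    return parse_version(match.group(1)) if match else None


def system_glibc():
    """Return the interpreter's glibc version as a tuple, or None if unknown."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


def glibc_is_new_enough(required, available):
    """True only when the available glibc is known and at least the required one."""
    return available is not None and available >= required
```

For example, `glibc_is_new_enough(required_glibc(message), system_glibc())` returning `False` for the warning text would point to the same problem as in the linked issue.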
I have this same problem too.
Traceback (most recent call last):
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
self.model_runner.profile_run()
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
hidden_or_intermediate_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
hidden_states = layer(
^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
attn_output = self.attention(
^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
q, k = self.rotary_emb(position_ids, q, k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
return self._forward_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
ops.rotary_embedding(positions, query, key, self.head_size,
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 43, in wrapper
raise e
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 34, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
torch.ops._C.rotary_embedding(positions, query, key, head_size,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/_ops.py", line 921, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
I moved the inner vllm subdirectory (the one inside the vllm checkout) out of the way, then installed 0.4.2, and that solved my issues. That fixed both the RMS and rotary errors.
Having the same issue with 0.5.2. Is there a plan to fix it? It is probably due to the torch 2.3.1 requirement.
Assuming that most people are having glibc versioning problems here, this issue should be fixed for most people in 0.5.3 and later, now that we are building on Ubuntu 20.04. I think we can go ahead and close this one.
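A quick way to see whether an environment already has a release at or past 0.5.3 is to read the installed package metadata. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the version parsing is deliberately simple (pre-releases such as `0.5.3rc1` are treated as older than `0.5.3`), and passing the check does not guarantee the error is gone.

```python
from importlib import metadata


def release_tuple(version):
    """Turn '0.5.3.post1' into (0, 5, 3); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_glibc_fix(version, first_fixed="0.5.3"):
    """True when the version is at or after the release named in this thread."""
    return release_tuple(version) >= release_tuple(first_fixed)


def installed_vllm_version():
    """Return the installed vllm version string, or None if it is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
```

If `installed_vllm_version()` returns a string for which `has_glibc_fix(...)` is `True` and the error still appears, the cause is likely something other than the glibc build mismatch.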
For me this issue persists on Ubuntu 22.04 with vllm 0.6.0 (also with v0.4.2).
Your current environment
🐛 Describe the bug
I encountered the following error when running Gemma-2-9b. Even after deleting and reinstalling the virtual environment, the same error repeats.