vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: StableLM 12b head size incorrect #3952

Open bjoernpl opened 7 months ago

bjoernpl commented 7 months ago

Your current environment

Can't run the environment collection script since this is running on a dockerized cluster. Using the latest pip installs of both vLLM and transformers, with CUDA 12.1.

🐛 Describe the bug

Running vLLM with the new StableLM model stabilityai/stablelm-2-12b leads to this error regarding head size.

2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 45, in execute_method
2024-04-09T23:20:46Z [job]     raise e
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
2024-04-09T23:20:46Z [job]     return executor(*args, **kwargs)
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/worker/worker.py", line 107, in load_model
2024-04-09T23:20:46Z [job]     self.model_runner.load_model()
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-04-09T23:20:46Z [job]     self.model = get_model(
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 91, in get_model
2024-04-09T23:20:46Z [job]     model = model_class(model_config.hf_config, linear_method)
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 236, in __init__
2024-04-09T23:20:46Z [job]     self.model = StableLMEpochModel(config, linear_method)
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 198, in __init__
2024-04-09T23:20:46Z [job]     self.layers = nn.ModuleList([
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 199, in <listcomp>
2024-04-09T23:20:46Z [job]     StablelmDecoderLayer(config, linear_method)
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 153, in __init__
2024-04-09T23:20:46Z [job]     self.self_attn = StablelmAttention(config)
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 125, in __init__
2024-04-09T23:20:46Z [job]     self.attn = Attention(self.num_heads,
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/attention/layer.py", line 35, in __init__
2024-04-09T23:20:46Z [job]     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
2024-04-09T23:20:46Z [job]   File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/attention/backends/flash_attn.py", line 148, in __init__
2024-04-09T23:20:46Z [job]     raise ValueError(
2024-04-09T23:20:46Z [job] ValueError: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].
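
For reference, a minimal reproduction sketch (a hypothetical single-GPU invocation; only loading the model is needed to hit the check, no generation call):

```python
# Minimal reproduction sketch: the ValueError is raised while the model
# weights are being loaded, before any prompt is processed.
from vllm import LLM

# stabilityai/stablelm-2-12b: hidden_size=5120, num_attention_heads=32,
# so vLLM derives head_size = 5120 // 32 = 160.
llm = LLM(model="stabilityai/stablelm-2-12b")  # raises: Head size 160 is not supported
```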
hmellor commented 7 months ago

The 160 comes from

https://github.com/vllm-project/vllm/blob/6d592eb430a37a7f8f5f9beb2dbc014bf3aa76bc/vllm/config.py#L267-L272

Since there is no head_dim in https://huggingface.co/stabilityai/stablelm-2-12b/blob/main/config.json, it is calculated as hidden_size // num_attention_heads = 5120 // 32 = 160.
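
A sketch of that fallback (paraphrasing the linked vllm/config.py lines; the attribute values come from the model's HF config):

```python
# With no explicit head_dim in config.json, the head size is derived
# from the hidden size and the number of attention heads.
hidden_size = 5120          # from stabilityai/stablelm-2-12b config.json
num_attention_heads = 32    # from the same config

head_size = hidden_size // num_attention_heads
print(head_size)  # 160 -- not in [64, 80, 96, 112, 128, 256]
```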

hmellor commented 7 months ago

Looking at modeling_stablelm.py in transformers, this appears to be the correct calculation:

https://github.com/huggingface/transformers/blob/56d001b26f244018cbbb8aa573fc668b877223fa/src/transformers/models/stablelm/modeling_stablelm.py#L248-L250

hmellor commented 7 months ago

@WoosukKwon will this head size remain unsupported by PagedAttention?

Forence1999 commented 6 months ago

@WoosukKwon will this head size remain unsupported by PagedAttention?

I also ran into this problem. I hope it can be solved, considering the importance of StableLM.

dblate commented 3 months ago

Has there been any progress on this? https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 hits the same problem.
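
For anyone triaging other models, a quick way to see whether a config pins head_dim explicitly or will fall into the same hidden_size // num_attention_heads fallback (model name taken from the comment above):

```python
# Inspect the HF config: an explicit head_dim, if present, may differ
# from the hidden_size // num_attention_heads fallback.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
explicit = getattr(cfg, "head_dim", None)
fallback = cfg.hidden_size // cfg.num_attention_heads
print(f"explicit head_dim={explicit}, fallback={fallback}")
```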

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!