vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai

[Bug]: StableLM 12b head size incorrect #3952

Open bjoernpl opened 6 months ago

bjoernpl commented 6 months ago

Your current environment

Can't run the environment collection script since this is running on a dockerized cluster. Using the latest pip installs of both vLLM and transformers, with CUDA 12.1.

🐛 Describe the bug

Running vLLM with the new StableLM model stabilityai/stablelm-2-12b leads to this error regarding head size.
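A minimal sketch along these lines hits the same check (assuming default engine arguments; the actual run used a Ray-based setup, as the traceback below shows):

from vllm import LLM

# Assumed minimal repro: simply loading the model is enough to reach the
# head-size check in the attention backend's __init__.
llm = LLM(model="stabilityai/stablelm-2-12b")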

  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 45, in execute_method
    raise e
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
    return executor(*args, **kwargs)
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/worker/worker.py", line 107, in load_model
    self.model_runner.load_model()
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 95, in load_model
    self.model = get_model(
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 91, in get_model
    model = model_class(model_config.hf_config, linear_method)
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 236, in __init__
    self.model = StableLMEpochModel(config, linear_method)
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 198, in __init__
    self.layers = nn.ModuleList([
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 199, in <listcomp>
    StablelmDecoderLayer(config, linear_method)
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 153, in __init__
    self.self_attn = StablelmAttention(config)
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/model_executor/models/stablelm.py", line 125, in __init__
    self.attn = Attention(self.num_heads,
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/attention/layer.py", line 35, in __init__
    self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
  File "/workspace/miniconda3/envs/py3.10/lib/python3.10/site-packages/vllm/attention/backends/flash_attn.py", line 148, in __init__
    raise ValueError(
ValueError: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].
hmellor commented 6 months ago

The 160 comes from

https://github.com/vllm-project/vllm/blob/6d592eb430a37a7f8f5f9beb2dbc014bf3aa76bc/vllm/config.py#L267-L272

Since there is no head_dim in https://huggingface.co/stabilityai/stablelm-2-12b/blob/main/config.json, it's calculated as hidden_size // num_attention_heads = 5120 // 32 = 160.
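You can sanity-check that with a standalone sketch (the supported-size list is copied from the error above, and the fallback mirrors the linked config.py lines rather than calling into vLLM):

from transformers import AutoConfig

# Head sizes accepted by vLLM's PagedAttention kernels, per the error above.
SUPPORTED_HEAD_SIZES = [64, 80, 96, 112, 128, 256]

config = AutoConfig.from_pretrained("stabilityai/stablelm-2-12b")
# config.json has no head_dim, so fall back to hidden_size // num_attention_heads,
# mirroring the linked vLLM logic: 5120 // 32 = 160.
head_dim = getattr(config, "head_dim", None) or (
    config.hidden_size // config.num_attention_heads
)
print(head_dim, head_dim in SUPPORTED_HEAD_SIZES)  # 160 False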

hmellor commented 6 months ago

Looking at modeling_stablelm.py in transformers, this appears to be the correct calculation:

https://github.com/huggingface/transformers/blob/56d001b26f244018cbbb8aa573fc668b877223fa/src/transformers/models/stablelm/modeling_stablelm.py#L248-L250

hmellor commented 6 months ago

@WoosukKwon will this head size remain unsupported by PagedAttention?

Forence1999 commented 4 months ago

> @WoosukKwon will this head size remain unsupported by PagedAttention?

I also ran into this problem. I hope it can be solved, given the importance of StableLM.

dblate commented 1 month ago

Has there been any progress on this? https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 also hits this problem.