Open bjoernpl opened 6 months ago
The 160 comes from the fact that there is no `head_dim` in https://huggingface.co/stabilityai/stablelm-2-12b/blob/main/config.json, so it is calculated as `hidden_size // num_attention_heads = 5120 // 32 = 160`.
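As a quick sketch, this is how the fallback works when `head_dim` is absent from the config (the values below are taken from stablelm-2-12b's `config.json`; the variable names are illustrative, not vLLM's actual code):

```python
# config.json for stabilityai/stablelm-2-12b has no "head_dim" key,
# so the head size is derived from the other two attention fields.
hidden_size = 5120
num_attention_heads = 32

# Fallback calculation: integer division of hidden size by head count.
head_dim = hidden_size // num_attention_heads
print(head_dim)  # 160
```

A head size of 160 is what then trips the PagedAttention check described below.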
Looking at modeling_stablelm.py in transformers, this appears to be the correct calculation.
@WoosukKwon will this head size remain unsupported by PagedAttention?
I also ran into this problem. I hope it can be solved, given the importance of StableLM.
Has there been any progress on this? https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 hits the same problem.
Your current environment
🐛 Describe the bug
Running vLLM with the new StableLM model stabilityai/stablelm-2-12b leads to this error regarding head size.