vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Load internlm2-chat-20b get ValueError: Query/Key/Value should all have BMHK or BMK shape. #2755

Open dafen12 opened 6 months ago

dafen12 commented 6 months ago

vllm==0.3.0

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="internlm/internlm2-chat-20b",
    gpu_memory_utilization=0.85,
    max_model_len=2000,
    trust_remote_code=True,
)
```

I get:

```
ValueError: Query/Key/Value should all have BMHK or BMK shape.
  query.shape: torch.Size([1, 2048, 8, 6, 128])
  key.shape  : torch.Size([1, 2048, 8, 6, 128])
  value.shape: torch.Size([1, 2048, 8, 6, 128])
```
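For context: this check comes from xformers, whose `memory_efficient_attention` historically accepted only 3-D BMK `(batch, seq, head_dim)` or 4-D BMHK `(batch, seq, heads, head_dim)` inputs. The 5-D shape in the traceback carries an extra group dimension from internlm2's grouped-query attention, e.g. `(batch=1, seq=2048, kv_groups=8, heads_per_group=6, head_dim=128)`. A minimal sketch of that kind of validation (a hypothetical helper for illustration, not vllm's or xformers' actual code):

```python
def check_qkv_shapes(q_shape, k_shape, v_shape):
    """Mimic the older-xformers input check: accept only BMK (3-D) or
    BMHK (4-D) tensor shapes, and reject anything else.

    Hypothetical helper written for this thread, not library code.
    """
    for name, shape in (("query", q_shape), ("key", k_shape), ("value", v_shape)):
        if len(shape) not in (3, 4):
            raise ValueError(
                "Query/Key/Value should all have BMHK or BMK shape. "
                f"{name}.shape: {tuple(shape)}"
            )
    return True


# A plain multi-head 4-D (BMHK) shape passes:
check_qkv_shapes((1, 2048, 48, 128), (1, 2048, 48, 128), (1, 2048, 48, 128))

# The 5-D grouped-query shape from the traceback fails:
try:
    check_qkv_shapes(
        (1, 2048, 8, 6, 128), (1, 2048, 8, 6, 128), (1, 2048, 8, 6, 128)
    )
except ValueError as e:
    print(e)
```

Under this reading, the error is not about the values in the tensors at all, only their rank: the attention backend never learned to interpret the extra group axis.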

dafen12 commented 6 months ago

@Leymore Can you help with this issue? Thank you

TyRantLQlyf commented 4 months ago

@dafen12 Hi. Have you solved this problem?

TyRantLQlyf commented 4 months ago

@dafen12 It is likely because your installed xformers version is too old.
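If that is the cause, the fix would be to upgrade xformers (newer releases accept the 5-D grouped-query input shapes); the exact minimum version is not confirmed in this thread, so something like:

```shell
# Upgrade xformers to the latest release, then confirm which version
# is actually installed in the environment vllm runs in.
pip install --upgrade xformers
python -c "import xformers; print(xformers.__version__)"
```

Note that vllm 0.3.0 pins its own dependency versions, so reinstalling vllm after the upgrade (or upgrading vllm itself) may also be necessary.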

shiqingzhangCSU commented 4 months ago

I have the same issue when using chatglm2.