Closed frei-x closed 1 month ago
Refer to https://github.com/vllm-project/vllm/issues/7689#issuecomment-2299588012 — you can use the latest version of transformers:

pip install git+https://github.com/huggingface/transformers
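To confirm the dev build actually took effect, a quick version check can be run before restarting vLLM (a minimal sketch; the `is_new_enough` helper and the 4.45 minimum are assumptions based on the version that worked in this thread, not an official check):

```python
from importlib.metadata import PackageNotFoundError, version

def is_new_enough(ver: str, minimum=(4, 45)) -> bool:
    """Compare the major.minor part of a version string against a minimum."""
    parts = ver.split(".")
    return (int(parts[0]), int(parts[1])) >= minimum

try:
    installed = version("transformers")
    print(installed, "->", "OK" if is_new_enough(installed) else "too old")
except PackageNotFoundError:
    print("transformers is not installed")
```

A dev install from git should report something like `4.45.0.dev0`, which is the version the reporter confirmed working.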
Success with transformers 4.45.0.dev0
Thanks @Isotr0py !
Your current environment
I'm encountering an AssertionError when trying to load the Qwen 2.5 GGUF (Qwen-2.5-q3_gguf.bin) model using vLLM. The error occurs in the vocab_parallel_embedding.py file, where it asserts that the loaded weight's shape matches the expected vocabulary size. Below is the traceback of the error:
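The failing assertion compares the first dimension of the loaded embedding tensor against the vocabulary size vLLM expects from the model config. A minimal sketch of that kind of check (hypothetical helper and shapes, not vLLM's actual code):

```python
def check_vocab_shape(weight_shape, expected_vocab_size):
    """Mimic the kind of check in vocab_parallel_embedding.py: the first
    dimension of the embedding weight must equal the vocabulary size the
    engine derived from the model config, else an AssertionError is raised."""
    assert weight_shape[0] == expected_vocab_size, (
        f"vocab size mismatch: weight has {weight_shape[0]} rows, "
        f"config expects {expected_vocab_size}"
    )

# Hypothetical shapes: a GGUF export with a padded or differently sized
# vocabulary would trip the assertion at load time.
check_vocab_shape((152064, 5120), 152064)  # matching sizes pass silently
```

If the GGUF file's embedding table was exported with a different (e.g. padded) vocabulary size than the config reports, this is the kind of mismatch that surfaces as the AssertionError above.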
Model Input Dumps
No response
🐛 Describe the bug
python -m vllm.entrypoints.openai.api_server \
    --model /data/models/Qwen2.5-32B-Instruct-GGUF-q3_k_m/qwen2.5-32b-instruct-q3_k_m.gguf \
    --dtype float16 \
    --api-key '' \
    --tensor-parallel-size 1 \
    --trust-remote-code \
    --gpu-memory-utilization 0.8 \
    --port 8000 \
    --max_model_len 10000 \
    --enforce-eager \
    --quantization gguf
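If the server does come up, the OpenAI-compatible endpoint it exposes can be exercised with a plain HTTP request; a sketch using only the standard library (the host/port and model path are taken from the command above, and the request is left commented out since it needs the live server):

```python
import json
from urllib import request

# Build a completion request against vLLM's OpenAI-compatible API.
payload = {
    "model": "/data/models/Qwen2.5-32B-Instruct-GGUF-q3_k_m/qwen2.5-32b-instruct-q3_k_m.gguf",
    "prompt": "Hello",
    "max_tokens": 16,
}
req = request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # uncomment once the server is running
```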
The same GGUF file works fine in Ollama.