hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0

vLLM reports an error when deploying freeze-fine-tuned qwen2-57b-instruct #4545

Closed · hexixiang closed this 3 days ago

hexixiang commented 3 days ago

Reminder

System Info

Reproduction

```
2024-06-26 10:34:05,491 INFO worker.py:1770 -- Started a local Ray instance.
INFO 06-26 10:34:06 config.py:623] Defaulting to use mp for distributed inference
Traceback (most recent call last):
  File "/root/anaconda3/envs/hxx2/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/home/hxx/LLaMA-Factory-main/src/llamafactory/cli.py", line 79, in main
    run_api()
  File "/home/hxx/LLaMA-Factory-main/src/llamafactory/api/app.py", line 117, in run_api
    chat_model = ChatModel()
  File "/home/hxx/LLaMA-Factory-main/src/llamafactory/chat/chat_model.py", line 45, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/home/hxx/LLaMA-Factory-main/src/llamafactory/chat/vllm_engine.py", line 94, in __init__
    self.model = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**engine_args))
  File "/root/anaconda3/envs/hxx2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 371, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/hxx2/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 766, in create_engine_config
    return EngineConfig(model_config=model_config,
  File "<string>", line 13, in __init__
  File "/root/anaconda3/envs/hxx2/lib/python3.10/site-packages/vllm/config.py", line 1378, in __post_init__
    self.model_config.verify_with_parallel_config(self.parallel_config)
  File "/root/anaconda3/envs/hxx2/lib/python3.10/site-packages/vllm/config.py", line 235, in verify_with_parallel_config
    raise ValueError(
ValueError: Total number of attention heads (28) must be divisible by tensor parallel size (8).
```
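For context on the final `ValueError`: vLLM shards attention heads across tensor-parallel ranks, so the model's head count must be evenly divisible by the tensor parallel size; here the Qwen2-57B config reports 28 heads while 8 GPUs were used. A minimal sketch of the kind of divisibility check `verify_with_parallel_config` performs (variable names are simplifications for illustration, not vLLM's actual source):

```python
# Illustrative sketch of the failing check; names are assumptions.
num_attention_heads = 28   # reported by the model config in the traceback
tensor_parallel_size = 8   # one shard per visible GPU

if num_attention_heads % tensor_parallel_size != 0:
    raise ValueError(
        f"Total number of attention heads ({num_attention_heads}) "
        f"must be divisible by tensor parallel size ({tensor_parallel_size})."
    )
# A tensor parallel size that divides 28 (1, 2, 4, 7, 14, 28) would pass
# this particular check, but see the maintainer's reply below for the
# deeper incompatibility.
```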

Expected behavior

No response

Others

No response

hiyouga commented 3 days ago

vLLM does not support Qwen MoE.
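Given that reply, a possible fallback (a sketch only, not something confirmed in this thread) is to serve the freeze-tuned checkpoint through LLaMA-Factory's Hugging Face backend instead of vLLM. This assumes a LLaMA-Factory version whose `infer_backend` option accepts `huggingface`; the checkpoint path below is a placeholder:

```python
# Hedged sketch: load the freeze-tuned model with the HF backend instead
# of vLLM. The path is a placeholder, not from this issue.
from llamafactory.chat import ChatModel

chat_model = ChatModel(dict(
    model_name_or_path="/path/to/qwen2-57b-instruct-freeze",  # placeholder
    template="qwen",
    infer_backend="huggingface",  # avoid the VllmEngine code path entirely
))

# Single-turn sanity check; Response objects carry the generated text.
responses = chat_model.chat([{"role": "user", "content": "你好"}])
print(responses[0].response_text)
```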