vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: AssertionError when loading miqu-70b after full SFT #3813

Open uRENu opened 3 months ago

uRENu commented 3 months ago

Your current environment

pip install auto_gptq modelscope xformers torchvision torchaudio torch==2.1.2 -U
pip install datasets huggingface-hub transformers==4.39.1 -U
pip install fastrlock cupy-cuda11x==12.1.0
pip install flash_attn==2.5.6
pip install vllm==0.4.0

🐛 Describe the bug

When I load the miqu-70b model after full SFT (node: 4, nproc_per_node: 8), the following error occurs:

llm = LLM(model=args.ckpt_dir, trust_remote_code=True, seed=42, tensor_parallel_size=torch.cuda.device_count())
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 112, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
    engine = cls(
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
    self._init_workers_ray(placement_group)
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 192, in _init_workers_ray
    self._run_workers(
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 107, in load_model
    self.model_runner.load_model()
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 95, in load_model
    self.model = get_model(
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 101, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 404, in load_weights
    weight_loader(param, loaded_weight)
  File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 86, in weight_loader
    assert loaded_weight.shape[parallel_dim] == self.org_vocab_size
AssertionError

(RayWorkerVllm pid=1135) INFO 04-03 14:52:46 pynccl_utils.py:45] vLLM is using nccl==2.18.1
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44] Traceback (most recent call last):
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     return executor(*args, **kwargs)
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 107, in load_model
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     self.model_runner.load_model()
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 95, in load_model
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     self.model = get_model(
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 101, in get_model
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     model.load_weights(model_config.model, model_config.download_dir,
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 404, in load_weights
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     weight_loader(param, loaded_weight)
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]   File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 86, in weight_loader
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44]     assert loaded_weight.shape[parallel_dim] == self.org_vocab_size
(RayWorkerVllm pid=1135) ERROR 04-03 14:52:49 ray_utils.py:44] AssertionError
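The failing assertion compares the vocab dimension of the embedding weight found in the checkpoint against org_vocab_size, which vLLM takes from the model's config.json, so it is worth checking what is actually on disk. Below is a minimal diagnostic sketch, assuming a safetensors checkpoint and Llama-style tensor names (model.embed_tokens.weight / lm_head.weight); the checkpoint path is a placeholder:

```python
import glob
import json

from safetensors import safe_open  # pip install safetensors

ckpt_dir = "/path/to/ckpt_dir"  # placeholder; use the same path passed to LLM(model=...)

# Vocab size that vLLM will expect (org_vocab_size is read from config.json).
with open(f"{ckpt_dir}/config.json") as f:
    config_vocab_size = json.load(f)["vocab_size"]

# Shapes of the embedding / lm_head weights actually stored in the shards.
for shard in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    with safe_open(shard, framework="pt") as sf:
        for name in sf.keys():
            if name.endswith("embed_tokens.weight") or name.endswith("lm_head.weight"):
                shape = sf.get_slice(name).get_shape()
                print(f"{name}: {shape} (config vocab_size={config_vocab_size})")
```

If the vocab dimension printed here does not equal vocab_size, or the tensors are missing from every shard, the checkpoint on disk is incomplete or inconsistent, which would explain the AssertionError above.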

uRENu commented 3 months ago

However, loading the model after LoRA SFT (node: 1, nproc_per_node: 8) works fine.

uRENu commented 3 months ago

@zhuohan123 @esmeetu

uRENu commented 3 months ago

The problem seems to be caused by the model weights not being saved completely after ZeRO-3 fine-tuning.
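If the ZeRO-3 run only wrote sharded parameter/optimizer states, one common workaround is to consolidate them into a full state dict before pointing vLLM at the directory. This is only a sketch using DeepSpeed's zero_to_fp32 utility; the exact function signature varies across DeepSpeed versions (check the zero_to_fp32.py script DeepSpeed writes into the checkpoint directory), and the paths below are placeholders:

```python
# Consolidate a ZeRO-3 sharded checkpoint into a single fp32 state dict.
# NOTE: argument names/semantics differ between DeepSpeed versions; verify
# against the zero_to_fp32.py shipped in your checkpoint directory.
from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

convert_zero_checkpoint_to_fp32_state_dict(
    "/path/to/zero3_checkpoint",            # placeholder: dir containing the global_step*/ shards
    "/path/to/ckpt_dir/pytorch_model.bin",  # placeholder: consolidated output for vLLM to load
)
```

Alternatively, enabling stage3_gather_16bit_weights_on_model_save in the DeepSpeed config before training makes save_pretrained / Trainer.save_model write the full (non-sharded) weights in the first place.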

yiyepiaoling0715 commented 1 day ago

> The problem seems to be caused by the model weights not being saved completely after ZeRO-3 fine-tuning.

How did you solve it in the end? I ran into the same problem.