hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

RuntimeError: weight lm_head.weight does not exist when loading a Qwen2 model after DPO fine-tuning #6073

Open dahaogewsh opened 5 hours ago

dahaogewsh commented 5 hours ago

Error message:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 92, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 246, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 205, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 622, in get_model
    return FlashQwen2(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_qwen2.py", line 72, in __init__
    model = Qwen2ForCausalLM(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_qwen2_modeling.py", line 351, in __init__
    self.lm_head = SpeculativeHead.load(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 615, in load
    lm_head = TensorParallelHead.load(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 654, in load
    weight = weights.get_tensor(f"{prefix}.weight")
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 99, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 63, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist

Fine-tuning parameters:

model

model_name_or_path: qwen2-1.5b

method

stage: dpo
do_train: true
finetuning_type: full
pref_beta: 0.1
pref_loss: simpo  # choices: [sigmoid (dpo), orpo, simpo]
pref_ftx: 0.5

simpo_gamma: 0.6

dpo_label_smoothing: 0.1

dataset

dataset: zk_dpo
template: empty
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16

output

output_dir: saves/zk/dpo
logging_steps: 10
save_steps: 5000
plot_loss: true
overwrite_output_dir: true

train

per_device_train_batch_size: 8
gradient_accumulation_steps: 2
learning_rate: 5.0e-6
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
report_to: wandb

dahaogewsh commented 5 hours ago

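What is going on here? I searched around and found that only three models are said to omit lm_head.weight; nothing I read says Qwen2 drops this parameter when the model is saved. Could someone please help?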
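One thing that may be worth checking before digging further is what the saved checkpoint actually contains. The sketch below is only a diagnostic, not a confirmed explanation of the error: it reads `tie_word_embeddings` from `config.json` and lists the tensor names in any `*.safetensors` files by parsing their JSON headers. It assumes the checkpoint in `output_dir` was saved in safetensors format and that the embedding tensor is named `model.embed_tokens.weight` (the usual Hugging Face naming, assumed here). The function names `safetensors_keys` and `check_lm_head` are hypothetical helpers written for this example. If the flag turns out to be true and `lm_head.weight` is absent, the output head may simply be shared with the input embedding and not stored separately.

```python
import json
import struct
from pathlib import Path


def safetensors_keys(path):
    """Return the tensor names in a .safetensors file by reading only its header.

    The file starts with an 8-byte little-endian length, followed by a JSON
    header of that length that maps tensor names to dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional non-tensor entry, so skip it.
    return sorted(name for name in header if name != "__metadata__")


def check_lm_head(model_dir):
    """Summarize whether lm_head.weight is stored in a checkpoint directory."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))

    keys = set()
    for shard in sorted(model_dir.glob("*.safetensors")):
        keys.update(safetensors_keys(shard))

    return {
        # None means the key is not present in config.json.
        "tie_word_embeddings": config.get("tie_word_embeddings"),
        "has_lm_head": "lm_head.weight" in keys,
        # Assumed tensor name; adjust if the checkpoint uses a different prefix.
        "has_embed_tokens": "model.embed_tokens.weight" in keys,
    }
```

For the configuration above, calling `check_lm_head("saves/zk/dpo")` and printing the result would show whether the weight is really missing from the files or only missing from the point of view of the serving code.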