Closed RyanCcc114 closed 2 days ago
Reminder
System Info
pytorch:2.1.0-cuda11.8
Reproduction
bf16: true
cutoff_len: 1024
dataset: EE_instruction_message
dataset_dir: data
ddp_timeout: 180000000
do_train: true
eval_steps: 100
eval_strategy: steps
finetuning_type: lora
flash_attn: fa2
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 0.0003
logging_steps: 5
lora_alpha: 32
lora_dropout: 0.1
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: src/llamafactory/model/model/glm4-chat
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/GLM-4-9B-Chat/lora/train_2024-06-24-23-30-00
packing: false
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: glm4
val_size: 0.2
warmup_steps: 0.01
Expected behavior
After loading the fine-tuned GLM-4 model, it does not generate any answer: no matter what the input is, the model returns blank output. The loss in the training log decreases steadily, and the evaluation loss decreases as well, but the results produced by running the predict script are all 0 (empty).
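For comparison, a minimal LLaMA-Factory inference config that loads the LoRA adapter saved by the training run above might look like the following. This is a sketch, not the exact config used here: the adapter path is copied from `output_dir` in the training config, and the filename `inference.yaml` is an assumption.

```yaml
# hypothetical inference.yaml — paths taken from the training config above
model_name_or_path: src/llamafactory/model/model/glm4-chat
adapter_name_or_path: saves/GLM-4-9B-Chat/lora/train_2024-06-24-23-30-00
template: glm4
finetuning_type: lora
```

Run with, e.g., `llamafactory-cli chat inference.yaml`. One common cause of blank generations is a mismatch between the inference `template` and the template used for training (`glm4` here), so it may be worth double-checking that both sides agree.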
Others
No response