shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

yi-6B SFT loss is 0 #296

Closed nuoma closed 6 months ago

nuoma commented 6 months ago

Using yi-6B as the base model, the loss is 0 during full-parameter SFT. transformers==4.34.0

```
CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 torchrun --nproc_per_node 7 ../supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --train_file_dir ../data/finetune/1229_ReproduceBlossom/stage1/ \
    --per_device_train_batch_size 4 \
    --do_train \
    --max_train_samples -1 \
    --num_train_epochs 1 \
    --learning_rate 2e-6 \
    --weight_decay 0. \
    --bp16 \
    --use_peft False \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../outputs/20240102_yi6B_SFTv3_stg1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype bfloat16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 4096 \
    --deepspeed ../deepspeed_zero_stage2_config_no16.json \
    --template_name yi
```

shibing624 commented 6 months ago

`--bp16` should be `--bf16`

nuoma commented 6 months ago

Sorry, that was a typo; after fixing it the problem is the same. Training is normal once deepspeed is removed. [screenshot: loss log without deepspeed]

I saw this issue (https://github.com/microsoft/DeepSpeed/issues/4661), but my deepspeed version is 0.12.5.

ds_config:

```
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "torch_adam": true,
      "adam_w_mode": true
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": "auto",
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 1000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```

nuoma commented 6 months ago

With the ds config above, downgrading deepspeed to 0.11.1 resolved the issue.
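Since the fix here is a version pin rather than a code change, it may help to fail fast at startup when a suspect DeepSpeed build is installed. A minimal sketch, assuming the version cutoff reported in this thread (0.12.x bad with bf16 + ZeRO-2, 0.11.1 good); this is a heuristic from the discussion, not an official DeepSpeed changelog entry:

```python
def deepspeed_bf16_loss_bug_suspected(ds_version: str) -> bool:
    """Return True if ds_version falls in the range this thread reports
    as producing loss == 0 with bf16 + ZeRO stage 2 (the 0.12.x series).
    0.11.1 was reported to work."""
    parts = ds_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) == (0, 12)


if __name__ == "__main__":
    # In a training script you might read the real version, e.g.:
    #   from importlib.metadata import version
    #   ds_version = version("deepspeed")
    for v in ("0.12.5", "0.11.1"):
        print(v, deepspeed_bf16_loss_bug_suspected(v))
```

If the check fires, downgrading with `pip install deepspeed==0.11.1` (as done above) is the workaround the thread converged on.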