Closed cdj0311 closed 10 months ago
Hi, I fine-tuned CodeLlama-34b-Python-hf with train_wizardcoder.py, but the loss drops to 0 after training for over a hundred steps. The 7b and 13b models did not have this problem.
Environment

PyTorch == 2.0.1
Transformers == 4.31.0
Deepspeed == 0.9.3
Script

BASE_MODEL=./CodeLlama-34b-Python-hf
OUTPUT_MODEL=./CodeLlama-34b-Python-Evol
torchrun --nproc_per_node 8 \
    --nnodes 4 \
    --node_rank 0 \
    --master_addr "localhost" \
    --master_port 6000 \
    train_wizard.py \
    --model_name_or_path $BASE_MODEL \
    --data_path "/path/code-evol-instruct.json" \
    --output_dir $OUTPUT_MODEL \
    --num_train_epochs 2 \
    --model_max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --warmup_steps 0 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --bf16 True
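One thing worth double-checking in the command above: with `--nnodes 4`, each node normally needs its own `--node_rank` (0 through 3), and `--master_addr` must be an address of the rank-0 node that the other nodes can reach, not `"localhost"`. A minimal sketch of what would typically be run on each node (the address `10.0.0.1` is a placeholder, not taken from the issue):

```shell
# Hypothetical per-node launch sketch: in a real 4-node job this loop body
# would run once per machine, with RANK set to that machine's node rank.
MASTER_ADDR=10.0.0.1   # placeholder: reachable address of the rank-0 node
for RANK in 0 1 2 3; do
  echo "node $RANK: torchrun --nproc_per_node 8 --nnodes 4 --node_rank $RANK --master_addr $MASTER_ADDR --master_port 6000 train_wizard.py ..."
done
```

If all four nodes were launched with `--node_rank 0` and `--master_addr "localhost"`, the job would not form a single 32-GPU group, which could explain divergent behavior on the larger model.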
Deepspeed config:

{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 0,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 0,
    "stage3_max_reuse_distance": 0,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": 0
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "wall_clock_breakdown": false
}
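Since `--logging_steps 1` is set, the trainer's log history records the loss at every step, which makes it easy to pin down exactly when the collapse starts. A minimal sketch (the helper name and the sample history values are hypothetical, not from the issue) that scans such a history for the first step reporting a loss of exactly 0:

```python
# Hypothetical diagnostic helper: find the first training step whose logged
# loss is exactly 0.0, so the offending data/step can be inspected.
def first_zero_loss_step(log_history):
    """log_history: list of dicts like {"step": int, "loss": float}.

    Returns the step where loss first hits 0.0, or None if it never does.
    """
    for entry in log_history:
        if entry.get("loss") == 0.0:
            return entry.get("step")
    return None


# Example with made-up values mimicking the reported behavior
# (normal loss, then collapse after ~100 steps):
history = [
    {"step": 99, "loss": 0.71},
    {"step": 100, "loss": 0.0},
    {"step": 101, "loss": 0.0},
]
print(first_zero_loss_step(history))  # -> 100
```

The same check could be run against the saved `trainer_state.json` in a checkpoint directory, whose `log_history` field has this shape.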
I have solved it.
I got the same error. How did you solve this problem, bro?