microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

DeepSpeed-Chat step-1 hanging for a long time #906

Open · lemon-little opened this issue 2 weeks ago


[Two screenshots were attached here; they are not recoverable.]

Step 1 hangs for a long time when launched on two nodes (4 GPUs each) with:

```bash
deepspeed --hostfile ~/hostfile \
   --num_gpus 4 \
   --num_nodes 2 \
   --master_addr 172.16.4.41 \
   main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --model_name_or_path shakechen/Llama-2-7b-hf/ \
   --per_device_train_batch_size 4 \
   --per_device_eval_batch_size 4 \
   --max_seq_len 512 \
   --learning_rate 9.65e-6 \
   --weight_decay 0. \
   --num_train_epochs 1 \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --gradient_checkpointing \
   --zero_stage 3 \
   --deepspeed \
   --output_dir /home/bingxing2/home/scx7avs/Deepspeed/output/
```
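The contents of `~/hostfile` are not shown above. The DeepSpeed launcher reads a hostfile with one line per node in the form `<hostname> slots=<number of GPUs>`. A minimal sketch of a file matching `--num_nodes 2 --num_gpus 4` follows; the hostnames `node-a` and `node-b` and the `HOSTFILE` variable are hypothetical placeholders, not the reporter's actual setup.

```bash
# Hypothetical hostnames: replace node-a / node-b with the real node names or IPs.
# Format expected by the DeepSpeed launcher: "<hostname> slots=<GPUs on that node>".
HOSTFILE="${HOSTFILE:-$HOME/hostfile}"

# Note: this overwrites any existing file at $HOSTFILE.
cat > "$HOSTFILE" <<'EOF'
node-a slots=4
node-b slots=4
EOF

# Show the result so it can be checked against --num_nodes / --num_gpus.
cat "$HOSTFILE"
```

Each listed host must be reachable from the launching node, and its slot count should match the GPUs actually available there.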