hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0
25.52k stars 3.16k forks

GPU utilization is already at 100% before the model has finished loading #4591

Closed ceyun1 closed 6 days ago

ceyun1 commented 6 days ago

Reminder

System Info

11

Reproduction

As the title says: I ran the command below, and while the model is still loading, GPU memory utilization is already at 100%. Is this a bug?

export WANDB_DISABLED=true
deepspeed --master_port=9903 --num_gpus 8 src/train.py \
    --deepspeed ./examples/deepspeed/ds_z3_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path --dataset --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 200 \
    --learning_rate 5e-5 \
    --num_train_epochs 10 \
    --plot_loss \
    --bf16 \
    --save_only_model \
    --overwrite_output_dir

Loading checkpoint shards: 49%|███████████████▌ | 18/37 [1:05:31<2:18:39, 437.87s/it]

Expected behavior

No response

Others

No response

hiyouga commented 6 days ago

No, it is not.