Reminder
Reproduction
accelerate launch src/train_bash.py \
    --stage pt \
    --model_name_or_path $model_name_or_path \
    --do_train \
    --dataset $dataset \
    --streaming \
    --max_steps 10000 \
    --finetuning_type full \
    --output_dir $output_dir \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 500 \
    --save_total_limit 2 \
    --learning_rate 5e-6 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --use_fast_tokenizer false \
    --preprocessing_num_workers 64 \
    --cutoff_len 2048 \
    --bf16 \
    --warmup_steps 10 \
    --max_grad_norm 1.0 2>&1 | tee $output_dir/log.txt
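
For scale, the effective batch this command implies can be worked out directly (assuming the 7 GPUs mentioned under Expected behavior):

# sequences consumed per optimizer step: per-device batch x grad accumulation x GPUs
echo $(( 2 * 8 * 7 ))         # 112 sequences
# upper bound on tokens per step at cutoff_len 2048
echo $(( 2 * 8 * 7 * 2048 ))  # 229376 tokens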
Expected behavior
Running the pretraining command above on 7 GPUs, 10,000 steps takes roughly two days. Are there any ways to speed this up?
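
For reference, one lever that often helps throughput with this stack (transformers 4.31 / accelerate, with deepspeed already pinned under System Info) is DeepSpeed ZeRO stage 2. Since train_bash.py parses the standard HF Seq2SeqTrainingArguments, the stock --deepspeed flag should be accepted; the following is a minimal sketch, and ds_config.json is a hypothetical filename, not a file shipped with the repo:

# Write a minimal ZeRO-2 config; "auto" values are filled in from the HF training arguments
cat > ds_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
EOF

# Launch through the deepspeed launcher instead of accelerate
deepspeed --num_gpus 7 src/train_bash.py \
    --stage pt --do_train --finetuning_type full --bf16 \
    --deepspeed ds_config.json
# ...keeping the remaining flags from the Reproduction command unchanged

Whether this actually helps depends on the model size and where the bottleneck sits; the usual win is that ZeRO-2 frees enough memory to raise per_device_train_batch_size above 2.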
System Info
torch==1.14.0a0+410ce96
uvicorn
fastapi==0.95.1
sse-starlette
tiktoken
trl==0.7.4
peft>=0.4.0
accelerate>=0.21.0
jieba
rouge-chinese
gradio
fsspec==2023.9.2
transformers==4.31.0
deepspeed==0.9.1
deepspeed==0.9.3
nltk
openpyxl
Others
None

Comment

I don't know which model you are training.