Status: Closed — opened by zhangfan-algo, closed 5 days ago
FSDP is supported; see the FSDP QLoRA example.
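The FSDP QLoRA example referred to above is driven by an Accelerate FSDP config. As an illustration only (field values are a sketch, not the exact file shipped in the repository), such a config typically looks like:

```yaml
# Sketch of an Accelerate FSDP config (illustrative values, not the repo's exact file)
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_offload_params: true          # offload params to CPU, analogous to ZeRO-3 offload
  fsdp_sharding_strategy: FULL_SHARD # shard params, grads, and optimizer states
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
mixed_precision: fp16
num_machines: 1
num_processes: 2
```

With `fsdp_sharding_strategy: FULL_SHARD` plus parameter offload, FSDP covers roughly the same memory-saving ground as DeepSpeed ZeRO-3 with offload, which is why the two setups in the question are natural candidates for a speed comparison.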
Reminder
System Info
Also, which is faster: FSDP + fp16, or ZeRO-3 offload + fp16?
Reproduction
model
model_name_or_path: /mnt/cluster//models/Qwen/Qwen1.5-1.8B-Chat

method
stage: sft
do_train: true
do_eval: true
finetuning_type: full
deepspeed: /mnt/cluster/LLaMA-Factory_0614/examples/deepspeed/ds_z3_config.json

dataset
dataset: test
template: qwen
cutoff_len: 19500
overwrite_cache: true
preprocessing_num_workers: 60

output
output_dir: test
logging_steps: 10
logging_first_step: true
save_total_limit: 5
save_strategy: epoch
plot_loss: true

train
gradient_checkpointing: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: linear
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000
neftune_noise_alpha: 5

eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 50
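For context on the ZeRO-3 side of the question: the `deepspeed:` path above points at a ZeRO-3 config JSON. A ZeRO-3 setup with CPU offload (the "zero3_offload" variant being compared against FSDP) is usually a sketch along these lines; the values below are illustrative, not the contents of the referenced `ds_z3_config.json`:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The `offload_optimizer` and `offload_param` blocks are what distinguish plain ZeRO-3 from ZeRO-3 offload; offloading trades GPU memory for host-device transfer time, which is why relative speed versus FSDP depends heavily on CPU, RAM, and interconnect.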
Expected behavior
No response
Others
No response