Closed: PhysicianHOYA closed 4 days ago
Reminder
System Info
[2024-06-24 21:08:01,145] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
llamafactory version: 0.8.3.dev0

Reproduction
FORCE_TORCHRUN=1 llamafactory-cli train my_examples/train.yaml
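For reference, a common way to pin the run to exactly two GPUs is `CUDA_VISIBLE_DEVICES` (a standard CUDA environment variable; the GPU indices `0,1` here are an assumption):

```shell
# Sketch: expose only GPUs 0 and 1 to the launcher, so
# FORCE_TORCHRUN=1 starts one training process per visible GPU.
CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train my_examples/train.yaml
```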
### model
model_name_or_path: Qwen2-57B
adapter_name_or_path:
quantization_bit: 4
double_quantization: true
quantization_type: nf4

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_target: all

### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z2_config-copy.json

### dataset
dataset: alpaca_zh_demo
template: qwen
cutoff_len: 1024
max_samples: 100000
data_seed: 42
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/Qwen2-57B-lora
logging_steps: 10
save_steps: 100
plot_loss: true
overwrite_output_dir: true
save_total_limit: 2
gradient_checkpointing: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 4.0e-5
num_train_epochs: 5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
Expected behavior
When fine-tuning, each GPU uses about 40 GB of VRAM. With two GPUs, I would expect each card to use roughly 20 GB, rather than the current behavior where both cards each use 40 GB.
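A likely explanation: DeepSpeed ZeRO stage 2 (the `ds_z2_config-copy.json` referenced above) partitions only optimizer states and gradients across ranks, while every GPU still holds a full replica of the model weights, so per-GPU memory does not drop when adding cards. ZeRO stage 3 additionally partitions the parameters themselves. A minimal stage-3 config sketch follows; the specific field values are assumptions based on DeepSpeed's documented `"auto"` convention, not the repository's actual example file. Also note that bitsandbytes 4-bit quantization may not be compatible with ZeRO-3 parameter partitioning, which would need to be verified separately.

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

Pointing the `deepspeed:` key in the training YAML at a stage-3 file like this is the usual way to shard model weights across both GPUs.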
Others
No response