System Info
acc_cfg.yml:
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_accumulation_steps: auto
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_process_ip: 0.0.0.0
main_process_port: 0
main_training_function: main
mixed_precision: fp8
fp8_config:
  amax_compute_algorithm: max
  amax_history_length: 1024
  backend: TE
  fp8_format: HYBRID
  interval: 1
  margin: 0
  override_linear_precision: false
  use_autocast_during_eval: true
num_machines: 3
num_processes: 24
rdzv_backend: etcd-v2
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
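For readers checking the topology, here is a minimal sketch of the arithmetic behind num_machines and num_processes. It assumes num_processes is the total across all machines, which is how accelerate launch treats it. The helper name processes_per_machine is hypothetical and is not part of Accelerate.

```python
def processes_per_machine(num_processes: int, num_machines: int) -> int:
    """Return how many processes each machine runs.

    Hypothetical helper: assumes num_processes is the total over all
    machines and that the processes are spread evenly.
    """
    if num_machines <= 0:
        raise ValueError("num_machines must be positive")
    per_machine, remainder = divmod(num_processes, num_machines)
    if remainder:
        raise ValueError(
            f"{num_processes} processes cannot be split evenly "
            f"across {num_machines} machines"
        )
    return per_machine


# Values taken from the config above: 24 processes on 3 machines.
print(processes_per_machine(24, 3))  # prints 8
```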
Who can help?
No response
Information
[ ] The official example scripts
[ ] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
accelerate launch --config_file acc_cfg.yml train.py $TRAINING_ARGS
train.py is any training script that trains with transformers.Trainer.
$TRAINING_ARGS are the TrainingArguments plus some paths to the data.
Expected behavior
DeepSpeed does not pick up that mixed precision is set to fp8 and switches to bf16 instead. The expected behavior is that training runs in fp8, as configured.
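One way to make the mismatch visible is to compare the requested precision against the DeepSpeed config that is actually resolved at runtime. The following is a minimal sketch, assuming the resolved config is available as a plain dict with the usual "bf16" and "fp16" sections, each carrying an "enabled" flag. The function names are hypothetical and do not belong to Accelerate or DeepSpeed, and how the dict is obtained from a running job is left open.

```python
from typing import Optional


def resolved_precision(ds_config: dict) -> str:
    """Name the precision section that is switched on in a DeepSpeed config dict.

    Assumption: "bf16" and "fp16" are sections of the form {"enabled": bool}.
    Only a literal True counts as enabled, so placeholders such as "auto" do not.
    """
    for name in ("bf16", "fp16"):
        section = ds_config.get(name)
        if isinstance(section, dict) and section.get("enabled") is True:
            return name
    return "fp32"


def precision_mismatch(requested: str, ds_config: dict) -> Optional[str]:
    """Return a warning message if the config disagrees with the request, else None."""
    actual = resolved_precision(ds_config)
    if actual != requested:
        return f"requested mixed_precision={requested}, but DeepSpeed config enables {actual}"
    return None


# The situation described in this report: fp8 requested, bf16 enabled.
print(precision_mismatch("fp8", {"bf16": {"enabled": True}}))
```

Logging this message early in train.py would show the point at which the fp8 request is lost.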