huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Launching DeepSpeed in mixed precision fp8 using HF Trainer is not working #34027

Open eljandoubi opened 1 month ago

eljandoubi commented 1 month ago

System Info

[Screenshot of system info attached, dated 2024-10-08]

acc_cfg.yml:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_accumulation_steps: auto
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_process_ip: 0.0.0.0
main_process_port: 0
main_training_function: main
mixed_precision: fp8
fp8_config:
  amax_compute_algorithm: max
  amax_history_length: 1024
  backend: TE
  fp8_format: HYBRID
  interval: 1
  margin: 0
  override_linear_precision: false
  use_autocast_during_eval: true
num_machines: 3
num_processes: 24
rdzv_backend: etcd-v2
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
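As a sanity check (not part of the original report), a minimal diagnostic script can confirm what Accelerate resolves from this config before handing off to DeepSpeed. The filename check_cfg.py is hypothetical; the attributes used are standard Accelerate API:

```python
# Diagnostic sketch (assumption, not from the report): verify what
# Accelerate reads from acc_cfg.yml.
# Run with: accelerate launch --config_file acc_cfg.yml check_cfg.py
from accelerate import Accelerator

accelerator = Accelerator()
print("mixed_precision:", accelerator.mixed_precision)    # expected: 'fp8'
print("distributed_type:", accelerator.distributed_type)  # expected: DEEPSPEED
print("deepspeed_plugin:", accelerator.state.deepspeed_plugin)
```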

Who can help?

No response

Reproduction

accelerate launch --config_file acc_cfg.yml train.py $TRAINING_ARGS

train.py is any training script that trains with transformers.Trainer; $TRAINING_ARGS are the TrainingArguments plus some paths to the data.
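For context, a minimal sketch of such a train.py. This is hypothetical, not the reporter's script; the model and dataset are placeholders:

```python
# Hypothetical minimal train.py; model and dataset are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    HfArgumentParser,
    Trainer,
    TrainingArguments,
)

# Parse $TRAINING_ARGS from the command line into TrainingArguments.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Placeholder dataset; the original report trains on image data.
dataset = load_dataset("glue", "mrpc", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["sentence1"], ex["sentence2"], truncation=True),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```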

Expected behavior

DeepSpeed does not pick up that the mixed precision is fp8; it switches to bf16 instead.
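One way to observe the reported fallback (an assumption about where to look, not from the report) is to inspect the DeepSpeed config dict that the Trainer's Accelerator assembles:

```python
# Diagnostic sketch: run inside train.py after the Trainer is built.
# Per the report, bf16 ends up enabled even though mixed_precision is fp8.
ds_plugin = trainer.accelerator.state.deepspeed_plugin
if ds_plugin is not None:
    cfg = ds_plugin.deepspeed_config  # plain dict of the final DS config
    print("bf16:", cfg.get("bf16"))
    print("fp16:", cfg.get("fp16"))
```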

Rocketknight1 commented 1 week ago

Maybe cc @SunMarc @muellerzr!