huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0
7.77k stars 941 forks

accelerate inappropriately converts all inputs to a specific dtype #2811

Closed rangehow closed 1 month ago

rangehow commented 4 months ago

As the title says, I made a minimal snippet like this:


from transformers import Trainer


class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        input_ids = inputs.pop("input_ids")
        attention_mask = inputs.pop("attention_mask")

        # auxiliary values used only inside the loss, never passed to the model
        valid_label_index_list = inputs.pop("valid_label_index_list")
        all_prob_supervised = inputs.pop("all_prob_supervised")
        all_prob_clm = inputs.pop("all_prob_clm")
        supervised_cnt = inputs.pop("supervised_cnt")
        clm_cnt = inputs.pop("clm_cnt")

        result = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
        )

If we set amp to 'bfloat16', every element in inputs is converted from float32 to bfloat16, whereas the amp enabled through the Hugging Face Trainer arguments does not do this. So I am not sure whether this is a feature of accelerate or a bug. If it is a feature, how can we avoid it? (Some of the inputs, like clm_cnt in this case, never go into the model, so there is no need to convert them to bf16.)

Here is my accelerate config:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 8
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false


muellerzr commented 4 months ago

We can try, but I'm not 100% sure we can, since generally we assume that anything related in there will be cast via amp for loss calculation properly.

rangehow commented 4 months ago

We can try, but I'm not 100% sure we can, since generally we assume that anything related in there will be cast via amp for loss calculation properly.

Thanks in advance. I am not sure if I made it clear: if I run my training script with python train.py instead of accelerate launch train.py and set bf16 in TrainingArguments to True, the training procedure is still mixed precision, but it does not convert every input passed to compute_loss to bf16. (Some of these inputs are never sent to the model, so they do not need to be converted.)
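
To make the difference concrete, here is a minimal check inside compute_loss (a sketch using the field names from my original snippet; the dtype comments describe what I observe, assuming supervised_cnt is stored as float32 in the dataset):

from transformers import Trainer


class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        supervised_cnt = inputs.pop("supervised_cnt")
        # python train.py       + bf16=True in TrainingArguments -> torch.float32
        # accelerate launch ... + mixed_precision: bf16          -> torch.bfloat16
        print(supervised_cnt.dtype)
        # continue with the normal loss computation on the remaining keys
        return super().compute_loss(model, inputs, return_outputs)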

It would be great if this process could be handled correctly : )

muellerzr commented 4 months ago

Yes, you were quite clear. I'll see what we can do, but again, it's a blanket usage of autocast(), so I'm not sure if we can (on purpose).

muellerzr commented 4 months ago

Though the answer is likely yes, since the model has its own autocast wrapper around the forward() method
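
For what it's worth, here is a tiny sketch of how a blanket torch.autocast context behaves in plain PyTorch (not the exact wrapper accelerate installs): tensors are cast per-op inside the context, so values that never reach an autocast-eligible op keep their original dtype.

import torch

linear = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)   # float32 model input
aux = torch.randn(4)    # float32 tensor that never enters an autocast-eligible op

# CPU autocast is used here only so the snippet runs anywhere; amp normally targets CUDA.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = linear(x)     # the matmul runs in bf16; x is cast per-op, not in place
print(out.dtype)        # torch.bfloat16
print(x.dtype)          # torch.float32 -- the original tensor is unchanged
print(aux.dtype)        # torch.float32 -- autocast never touched it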

BismarckBamfo commented 3 months ago

Hi, is there a status update on this? I am currently using a messy hack where I convert all the tensors to lists and pass the lists to the model, then reconstruct the tensors from the lists in the forward() call.
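
Roughly, the hack looks like this (a sketch; the key names are just placeholders for my own features):

import torch


def collate_fn(features):
    return {
        "input_ids": torch.stack([f["input_ids"] for f in features]),
        "attention_mask": torch.stack([f["attention_mask"] for f in features]),
        # shipped as a plain Python list so the blanket tensor dtype conversion skips it
        "aux_values": [f["aux_values"] for f in features],
    }


class MyModel(torch.nn.Module):
    def forward(self, input_ids, attention_mask, aux_values=None):
        # rebuild the tensor inside forward(), in the dtype I actually need
        aux = torch.tensor(aux_values, dtype=torch.float32, device=input_ids.device)
        ...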

cuent commented 3 months ago

It could be related. I’m working with diffusion models and PyG, and when I batch edge_indices, they are converted to mixed precision FP16. However, I need to keep edge_indices as int64 for indexing. For now, my workaround is to cast them back to int64 after precomputing the indices.
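
Concretely, the workaround looks something like this (a sketch; the helper name is illustrative):

import torch


def restore_edge_index(edge_index: torch.Tensor) -> torch.Tensor:
    # edge_index may arrive as fp16 after the blanket conversion; cast it back to
    # int64 so it can be used for indexing again (lossy for large index values)
    if edge_index.is_floating_point():
        edge_index = edge_index.long()
    return edge_index


edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.float16)  # post-conversion state
edge_index = restore_edge_index(edge_index)
print(edge_index.dtype)  # torch.int64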

rangehow commented 3 months ago

It could be related. I’m working with diffusion models and PyG, and when I batch edge_indices, they are converted to mixed precision FP16. However, I need to keep edge_indices as int64 for indexing. For now, my workaround is to cast them back to int64 after precomputing the indices.

This is indeed a temporary solution; however, this conversion will result in precision loss, making it unsuitable for scenarios that are sensitive to precision.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.