### System Info

### Information

### Tasks

- [X] One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

### Reproduction

```
accelerate launch run.py --mixed_precision "bf16"
```

I'm using 2 GPUs here.

### Expected behavior

I'm currently using `accelerate` to train my own LLMs! When generating during evaluation to check the model quality, I've observed much slower training after having generated once. As you can see in the logs (here's an example), training is much slower after having used `generate_step`:

![image](https://github.com/huggingface/accelerate/assets/52246514/85bc1a5c-8faa-4816-8718-4833ee9aba98)

When removing the generation part, the training is as fast as expected!

cc @SunMarc and @muellerzr
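
To make the slowdown concrete in numbers (rather than reading it off the logs), a small timing harness can average per-step wall-clock time before and after a single generation. This is a minimal, library-free sketch: `train_step` and `generate_step` here are hypothetical placeholders for the real Accelerate-driven steps, not actual APIs from this script.

```python
import time

# Hypothetical stand-ins for the real steps; in the actual script these
# would run one optimizer step / one model.generate(...) call.
def train_step():
    time.sleep(0.001)  # placeholder work for one training step

def generate_step():
    time.sleep(0.005)  # placeholder work for one generation pass

def mean_step_time(step_fn, n=20):
    """Average wall-clock time of n calls to step_fn, in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        step_fn()
    return (time.perf_counter() - start) / n

before = mean_step_time(train_step)
generate_step()  # run generation once, as during evaluation
after = mean_step_time(train_step)

print(f"train step before generate: {before * 1e3:.2f} ms")
print(f"train step after  generate: {after * 1e3:.2f} ms")
```

Note that on GPU, kernel launches are asynchronous, so in a real measurement you would call `torch.cuda.synchronize()` before reading each timer to get meaningful numbers.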