Closed RuiningLi closed 4 weeks ago
Hi @RuiningLi, thanks for reporting! Could you share a minimal reproducer? That would be very helpful! cc @muellerzr
It is called as part of the `DataLoaderStateMixin` here: https://github.com/huggingface/accelerate/blob/main/src/accelerate/data_loader.py#L384, which then gets called during the `__iter__` of the prepared DataLoaders: https://github.com/huggingface/accelerate/blob/main/src/accelerate/data_loader.py#L448
We also rely on `end_of_dataloader`, which gets set during the last batch. So a full reproducer would indeed be necessary, because that should not be the behavior: https://github.com/huggingface/accelerate/blob/main/src/accelerate/data_loader.py#470
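For anyone following along, the "flag the last batch" behavior described above can be sketched with a simple look-ahead generator. This is a simplified, hypothetical stand-in for illustration only, not Accelerate's actual implementation:

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs by peeking one item ahead,
    so the final batch can be marked before it is yielded."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty dataloader: nothing to flag
    for nxt in it:
        yield current, False
        current = nxt
    yield current, True  # no item followed, so this was the last batch

# The consumer can use is_last to set an end_of_dataloader-style flag,
# e.g. to force a gradient sync on the final step.
for batch, is_last in with_last_flag([1, 2, 3]):
    if is_last:
        pass  # sync gradients here
```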
Thanks for your replies @SunMarc @muellerzr ! I will try to get a minimal reproducer. But it's likely it's just me being stupid -- I will close the issue for now!
No worries, don't hesitate to re-open/ping us if you find the cause!
During training, I discovered that the gradient doesn't get synced at the end of the dataloader. After some further investigation, I found this relies on the following code:

which in turn relies on `gradient_state` having `active_dataloader` set to a non-`None` value. This could be done using the `_add_dataloader` function defined in the `GradientState` class. However, in `Accelerator`, this function is never called anywhere, so this logic could never be executed. Is this the desired behavior? Or am I missing something?
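To make the mechanism under discussion concrete, here is a toy sketch of how a state object and a prepared dataloader could interact. The class and method names mirror the ones mentioned above (`GradientState`, `_add_dataloader`, `active_dataloader`, `end_of_dataloader`), but this is a hypothetical illustration, not Accelerate's real code:

```python
class GradientState:
    """Toy stand-in for a shared gradient-sync state object."""
    def __init__(self):
        self.active_dataloader = None
        self.end_of_dataloader = False

    def _add_dataloader(self, dataloader):
        # Register the dataloader currently being iterated.
        self.active_dataloader = dataloader

    def _remove_dataloader(self, dataloader):
        # Deregister and reset the end-of-epoch flag.
        self.active_dataloader = None
        self.end_of_dataloader = False


class PreparedDataLoader:
    """Toy dataloader that registers itself with the state on iteration."""
    def __init__(self, data, state):
        self.data, self.state = data, state

    def __iter__(self):
        # This registration call is the one the issue says never fires.
        self.state._add_dataloader(self)
        batches = list(self.data)
        for i, batch in enumerate(batches):
            self.state.end_of_dataloader = (i == len(batches) - 1)
            yield batch
        self.state._remove_dataloader(self)


state = GradientState()
for batch in PreparedDataLoader([1, 2, 3], state):
    pass  # end_of_dataloader is True only while yielding the final batch
```

If `_add_dataloader` is never invoked, `active_dataloader` stays `None`, and any logic gated on it (such as forcing a sync on the last batch) is skipped, which would match the symptom reported above.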