Open · bghira opened this issue 13 hours ago
This impacted every model I looked at while implementing layer-skipping for checkpointing.
Describe the bug
Gradient checkpointing is gated on self.training, and the model is still in training mode when log_validation runs, so the checkpointing logic also runs during validation, unnecessarily. I found that I have to disable it explicitly when running validation, even under torch.no_grad(); I looked and the official examples do not disable it either.
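Here is a minimal sketch of the pattern in question (SketchBlock is a hypothetical module, not actual diffusers code): the branch is keyed only on self.training, and since log_validation does not switch the model to eval mode and torch.no_grad() does not change self.training, checkpointing still runs during validation.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class SketchBlock(nn.Module):
    """Stand-in for a transformer/UNet block that checkpoints its forward."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.gradient_checkpointing = True  # as if enable_gradient_checkpointing() was called

    def forward(self, hidden_states):
        # The common gate: only self.training is checked, not torch.is_grad_enabled().
        if self.training and self.gradient_checkpointing:
            print("checkpointing branch taken, grad enabled:", torch.is_grad_enabled())
            return checkpoint(self.proj, hidden_states, use_reentrant=False)
        return self.proj(hidden_states)


block = SketchBlock()
block.train()  # training scripts typically leave the model in train mode for validation

with torch.no_grad():
    # Prints "checkpointing branch taken, grad enabled: False": no_grad does not
    # flip self.training, so the branch still runs. Gating on
    # torch.is_grad_enabled() as well would skip it here.
    block(torch.randn(2, 8))
```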
Reproduction
Add print statements to the checkpointing function and run a training script's log_validation step; a hedged repro sketch follows.
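A reproduction sketch under stated assumptions: UNet2DConditionModel is used only as an example of an affected diffusers model (the tiny config mirrors the shapes diffusers uses in its unit tests so the script runs quickly), and the print statement stands in for one patched into the model's checkpointing branch; whether that branch fires during validation depends on how the installed diffusers version gates it.

```python
import torch
from diffusers import UNet2DConditionModel

# Tiny example config so the forward pass runs quickly; any model that
# supports enable_gradient_checkpointing() can be substituted.
model = UNet2DConditionModel(
    sample_size=16,
    in_channels=4,
    out_channels=4,
    block_out_channels=(32, 64),
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    cross_attention_dim=32,
    attention_head_dim=8,
    layers_per_block=1,
)
model.enable_gradient_checkpointing()
model.train()  # training scripts generally do not call model.eval() before log_validation

sample = torch.randn(1, model.config.in_channels, 16, 16)
timestep = torch.tensor([1])
encoder_hidden_states = torch.randn(1, 77, model.config.cross_attention_dim)

with torch.no_grad():
    # With a print added inside the model's checkpointing branch
    # (the `if self.training and self.gradient_checkpointing:` gate),
    # it fires here even though no gradients are needed.
    model(sample, timestep, encoder_hidden_states)
```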
Logs
No response
System Info
-
Who can help?
@linoytsaban @yiyixuxu