Hi, thank you for the implementation.
I have a question regarding the adaptive gradient clipping. My understanding is that in PyTorch Lightning, loss.backward() happens after the trainer receives the loss from pl_module.training_step(). But in gan.py, adaptive gradient clipping takes place before the loss is returned, which means the gradients are still in their state from the previous step. Is this an approximation by design, or am I missing something?
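For reference, one way to guarantee the clipping sees the current step's gradients would be Lightning's on_after_backward hook, which runs right after loss.backward(). The sketch below is just to illustrate the timing; the adaptive_gradient_clip_ helper is my own simplified per-tensor version of AGC (the paper's rule is unit-wise), not the implementation in gan.py:

```python
import torch
import pytorch_lightning as pl


def adaptive_gradient_clip_(parameters, clip_factor=0.01, eps=1e-3):
    # Simplified AGC: rescale each gradient so that
    # ||g|| <= clip_factor * max(||p||, eps).
    for p in parameters:
        if p.grad is None:
            continue
        p_norm = p.detach().norm().clamp_(min=eps)
        g_norm = p.grad.detach().norm()
        max_norm = clip_factor * p_norm
        if g_norm > max_norm:
            # In-place rescale; small constant avoids division by zero.
            p.grad.detach().mul_(max_norm / (g_norm + 1e-6))


class MyGAN(pl.LightningModule):
    ...

    def on_after_backward(self):
        # Called after loss.backward(), so the gradients here
        # belong to the current step rather than the previous one.
        adaptive_gradient_clip_(self.parameters())
```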
Thank you in advance for your answer.