Ther-nullptr asked:

Hi! I want to ask why we should use three optimizers during training. I thought a single self.optimizer (with one self.optimizer.zero_grad() call) would be enough.

Reply:

These three optimizers update the parameters of different parts of the model: self.optimizer updates the main model parameters, self.l0_optimizer updates the parameters of the L0 module, and self.lagrangian_optimizer updates the Lagrangian multipliers ($\lambda_1$ and $\lambda_2$). The three optimizers use different learning rates.

Ther-nullptr replied: Got it!

Closed by Ther-nullptr 2 years ago.
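The three-optimizer pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the parameter shapes, learning rates, and the use of AdamW are assumptions. The key point it demonstrates is that `zero_grad()` on one optimizer only clears the gradients of that optimizer's own parameters, so each group must be cleared and stepped separately, which also lets each group have its own learning rate.

```python
import torch

# Hypothetical parameter groups standing in for the three model parts
# (names, shapes, and learning rates are assumptions for illustration).
main_params = [torch.nn.Parameter(torch.randn(4, 4))]
l0_params = [torch.nn.Parameter(torch.zeros(4))]
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

# One optimizer per parameter group so each can use its own learning rate.
optimizer = torch.optim.AdamW(main_params, lr=2e-5)
l0_optimizer = torch.optim.AdamW(l0_params, lr=1e-1)
lagrangian_optimizer = torch.optim.AdamW([lambda_1, lambda_2], lr=1e-1)

# Dummy loss touching all three groups; a placeholder for the real
# task loss plus the L0 / Lagrangian sparsity terms.
loss = (main_params[0].sum() + l0_params[0].sum()
        + lambda_1 * 0.5 + lambda_2 * 0.25)

# zero_grad() on one optimizer only clears the gradients of *its*
# parameters, so all three optimizers must be cleared and stepped.
for opt in (optimizer, l0_optimizer, lagrangian_optimizer):
    opt.zero_grad()
loss.backward()
for opt in (optimizer, l0_optimizer, lagrangian_optimizer):
    opt.step()
```

After one such step, all three parameter groups have been updated, each at its own rate; a single shared optimizer would force one learning rate onto all of them.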