Closed: Ther-nullptr closed this issue 2 years ago.
Does the deletion of this loss have a significant impact on the results of the experiment?
This part guarantees that a target sparsity is met after optimization by enforcing a Lagrangian constraint. Removing it would break the pruning process. Please refer to Wang et al. 2021 for more details.
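For illustration, the Lagrangian penalty in Wang et al. 2021 takes the form $\lambda_1 (s - t) + \lambda_2 (s - t)^2$, where $s$ is the expected sparsity and $t$ is the target sparsity. A minimal sketch (function and argument names are my own, not this repo's API):

```python
def lagrangian_loss(expected_sparsity, target_sparsity, lambda_1, lambda_2):
    # Penalty is zero when the target is met and grows with the violation;
    # lambda_1 and lambda_2 are the learned Lagrange multipliers.
    diff = expected_sparsity - target_sparsity
    return lambda_1 * diff + lambda_2 * diff ** 2
```

When the expected sparsity equals the target, the penalty vanishes regardless of the multipliers.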
Got it!
Btw, why use a negative lr (in train.py)?
lagrangian_params = [{
    # select only the Lagrange multiplier parameters of the l0 module
    "params": [p for n, p in self.l0_module.named_parameters() if "lambda" in n],
    "weight_decay": 0.0,
    "lr": -self.additional_args.reg_learning_rate,
}]
Solving the constrained optimization problem with Lagrange multipliers is a min-max problem: you minimize the loss (the task loss plus the sparsity constraint) over the main model's parameters and the l0 module's parameters, and maximize it over the Lagrange multipliers to enforce the constraint. Since we need to maximize the loss with respect to the Lagrange multipliers, we set their learning rate to a negative value.
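A minimal numeric sketch (my own illustration, not the repo's code) of why a negative learning rate implements gradient ascent on the multipliers:

```python
# A standard descent update, but with lr negated, moves the
# Lagrange multiplier *up* the gradient, i.e. performs ascent.
lr = 0.1
lam = 0.0                 # Lagrange multiplier, initialized to 0
violation = 0.5           # e.g. expected_sparsity - target_sparsity
grad = violation          # d/d(lam) of the term lam * violation
lam = lam - (-lr) * grad  # descent rule applied with lr = -0.1
# lam is now 0.05: it increased, strengthening the penalty
```

The same descent code path in the optimizer therefore maximizes the Lagrangian term for these parameters, with no special-casing needed.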
Hi, I am closing this issue now. Feel free to reopen it if you have more questions :)
I do not understand how this loss works: since $\lambda_1$ and $\lambda_2$ are 0 by default, I find that the loss is sometimes negative.