princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

The usage of L_c #18

Closed Ther-nullptr closed 2 years ago

Ther-nullptr commented 2 years ago

[screenshot of the $\mathcal{L}_c$ loss from the paper]

I do not understand how this loss works: since $\lambda_1$ and $\lambda_2$ are 0 by default, I find that the loss is sometimes a negative number.

Ther-nullptr commented 2 years ago

Does removing this loss have a significant impact on the experimental results?

xiamengzhou commented 2 years ago

This part guarantees that the target sparsity is met after optimization by enforcing a Lagrangian constraint. Removing it would break the pruning process. Please refer to Wang et al. 2021 for more details.
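For intuition, the constraint term from the paper is (roughly) $\mathcal{L}_c = \lambda_1 \cdot (\hat{s} - t) + \lambda_2 \cdot (\hat{s} - t)^2$, where $\hat{s}$ is the expected sparsity implied by the L0 gates and $t$ is the target sparsity. Below is a minimal sketch of such a term; names like expected_sparsity and target_sparsity are illustrative, not the repo's exact API.

import torch

def lagrangian_regularization(expected_sparsity: torch.Tensor,
                              target_sparsity: float,
                              lambda_1: torch.Tensor,
                              lambda_2: torch.Tensor) -> torch.Tensor:
    # Gap between the sparsity implied by the current gates and the target.
    gap = expected_sparsity - target_sparsity
    # Linear + quadratic penalty. lambda_1 and lambda_2 are learnable Lagrange
    # multipliers (0 at initialization), so the term starts at 0 and can become
    # negative during training; that is expected behavior, not a bug.
    return lambda_1 * gap + lambda_2 * gap.pow(2)

Because the multipliers are unconstrained in sign, a negative value of this term during training is normal.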

Ther-nullptr commented 2 years ago

Got it!

Ther-nullptr commented 2 years ago

Btw, why use a negative lr (in train.py)?

lagrangian_params = [{
    "params": [p for n, p in self.l0_module.named_parameters() if "lambda" in n],
    "weight_decay": 0.0,
    "lr": -self.additional_args.reg_learning_rate
}]
xiamengzhou commented 2 years ago

Solving the constrained optimization problem with Lagrange multipliers is a min-max problem: you minimize the loss (the task loss plus the sparsity constraint) over the main model parameters and the l0 module's parameters, and you maximize it over the Lagrange multipliers to force the constraint to be met. Since we need to maximize the loss with respect to the Lagrange multipliers, we set their learning rate to a negative value.
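A standalone toy example of the sign trick (not the repo's code): one optimizer takes descent steps on the main parameter and, because its param group carries a negative learning rate, ascent steps on the multiplier.

import torch

# Toy min-max: minimize over theta, maximize over lam (a Lagrange multiplier)
# for the objective theta^2 + lam * (theta - 0.5), i.e. push theta toward 0.5.
theta = torch.tensor(0.0, requires_grad=True)
lam = torch.tensor(0.0, requires_grad=True)

optimizer = torch.optim.SGD(
    [
        {"params": [theta], "lr": 0.1},   # gradient descent on the main parameter
        {"params": [lam], "lr": -0.1},    # negative lr => gradient ascent on the multiplier
    ],
    lr=0.1,  # positive default so the constructor check passes; per-group lr overrides it
)

for step in range(200):
    gap = theta - 0.5                 # stand-in for (expected sparsity - target)
    loss = theta.pow(2) + lam * gap   # "task loss" + Lagrangian term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # theta moves downhill, lam moves uphill in the same step

Since SGD updates p <- p - lr * grad, a negative lr flips the step to p <- p + |lr| * grad, i.e. gradient ascent, so the multiplier grows as long as the constraint is violated and shrinks otherwise.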

xiamengzhou commented 2 years ago

Hi, I am closing this issue now. Feel free to reopen it if you have more questions :)