princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

How to get the loss of `lagrangian_regularization` #31

Closed · CaffreyR closed this issue 1 year ago

CaffreyR commented 2 years ago

Hi! In your code you calculate the Lagrangian loss $\mathcal{L}_c$ here:


https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py#L682

And you use `expected_size` to calculate `expected_sparsity`, but does it match the equation in your paper?


https://github.com/princeton-nlp/CoFiPruning/blob/main/models/l0_module.py#L267

Actually, you said that $\hat{s}$ is the expected model sparsity calculated from z, but `lagrangian_regularization()` does not take inputs or z as arguments. Many thanks!
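
To make my question concrete, my understanding is that the computation boils down to something like this (a sketch with made-up numbers, not the repo's exact code; in the repo `lambda_1` and `lambda_2` are learnable Lagrangian multipliers rather than constants):

```python
def lagrangian_loss(expected_size, prunable_model_size, target_sparsity,
                    lambda_1, lambda_2):
    """Sketch of the Lagrangian term L_c = l1*(s_hat - t) + l2*(s_hat - t)^2,
    where s_hat is the expected sparsity derived from the expected model size."""
    expected_sparsity = 1 - expected_size / prunable_model_size
    return (lambda_1 * (expected_sparsity - target_sparsity)
            + lambda_2 * (expected_sparsity - target_sparsity) ** 2)

# Hypothetical numbers: 60% of an 85M-parameter prunable budget is expected to
# remain, while the target sparsity is 95%.
loss = lagrangian_loss(expected_size=0.6 * 85e6, prunable_model_size=85e6,
                       target_sparsity=0.95, lambda_1=1.0, lambda_2=1.0)
```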

zhangzhenyu13 commented 2 years ago

z is obtained via the hard-sigmoid function and is a sampled score in the forward pass; its expected probability of being non-zero is $1 - Q(z \le 0 \mid \theta)$, i.e. `1 - cdf_qz(0)`.

Therefore, $\hat{s}$ is not computed from the inputs but only from the parameters such as `loga`, etc. (which are updated during training).
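
As a minimal sketch of that point, assuming the usual hard-concrete constants (`BETA = 2/3`, `GAMMA = -0.1`, `ZETA = 1.1`) and writing the closed form of `1 - cdf_qz(0)` directly (the gate shape below is hypothetical):

```python
import math
import torch

# Hard-concrete constants as commonly used with L0 regularization (assumed values)
BETA, GAMMA, ZETA = 2 / 3, -0.1, 1.1

# The probability that a gate is non-zero depends only on the learnable parameter loga:
#   P(z > 0) = 1 - Q(z <= 0 | loga) = sigmoid(loga - BETA * log(-GAMMA / ZETA))
loga = torch.zeros(12)                                         # e.g. one gate per attention head
p_open = torch.sigmoid(loga - BETA * math.log(-GAMMA / ZETA))  # = 1 - cdf_qz(0)
expected_size = p_open.sum()  # expected number of retained units: no inputs, no sampled z needed
```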

CaffreyR commented 2 years ago

Hmm, is this related to the equations in the paper? I mean, the paper does state it in two equations: the definition of the expected sparsity $\hat{s}$ and the Lagrangian term $\mathcal{L}_c = \lambda_1 \cdot (\hat{s} - t) + \lambda_2 \cdot (\hat{s} - t)^2$.

CaffreyR commented 2 years ago

And what is the difference between the `cdf_qz` and `quantile_concrete` functions?

zhangzhenyu13 commented 2 years ago

Check the paper carefully: the L0 norm is proposed in "Learning Sparse Neural Networks through $L_0$ Regularization" (arxiv.org). That paper gives a detailed description of q(z) and derives Q(z), the CDF of q. z is generated in CoFi exactly as in that paper; the difference is that CoFi applies structured, grouped parameter-masking strategies.
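
A rough sketch of the two functions under the same assumed hard-concrete constants (my paraphrase of the machinery from the L0 paper, not the exact CoFi source): `cdf_qz` evaluates the CDF Q(z <= x | loga) and is used deterministically for the expected L0 norm, while `quantile_concrete` is the inverse CDF that turns uniform noise into a sampled gate z for the forward pass.

```python
import math
import torch

BETA, GAMMA, ZETA = 2 / 3, -0.1, 1.1   # usual hard-concrete constants (assumed)
EPS = 1e-6

def cdf_qz(x, loga):
    # Q(z <= x | loga): CDF of the stretched concrete distribution. Used without
    # sampling, e.g. 1 - cdf_qz(0, loga) is the expected "keep" probability per gate.
    xn = (x - GAMMA) / (ZETA - GAMMA)
    logits = math.log(xn) - math.log(1 - xn)
    return torch.sigmoid(logits * BETA - loga).clamp(EPS, 1 - EPS)

def quantile_concrete(u, loga):
    # Inverse CDF: maps uniform noise u in (0, 1) to a sample of the stretched
    # concrete; clamping to [0, 1] afterwards gives the hard-concrete gate z.
    y = torch.sigmoid((torch.log(u) - torch.log(1 - u) + loga) / BETA)
    return y * (ZETA - GAMMA) + GAMMA

loga = torch.zeros(12)
u = torch.rand(12).clamp(EPS, 1 - EPS)
z = quantile_concrete(u, loga).clamp(0, 1)   # stochastic mask used in the forward pass
expected_l0 = (1 - cdf_qz(0, loga)).sum()    # deterministic quantity used in the sparsity loss
```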

xiamengzhou commented 1 year ago

Thanks @zhangzhenyu13, for the answer!

It's also worth checking out Structured Pruning of Large Language Models, which is the first work to propose adapting L0 regularization to control the sparsity of models.