princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

(expected_sparsity - target_sparsity) or (expected_sparsity - target_sparsity).abs() #55

Closed · hyx1999 closed this issue 11 months ago

hyx1999 commented 11 months ago

Hi, we've recently been experimenting with compressing models based on CoFi, and we've found that on small datasets the Lagrangian term from the paper causes the model to converge to a sparsity below the target sparsity. Taking the absolute value of (expected_sparsity - target_sparsity) in the Lagrangian term seems to ameliorate the problem. Do you think (expected_sparsity - target_sparsity).abs() would be a better choice for computing the Lagrangian term?
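For reference, here is a minimal sketch of the two variants under discussion, assuming the paper's form L = λ1·(s − t) + λ2·(s − t)² with s = expected_sparsity and t = target_sparsity. The function and variable names are illustrative, not the exact ones in CoFiPruning's codebase:

```python
import torch

def lagrangian_signed(expected_sparsity: torch.Tensor,
                      target_sparsity: float,
                      lambda_1: torch.Tensor,
                      lambda_2: torch.Tensor) -> torch.Tensor:
    # Signed gap, as in the paper: the linear term changes sign depending on
    # whether the expected sparsity is above or below the target.
    gap = expected_sparsity - target_sparsity
    return lambda_1 * gap + lambda_2 * gap.square()

def lagrangian_abs(expected_sparsity: torch.Tensor,
                   target_sparsity: float,
                   lambda_1: torch.Tensor,
                   lambda_2: torch.Tensor) -> torch.Tensor:
    # Proposed variant: take the magnitude of the gap. Note that the
    # quadratic term is unaffected (|g|^2 == g^2); only the linear
    # term behaves differently.
    gap = (expected_sparsity - target_sparsity).abs()
    return lambda_1 * gap + lambda_2 * gap.square()
```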

xiamengzhou commented 11 months ago

Hi! Using expected_sparsity - target_sparsity should be a more principled way to regularize the eventual sparsity, as it allows the mask-exploration process to move in both directions.

That said, using a small dataset does inevitably cause more instability. Using abs essentially restricts the expected sparsity to stay mostly above the target sparsity, and that might be why it's more stable.
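For intuition, a quick numeric comparison using the hypothetical helpers sketched above, with fixed positive multipliers (in CoFi the multipliers are learned adversarially, so this is only illustrative): the signed penalty is asymmetric around the target, while the abs variant penalizes deviations on both sides equally.

```python
# Illustrative only: fixed multipliers stand in for the learned ones.
l1, l2 = torch.tensor(1.0), torch.tensor(10.0)
t = 0.60
for s in (0.55, 0.60, 0.65):
    s_t = torch.tensor(s)
    print(f"s={s:.2f}  signed={lagrangian_signed(s_t, t, l1, l2).item():+.4f}  "
          f"abs={lagrangian_abs(s_t, t, l1, l2).item():+.4f}")
# s=0.55  signed=-0.0250  abs=+0.0750
# s=0.60  signed=+0.0000  abs=+0.0000
# s=0.65  signed=+0.0750  abs=+0.0750
```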

hyx1999 commented 11 months ago

Thank you for your reply!