Closed hyx1999 closed 11 months ago
Hi! Using expected_sparsity - target_sparsity
should be a more principled way to regularize the eventual sparsity, as it allows the mask exploration process to move in both directions.
Yet, using a small dataset does inevitably cause more instability. Using abs essentially restricts the expected sparsity to stay mostly above the target sparsity, which may be why it's more stable.
Thank you for your reply!
Hi, we've recently been experimenting with model compression based on CoFi, and we've found that on small datasets, using the Lagrangian term from the paper causes the model to converge to a size smaller than the target sparsity. However, taking the absolute value of (expected_sparsity - target_sparsity) in the Lagrangian term seems to ameliorate the problem. Do you think (expected_sparsity - target_sparsity).abs() would be a better choice for calculating the Lagrangian term?
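For concreteness, here is a minimal sketch of the two variants being compared. This is not CoFi's actual code; the function and parameter names (`lambda_1`, `lambda_2`) are hypothetical, and the scalars stand in for what would be learned Lagrangian multipliers and tensor-valued sparsity estimates in a real implementation:

```python
def lagrangian_term(expected_sparsity, target_sparsity,
                    lambda_1, lambda_2, use_abs=False):
    """Penalty added to the training loss to steer sparsity toward the target.

    With use_abs=False (the paper's form), the linear term is signed, so the
    multiplier can push the expected sparsity in either direction.
    With use_abs=True (the variant proposed above), the linear term only
    measures the magnitude of the gap, which tends to keep the expected
    sparsity from overshooting far past the target.
    """
    gap = expected_sparsity - target_sparsity
    if use_abs:
        gap = abs(gap)
    # Linear + quadratic penalty; the quadratic term is unaffected by abs.
    return lambda_1 * gap + lambda_2 * gap ** 2
```

With the signed form, a negative gap (expected sparsity below target) can make the linear term negative, which is what lets the penalty pull in both directions; the abs variant makes any deviation from the target strictly costly.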