Closed · pyf98 closed this issue 2 years ago
Hi,
Thanks for following our work! To answer your questions:
1) Yes, you are right! It should be `if "_z" in key`. The current logic creates an empty `zs`. But this issue only slightly affects layer distillation version 4, where we control the order of the layers. I just fixed it :)
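For context, the fix comes down to the key filter in the dict comprehension. A minimal sketch (the `inputs` dict below is hypothetical; in the trainer it is the batch dict whose mask entries carry the `_z` suffix):

```python
# Hypothetical batch dict: only the "_z"-suffixed entries are sampled masks.
inputs = {
    "input_ids": [101, 2054, 102],
    "attention_mask": [1, 1, 1],
    "head_z": [1.0, 0.0],
    "intermediate_z": [0.0, 1.0],
}

# Corrected filter: keep every entry whose key contains "_z".
zs = {key: inputs[key] for key in inputs if "_z" in key}
# zs now holds only the mask entries ("head_z", "intermediate_z").
```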
2) Yes, your understanding is correct!
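As an arithmetic check of the interpretation above (BERT-base sizes assumed for illustration; the exact accounting in `l0_module` may differ):

```python
# BERT-base sizes, assumed for this sketch.
hidden_size = 768
intermediate_size = 4 * hidden_size  # 3072 in BERT-base

# Per-layer FFN parameters: up-projection weight, its bias
# (the `self.hidden_size * 4` term, which equals intermediate_size),
# and the down-projection weight.
ffn_params = (hidden_size * intermediate_size      # W_in
              + hidden_size * 4                    # b_in == intermediate_size
              + intermediate_size * hidden_size)   # W_out
```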
3) We largely follow FLOP for the initialization. As you said, if the mean is 0, many samples drawn from the distribution are likely to be 0. In practice, this does not affect optimization much, because the L0 loss trains `loga` to meet the sparsity requirement very quickly. We did observe that setting the mean to 0 for larger units (head, MLP) made the start of pruning unstable, so we set it to a large number to smooth the optimization.
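To illustrate why a large mean keeps the start of pruning stable, here is a sketch of a standard hard-concrete gate. The parameters `beta`, `gamma`, and `zeta` follow the usual L0-regularization formulation and are assumptions, not necessarily the repo's exact values:

```python
import math
import random

def hard_concrete_sample(loga, beta=2/3, gamma=-0.1, zeta=1.1):
    """Draw one sample from a hard-concrete gate parameterized by loga."""
    u = random.random()
    s = 1.0 / (1.0 + math.exp(-((math.log(u) - math.log(1.0 - u)) + loga) / beta))
    s_bar = s * (zeta - gamma) + gamma  # stretch to (gamma, zeta)
    return min(1.0, max(0.0, s_bar))    # clamp to [0, 1]

random.seed(0)
# Large positive mean: gates start near 1, so almost nothing is pruned
# at initialization and optimization begins smoothly.
open_mean = sum(hard_concrete_sample(10.0) for _ in range(1000)) / 1000
# Mean 0: samples scatter across [0, 1], so a sizable fraction of the
# mass is already pushed toward pruning before training starts.
zero_mean = sum(hard_concrete_sample(0.0) for _ in range(1000)) / 1000
```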
Hope this helps and feel free to ask more questions!
Hi,
I am closing this issue and feel free to open it again if you have more questions :)
Hi, thanks for the great work! I have some questions about the current code.
First, is the following line expected? https://github.com/princeton-nlp/CoFiPruning/blob/main/trainer/trainer.py#L667 Should it be `zs = {key: inputs[key] for key in inputs if "_z" in key}` in order to extract `zs` from `inputs`?

Second, what is the last term `self.hidden_size * 4` in the following line when calculating the params of an FFN layer? https://github.com/princeton-nlp/CoFiPruning/blob/main/models/l0_module.py#L44 I guess it means the `bias` parameter of the intermediate dense layer, so it is equivalent to `self.intermediate_size`?

Third, when initializing the `loga` params in `l0_module`, the `structured_mlp` uses a different `mean` compared with the other components, as shown in the following line: https://github.com/princeton-nlp/CoFiPruning/blob/main/models/l0_module.py#L147 It seems the intermediate dimension has an initial sparsity of 0.5, even before any pruning. What is the intuition behind setting it this way?
Thank you very much for your time!