princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

Maybe confusing description of the distillation constraint #26

Closed · sbwww closed this issue 1 year ago

sbwww commented 1 year ago

Hi, I just noticed a potentially confusing description of the distillation constraint. Intuitively, I (and probably many other readers) would imagine the distillation proceeding bottom-up, i.e., from layer 1 to layer 12. To handle layer mismatching, it then seems natural that a higher student layer should be matched with a higher teacher layer, so it is odd to see the constraint stated as "lower than the previous matched layer".

[Screenshot of the constrained layer-matching description from the paper]

After reading the code (trainer.py, line 601), I understand that the matching is done top-down, which is why the constraint is "lower than the previous matched layer", but I think the distillation direction needs to be clarified in the paper.

for search_index in range(3, -1, -1):  # search the specified teacher layers from the top one down (indices 3, 2, 1, 0)
xiamengzhou commented 1 year ago

Hi,

Yes, the distillation layer matching process is top-down: we first match the 12-th teacher layer to a student layer, then match the 9-th teacher layer, and so on. Therefore, in the constrained version, we only allow matching a teacher layer to a student layer lower than the previously matched one. Thanks for pointing this out; we will make this clearer in an updated version.
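For readers skimming this thread, here is a minimal sketch of that top-down, constrained matching under the assumptions above. It is not the exact trainer.py implementation: the function and variable names (`match_layers_top_down`, `teacher_feats`, `student_feats`) are made up for illustration, and `mse_loss` merely stands in for the actual layer distillation loss used in the code.

```python
import torch
import torch.nn.functional as F


def match_layers_top_down(teacher_feats, student_feats):
    """Sketch of greedy top-down teacher-student layer matching.

    teacher_feats: hidden states of the specified teacher layers,
                   ordered bottom-to-top (e.g. layers 3, 6, 9, 12).
    student_feats: hidden states of the remaining student layers,
                   ordered bottom-to-top.
    Returns a dict {teacher_index: student_index}.
    """
    matches = {}
    # Matched student indices must be strictly decreasing, so start the
    # search bound above the topmost student layer.
    upper_bound = len(student_feats)

    # Walk the teacher layers from the top (e.g. the 12-th) down.
    for t in range(len(teacher_feats) - 1, -1, -1):
        best_s, best_loss = None, float("inf")
        # Constrained search: only student layers below the previous match.
        for s in range(upper_bound):
            loss = F.mse_loss(student_feats[s], teacher_feats[t])
            if loss.item() < best_loss:
                best_s, best_loss = s, loss.item()
        if best_s is None:
            break  # no student layer left below the previous match
        matches[t] = best_s
        upper_bound = best_s  # next teacher layer must match a lower student layer
    return matches
```

Under this constraint, the matched student indices decrease as we move down the teacher layers, which is what the "lower than the previously matched layer" wording in the paper refers to.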