princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

About prepruning finetune steps #9

Closed — backspacetg closed this issue 2 years ago

backspacetg commented 2 years ago

Thank you for your amazing work!

I have some difficulty understanding the pre-pruning fine-tuning steps in the code. I found that during these steps only the layer and prediction distillation losses are calculated, but it seems that the teacher and student models are both bert-base models. Does this mean that the distillation is between two identical models? If so, why should we do that?
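(For context, a minimal sketch of what the two losses mentioned here could look like; the function and argument names below are hypothetical and not taken from the repo: prediction distillation as a temperature-scaled KL divergence on logits, and layer distillation as an MSE between corresponding hidden states.)

```python
import torch
import torch.nn.functional as F

def distillation_losses(student_logits, teacher_logits,
                        student_hiddens, teacher_hiddens,
                        temperature=2.0):
    """Hypothetical sketch of prediction + layer distillation losses."""
    # Prediction distillation: KL divergence between softened
    # student and teacher output distributions.
    pred_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Layer distillation: MSE between corresponding hidden states.
    layer_loss = sum(
        F.mse_loss(s, t) for s, t in zip(student_hiddens, teacher_hiddens)
    ) / len(student_hiddens)

    return pred_loss + layer_loss
```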

xiamengzhou commented 2 years ago

Hi,

Sorry for getting back to you late! The pre-pruning fine-tuning step distills a fine-tuned BERT model (the teacher) into a pre-trained BERT model (the student). So although both are bert-base architectures, the teacher has already been fine-tuned on the downstream task while the student has not, and the two models are not identical. Let me know if you have more questions!
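(A minimal sketch of that setup, assuming a standard Hugging Face workflow; the teacher checkpoint path is illustrative only and not a path from this repo.)

```python
from transformers import AutoModelForSequenceClassification

# Teacher: a BERT model already fine-tuned on the downstream task
# (path is illustrative, not from the repo).
teacher = AutoModelForSequenceClassification.from_pretrained(
    "path/to/finetuned-bert-base"
)
teacher.eval()  # the teacher is frozen during distillation

# Student: the same architecture, but initialized from the
# pre-trained (not fine-tuned) bert-base checkpoint.
student = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=teacher.config.num_labels
)
```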

backspacetg commented 2 years ago

Oh I see! Thanks for your reply!