Closed · backspacetg closed this issue 2 years ago
Thank you for your amazing work!

I have some difficulty understanding the pre-pruning fine-tuning step in the code. I found that during pre-pruning fine-tuning, only the layer and prediction distillation losses are calculated, but it seems that the teacher and student models are both bert-base models. Does this mean that the distillation is between two identical models? If so, why should we do that?
Hi,

Sorry for getting back to you late! The pre-pruning fine-tuning step distills a fine-tuned BERT model (the teacher) into a pre-trained BERT model (the student). The two models share the bert-base architecture, but their weights differ: the teacher has already been fine-tuned on the downstream task, while the student starts from the generic pre-trained checkpoint, so the distillation is not between two identical models. Let me know if you have more questions!
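To make this concrete, here is a minimal sketch of what the two losses mentioned above can look like in PyTorch. This is an illustration under my own assumptions, not the repository's actual implementation: the function names, the `temperature` value, the 1:1 layer mapping, and the teacher checkpoint path are all hypothetical.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

# Same architecture, different weights: the teacher is a task-fine-tuned
# checkpoint ("path/to/finetuned-bert" is a placeholder), while the student
# starts from the generic pre-trained checkpoint.
teacher = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-bert")
student = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

def prediction_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student predictions."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

def layer_distillation_loss(student_hiddens, teacher_hiddens):
    """Mean MSE over corresponding hidden states; a 1:1 layer mapping is
    assumed here since teacher and student share the bert-base architecture."""
    losses = [F.mse_loss(s, t) for s, t in zip(student_hiddens, teacher_hiddens)]
    return torch.stack(losses).mean()
```

Because the teacher's weights already encode the downstream task, its logits and hidden states give the pre-trained student a useful training signal even though the two architectures match.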
Oh, I see! Thanks for your reply!