Closed wutaiqiang closed 2 years ago
Hi, thanks for your questions!
We mainly follow the setup of previous pruning works, e.g., Sanh et al. (2020) and Lagunas et al. (2021), and report and compare results on the development sets only, since it would be hard to obtain test-set results across all sparsities. But feel free to evaluate our models on the GLUE test sets!
We position our paper mainly as a pruning paper, and we select distillation baselines with a vanilla distillation setting to make the point that structured pruning can close the gap to general distillation + task-specific distillation. A comparison to NAS-BERT and BERT-EMD would be interesting, but we think comparing to TinyBERT is sufficient to support our claim.
Thank you for your kind reply~
Nice work! I have two questions: 1) Why report results on the GLUE dev set only? 2) Some strong baselines, such as NAS-BERT and BERT-EMD, are not compared.