princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

About comparison with other baselines #4

Closed wutaiqiang closed 2 years ago

wutaiqiang commented 2 years ago

Nice work! I have two questions: 1) Why report only the GLUE dev set results? 2) Some strong baselines, such as NasBERT and BERT-EMD, are not compared.

xiamengzhou commented 2 years ago

Hi, thanks for your questions!

We mainly follow the setup of previous pruning works, e.g., Sanh et al. (2020) and Lagunas et al. (2021), and report and compare results on the development sets only, since it would be hard to obtain test-set results at all sparsities. But feel free to evaluate our models on the GLUE test sets!
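
For reference, here is a minimal sketch of how a fine-tuned sequence-classification checkpoint could be scored on a GLUE dev set with the standard Hugging Face `datasets`/`evaluate`/`transformers` APIs. The checkpoint path below is a placeholder, and the released CoFi models may require the custom model classes and evaluation scripts in this repo, so treat this only as a generic illustration.

```python
# Generic GLUE dev-set evaluation sketch (SST-2 shown); the checkpoint name is
# a placeholder, not an official CoFi release.
import torch
from datasets import load_dataset
from evaluate import load as load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "path/to/finetuned-sst2-checkpoint"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT).eval()

dev = load_dataset("glue", "sst2", split="validation")
metric = load_metric("glue", "sst2")

with torch.no_grad():
    for start in range(0, len(dev), 32):
        batch = dev[start : start + 32]  # slicing a Dataset gives a dict of lists
        inputs = tokenizer(
            batch["sentence"], padding=True, truncation=True, return_tensors="pt"
        )
        preds = model(**inputs).logits.argmax(dim=-1)
        metric.add_batch(predictions=preds, references=batch["label"])

print(metric.compute())  # e.g. {'accuracy': ...} on the SST-2 dev set
```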

We position our paper mainly as a pruning paper, and we select distillation baselines that use a vanilla distillation setting to make the point that structured pruning is able to close the gap with general distillation + task-specific distillation. A comparison with NasBERT and BERT-EMD would be interesting, but we think comparing with TinyBERT is sufficient to establish our claim.

wutaiqiang commented 2 years ago

Thank you for your kind reply~