Introduce random teacher layer sets

princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

MIT License

zhangzhenyu13 closed this issue 1 year ago

zhangzhenyu13 commented 1 year ago

I find that a fixed set of teacher layers might not be a good choice for CoFi, so selecting the teacher layer set at random could make the method more robust. For reference, see: [2109.10164] RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation (arxiv.org)
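To make the suggestion concrete, here is a minimal sketch of random teacher layer selection in the spirit of RAIL-KD. It is not code from the CoFiPruning repository: the function name `sample_teacher_layers`, its parameters, and the idea of calling it once per training step are assumptions made for illustration only.

```python
import random


def sample_teacher_layers(num_teacher_layers, num_selected, rng=None):
    """Pick `num_selected` distinct teacher layer indices at random.

    Hypothetical helper, not part of CoFiPruning. The indices are
    returned in ascending order so that lower student layers can be
    matched to lower teacher layers.
    """
    if not 0 < num_selected <= num_teacher_layers:
        raise ValueError("num_selected must be in [1, num_teacher_layers]")
    if rng is None:
        rng = random.Random()
    # Sample without replacement, then sort to preserve layer order.
    return sorted(rng.sample(range(num_teacher_layers), num_selected))


# Example: draw a fresh set of 4 layers from a 12-layer teacher
# for each of three training steps, instead of using one fixed set.
rng = random.Random(0)
for step in range(3):
    teacher_layers = sample_teacher_layers(12, 4, rng)
    # `teacher_layers` would replace the fixed layer list when
    # computing the layer-wise distillation loss for this step.
```

Whether to resample per batch or per epoch is a design choice; this sketch leaves that to the caller.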