Closed zhangzhenyu13 closed 2 years ago
I find that a fixed set of teacher layers might not be a good choice for CoFi, so introducing random selection of the teacher layers could make the method more robust. See: [2109.10164] RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation (arxiv.org)
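To make the suggestion concrete, here is a rough sketch of what random teacher layer selection could look like. It is not CoFi's code and not the RAIL-KD reference implementation. The names `sample_teacher_layers` and `layer_mse` are hypothetical, hidden states are plain lists of floats instead of tensors, and the sketch assumes teacher and student hidden sizes already match (no projection layer).

```python
import random


def sample_teacher_layers(num_teacher_layers, num_student_layers, rng=random):
    """Draw one distinct teacher layer per student layer, in increasing order.

    Hypothetical helper: meant to be called again at each step or epoch so the
    student is not always matched against the same fixed teacher layers.
    """
    if num_student_layers > num_teacher_layers:
        raise ValueError("student cannot have more layers than the teacher")
    chosen = rng.sample(range(num_teacher_layers), num_student_layers)
    # Sorting keeps the mapping order-preserving: a lower student layer is
    # always paired with a lower teacher layer.
    return sorted(chosen)


def layer_mse(teacher_hidden, student_hidden, mapping):
    """Mean squared error between each student layer and its mapped teacher layer.

    teacher_hidden / student_hidden: one list of floats per layer.
    mapping[i] is the teacher layer index paired with student layer i.
    """
    total = 0.0
    for student_idx, teacher_idx in enumerate(mapping):
        t = teacher_hidden[teacher_idx]
        s = student_hidden[student_idx]
        total += sum((a - b) ** 2 for a, b in zip(t, s)) / len(s)
    return total / len(mapping)
```

In a training loop one would resample the mapping every step (or every epoch) and add `layer_mse` to the other distillation losses; how often to resample is a design choice I have not tested.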