The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf
DKD appears to distill the logits produced by the final fully connected layer. Can it be applied to a network that has no fully connected layer? It seems a channel-alignment step would be needed in that case, is that right?
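For context, my understanding is that DKD only consumes logits of shape (batch, num_classes), so an FC-free network could produce them some other way, e.g. a 1x1 conv followed by global average pooling. Below is a minimal sketch of that idea; `ConvLogitHead` is a hypothetical helper of mine (not from this repo), and `dkd_loss` re-derives the TCKD/NCKD split from the paper with illustrative hyperparameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLogitHead(nn.Module):
    """Hypothetical head for a fully convolutional student: a 1x1 conv maps
    the feature channels to num_classes, then global average pooling yields
    (batch, num_classes) logits -- no fully connected layer involved."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats).mean(dim=(2, 3))  # GAP over spatial dims


def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, T=4.0):
    """DKD as a weighted sum of target-class (TCKD) and non-target-class
    (NCKD) KL terms, following the paper; hyperparameters are illustrative."""
    gt = F.one_hot(target, logits_s.size(1)).bool()
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    # TCKD: binary (target vs. all non-target) probability distributions.
    b_s = torch.stack([(p_s * gt).sum(1), (p_s * ~gt).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt).sum(1), (p_t * ~gt).sum(1)], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * T ** 2
    # NCKD: distribution over non-target classes only (target masked out).
    mask = gt.float() * -1e9
    n_s = F.log_softmax(logits_s / T + mask, dim=1)
    n_t = F.softmax(logits_t / T + mask, dim=1)
    nckd = F.kl_div(n_s, n_t, reduction="batchmean") * T ** 2
    return alpha * tckd + beta * nckd


# Toy usage: conv features from an FC-free student, teacher logits elsewhere.
feats = torch.randn(8, 256, 7, 7)           # student backbone feature map
head = ConvLogitHead(256, num_classes=100)  # aligns channels to the class dim
logits_s = head(feats)
logits_t = torch.randn(8, 100)              # teacher logits, same class dim
target = torch.randint(0, 100, (8,))
loss = dkd_loss(logits_s, logits_t, target)
```

If this is roughly right, then the only "alignment" needed is making the student's class dimension match the teacher's, rather than the channel alignment used in feature distillation. Is that correct?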