Hi,
In the distillation paper, the loss is L = Ls + Larc (where Larc is the classification arc loss).
Do you have any experimental results comparing distillation with and without the classification loss Larc?