Hi, Siqi
Nice work!
In the part that selects teacher features for comparison, the code takes the teacher features from every second layer, which differs from the paper (concatenate all layers and project to a lower dimension). Also, the results I got with the contrastive loss show only a small improvement, which does not match the paper.
Could you give some hints about your training setup and hyperparameter settings?
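
To make the difference I am asking about concrete, here is a minimal framework-free sketch of the two strategies as I understand them. It is not the repository's code: the function names `select_every_k` and `concat_and_project`, the choice of which layers count as "every two" (layers 2, 4, ...), and the toy projection matrix are all my own assumptions for illustration.

```python
def select_every_k(layer_feats, k=2):
    """Keep every k-th layer's feature vector (layers k, 2k, ...).

    Assumption: the skip starts at layer k; the real code may use
    a different offset.
    """
    return layer_feats[k - 1::k]


def concat_and_project(layer_feats, proj):
    """Concatenate all layers' features, then apply a linear projection.

    proj is an (out_dim x total_dim) matrix given as a list of rows.
    """
    concat = [x for feat in layer_feats for x in feat]
    if any(len(row) != len(concat) for row in proj):
        raise ValueError("projection width must match concatenated size")
    return [sum(w * x for w, x in zip(row, concat)) for row in proj]


# Toy teacher: 4 layers, hidden size 2 (hypothetical values).
teacher = [[float(i), float(i)] for i in range(1, 5)]

# What the code appears to do: compare against a subset of layers.
skipped = select_every_k(teacher, k=2)

# What the paper describes: one projected vector built from all layers.
projected = concat_and_project(teacher, [[1.0] * 8])
```

With the first strategy the student is compared against a subset of teacher layers, while with the second it is compared against a single lower-dimensional vector that mixes all of them, so I would expect the two to behave differently under the contrastive loss.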