microsoft / Cream

This is a collection of our NAS and Vision Transformer work.
MIT License
1.61k stars, 220 forks

The results of the experiment could not be reproduced #233

Open bo102 opened 3 months ago

bo102 commented 3 months ago

Thank you for your excellent work. I cannot reproduce the results reported in your paper when compressing the Swin Transformer model. Specifically, I used the Swin model defined in your code to train a teacher model on my own dataset, but its accuracy does not improve. I also tried distilling directly from my own teacher model, and the accuracy does not improve either. What could be going wrong? Thank you very much!

bo102 commented 3 months ago

Do you use your own teacher model for distillation? Or is it necessary to first train the Swin Transformer base model produced by your code on the dataset and then use it as the teacher for distillation? Thank you very much!