microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

About the teacher logits of the TinyViT #122

Closed BlueCat7 closed 2 years ago

BlueCat7 commented 2 years ago

Dear Author, thanks for your great work! I generate the CLIP logits on one machine and then move them to another machine to check them, but the diff_rate is not 0. When I generate and check the logits on the same machine, the diff_rate is 0. So I am confused about where things may go wrong. Looking forward to your reply, thanks!

wkcn commented 2 years ago

Thanks for your attention to our work!

This is normal: floating-point arithmetic introduces small errors, which depend on the CUDA and PyTorch versions as well as the batch size.

Knowledge distillation is robust to this arithmetic error. Although the diff_rate may not be 0, the saved logits can still be used for distillation without loss of accuracy.
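For reference, here is a minimal sketch (not the repository's own script) of how logits saved on two machines could be compared with a floating-point tolerance rather than exact equality. The file names and this particular `diff_rate` definition are illustrative assumptions:

```python
import torch

def diff_rate(a: torch.Tensor, b: torch.Tensor, rtol: float = 1e-3, atol: float = 1e-5) -> float:
    """Fraction of logit entries that differ beyond the given tolerance."""
    mismatch = ~torch.isclose(a, b, rtol=rtol, atol=atol)
    return mismatch.float().mean().item()

# Hypothetical paths to logits saved on two different machines.
logits_a = torch.load("logits_machine_a.pt", map_location="cpu")
logits_b = torch.load("logits_machine_b.pt", map_location="cpu")

exact = diff_rate(logits_a, logits_b, rtol=0.0, atol=0.0)  # often > 0 across machines
tolerant = diff_rate(logits_a, logits_b)                   # typically 0 within tolerance
print(f"exact diff_rate={exact:.4f}, tolerant diff_rate={tolerant:.4f}")
```

A non-zero exact diff_rate across machines is expected; what matters for distillation is that the differences stay within a small tolerance.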

BlueCat7 commented 2 years ago

> Knowledge distillation is robust to this arithmetic error. Although the diff_rate may not be 0, the saved logits can still be used for distillation without loss of accuracy.

OK, thanks for your reply.