microsoft / SimMIM

This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
https://arxiv.org/abs/2111.09886
MIT License

Performance using the cosine distance #28

Open · LiyaoTang opened this issue 2 years ago

LiyaoTang commented 2 years ago

Hi @caoyue10, thanks for your insightful work.

The experiments and discussion in your paper state that different distance functions for computing the loss (e.g., l1 and l2) perform equally well. I would like to know whether this also holds for the cosine distance.
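For concreteness, here is a minimal sketch of the kind of loss being compared: a per-pixel reconstruction loss that counts only masked pixels, with the distance switchable between l1 and l2. This is an illustration under assumptions, not the repository's code; the function name `masked_recon_loss`, the NumPy arrays, and the `(C, H, W)` / `(H, W)` shapes are all hypothetical.

```python
import numpy as np

def masked_recon_loss(pred, target, mask, kind="l1"):
    """Hypothetical helper: reconstruction loss averaged over masked pixels only.

    pred, target: arrays of shape (C, H, W); mask: (H, W) with 1 = masked pixel.
    """
    diff = pred - target
    if kind == "l1":
        per_pixel = np.abs(diff)
    elif kind == "l2":
        per_pixel = diff ** 2
    else:
        raise ValueError(f"unknown distance: {kind}")
    # Broadcast the (H, W) mask over channels so visible pixels contribute 0.
    masked = per_pixel * mask[None, :, :]
    num_channels = pred.shape[0]
    # Normalize by the number of masked pixels times channels (eps avoids 0/0).
    return masked.sum() / (mask.sum() * num_channels + 1e-8)
```

Only the elementwise distance changes between the two variants; the masking and averaging stay the same.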

Since the cosine distance is prevalent in previous contrastive learning (CL) works and involves an l2-normalization, I think this experiment could be informative. Could you shed some light on this?
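As a sketch of what a cosine-distance variant could look like, assuming each masked patch is flattened into one row vector (the helper names `l2_normalize` and `cosine_distance` are hypothetical), the l2-normalization is the only step that differs from a plain l2 loss: for unit vectors u and v, 1 - cos(u, v) equals half the squared l2 distance between them.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each row to (approximately) unit l2 norm; eps guards against zero rows.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cosine_distance(pred, target, axis=-1):
    """Hypothetical helper: 1 - cos(pred, target), one value per row (e.g. per masked patch)."""
    cos = np.sum(l2_normalize(pred, axis) * l2_normalize(target, axis), axis=axis)
    return 1.0 - cos
```

One consequence of the normalization: a prediction that is a positively scaled copy of its target patch gets (near) zero cosine loss, whereas l1 and l2 would still penalize the difference in magnitude.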

Best.