Hi!
Thanks for this little piece of juicy code!
Just out of curiosity: I've noticed that your implementation uses `nn.LayerNorm` with the default denominator constant `eps=1e-5`, whereas other implementations (DINO [here] and ViT in `timm` [here]) explicitly set this parameter to `eps=1e-6`.
I know it is a small detail, but details are sometimes very important for getting better models.
Do you think the model is sensitive to this kind of parameter change? Have you ever tried it or noticed any difference?
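For reference, here is a minimal plain-Python sketch of the per-vector normalization that `nn.LayerNorm` computes. It is not PyTorch's implementation, and it omits the learnable weight and bias. It shows where `eps` enters: it is added to the variance before the square root. The two example vectors are made up purely for illustration.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a list of floats as y = (x - mean) / sqrt(var + eps).

    Uses the biased (population) variance, as nn.LayerNorm does;
    the learnable weight and bias are omitted in this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    denom = math.sqrt(var + eps)               # eps keeps the denominator away from 0
    return [(v - mean) / denom for v in x]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical features with a "typical" spread: var (~1.3) dwarfs both eps values.
wide = [0.5, -1.2, 2.0, 0.3]
# Hypothetical near-constant features: var (~1.25e-6) is comparable to eps.
narrow = [1e-3, -1e-3, 2e-3, 0.0]

for name, x in [("wide", wide), ("narrow", narrow)]:
    d = max_abs_diff(layer_norm(x, eps=1e-5), layer_norm(x, eps=1e-6))
    print(f"{name}: max |y(eps=1e-5) - y(eps=1e-6)| = {d:.2e}")
```

With these inputs the two settings differ by only about 5e-6 on the wide vector, but by roughly 0.55 on the near-constant one, because `eps=1e-5` dominates its variance and shrinks the normalized output. So any sensitivity to this parameter would presumably show up only for tokens whose features have very small variance.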
Thanks!