Training loss(es) during pre-training

mahmoodlab / UNI

Towards a general-purpose foundation model for computational pathology - Nature Medicine

Hi @afilt - when I was training UNI, the overall loss curve was very smooth. As DINOv2 is very close to iBOT in its SSL implementation, many of the suggestions for improving iBOT stability should also carry over, e.g. https://github.com/bytedance/ibot/issues/19. These include adjusting hyperparameters such as lowering the temperature, increasing the number of initial iterations during which the last layer is kept frozen, adjusting clip_grad, etc. I would suggest performing the short run for ViT-L/16 first to see if this configuration works for you.
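
To make the three knobs mentioned above concrete, here is a minimal, standard-library-only sketch of what each one does numerically. It is not the DINOv2 or iBOT training code: the function names (`softmax_with_temperature`, `clip_by_global_norm`, `last_layer_is_frozen`) are hypothetical, and it assumes that "temperature" refers to the softmax temperature applied to the logits and that clip_grad is a cap on the global gradient norm.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (hypothetical helper).

    A lower temperature produces a sharper (more peaked) distribution.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is <= max_norm.

    Assumption: this mirrors what a clip_grad setting does conceptually.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def last_layer_is_frozen(iteration, freeze_iters):
    """True while the last layer should receive no updates (hypothetical helper)."""
    return iteration < freeze_iters


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    # Lowering the temperature concentrates mass on the largest logit.
    print(softmax_with_temperature(logits, 1.0))
    print(softmax_with_temperature(logits, 0.5))
    # A gradient with norm 5 is rescaled to norm 1.
    print(clip_by_global_norm([3.0, 4.0], 1.0))
```

In a real run these values would be set through the training configuration rather than hand-written helpers; the sketch only illustrates the direction in which each change acts.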

Training loss(es) during pre-training #25