mahmoodlab / UNI

Towards a general-purpose foundation model for computational pathology - Nature Medicine

Training loss(es) during pre-training #25

Closed afilt closed 2 months ago

afilt commented 7 months ago

Hello, I was wondering if you could provide additional details on how the loss function(s) evolved during the pre-training of UNI. Instabilities and convergence issues are known to hinder this kind of pre-training. Is this something you observed?

Congratulations on this groundbreaking work and on publicly releasing the weights.

Richarizardd commented 2 months ago

Hi @afilt - when I was training UNI, the overall loss curve was very smooth. Since DINOv2's SSL implementation is very close to iBOT's, many of the suggestions for improving iBOT's stability also carry over, e.g. https://github.com/bytedance/ibot/issues/19: lowering the teacher temperature, increasing the number of initial epochs during which the last layer is kept frozen, adjusting clip_grad, etc. I would suggest first doing the short ViT-L/16 run to see whether this configuration works for you.
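
For concreteness, here is a minimal sketch of the kind of hyperparameter overrides described above, assuming a DINOv2-style OmegaConf config. The key names mirror the teacher/optim sections of dinov2's ssl_default_config.yaml but may differ across versions, and the values are illustrative rather than the ones used for UNI.

```python
from omegaconf import OmegaConf

# Minimal sketch of stability-oriented overrides for a DINOv2-style SSL config.
# Key names follow the teacher/optim sections of dinov2's ssl_default_config.yaml,
# but exact names and defaults may vary by version; the values are illustrative,
# not the ones used to train UNI.
stability_overrides = OmegaConf.create({
    "teacher": {
        "warmup_teacher_temp": 0.04,        # keep the warmup temperature low
        "teacher_temp": 0.04,               # lower final teacher temp softens targets
        "warmup_teacher_temp_epochs": 50,   # lengthen the temperature warmup
    },
    "optim": {
        "clip_grad": 1.0,                   # clip gradients more aggressively
        "freeze_last_layer_epochs": 3,      # keep the last layer frozen longer at the start
    },
})

# cfg = OmegaConf.merge(base_cfg, stability_overrides)  # apply on top of the base config
```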