kabouzeid / point2vec

Self-Supervised Representation Learning on Point Clouds (GCPR 2023 | T4V Workshop @ CVPR 2023)
https://point2vec.ka.codes
MIT License

About pre-training loss #5

Closed filaPro closed 4 months ago

filaPro commented 4 months ago

Hi @kabouzeid ,

Thanks for open-sourcing your code. I have two quick questions about pre-training; maybe you can help.

1) What is the intuition behind the training loss first decreasing and then increasing?
2) If I don't have annotations for my data, is there any intuition for when to stop pre-training? E.g., is the validation loss meaningful, or can it also increase at some point?

kabouzeid commented 4 months ago

Hi :)

In general, the pre-training loss is rather uninformative about final downstream-task performance, and it is not really comparable across different learning rates and EMA decay rates. The loss goes up because we have shifting targets: as the model evolves, the targets become more meaningful and information-dense, and thus harder to predict. The shape of the loss curve and the overall learning dynamics depend heavily on the EMA decay rate and its warm-up period; it's crucial to find the right balance between the EMA decay and the learning rate here. The loss could also go up much more rapidly, or even decrease slightly; none of this is necessarily a bad sign. For comparison, you can also refer to the data2vec losses: https://github.com/facebookresearch/fairseq/issues/4177#issuecomment-1041020937
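For illustration, here is a minimal sketch of how an EMA teacher produces shifting targets (the module, shapes, and tau value are placeholders, not the code in this repo):

```python
import copy

import torch

# Minimal sketch of a data2vec-style student/teacher pair (placeholder modules and
# shapes, not this repository's code). The teacher is an exponential moving average
# (EMA) of the student, and the regression targets come from the teacher, so the
# targets themselves shift as training progresses: that is why the loss can rise
# even while the representations improve.


def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module, tau: float) -> None:
    """Blend the teacher's weights towards the student's with EMA decay rate `tau`."""
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(tau).add_(p_s, alpha=1.0 - tau)


student = torch.nn.Linear(384, 384)  # stand-in for the real point-cloud encoder
teacher = copy.deepcopy(student)     # teacher starts as a frozen copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

x = torch.randn(8, 384)              # stand-in for a batch of token embeddings
with torch.no_grad():
    targets = teacher(x)             # shifting targets: they change whenever the teacher changes

loss = torch.nn.functional.smooth_l1_loss(student(x), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
update_teacher(student, teacher, tau=0.999)  # tau itself is typically warmed up over training
```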

In addition to tracking the loss, we also log the standard deviations of the targets and predictions, as well as a linear SVM evaluation. While these metrics were helpful for detecting diverging or broken runs, they didn't correlate strongly with the performance of the fine-tuned model on a downstream task. You don't need annotations for validation during pre-training; however, in our case the validation loss was almost identical to the training loss, so I don't think validation is very important here.
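A rough sketch of that kind of monitoring could look like this (dummy tensors and a placeholder feature dimension; not the code we actually use):

```python
import torch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Sketch only (dummy data, assumed shapes). A per-dimension standard deviation that
# drops towards zero hints at representation collapse, and a linear SVM on frozen
# global features of a small labelled proxy set gives a cheap sanity check.


def mean_std(features: torch.Tensor) -> float:
    """Mean per-dimension standard deviation over the batch; near zero suggests collapse."""
    return features.float().std(dim=0).mean().item()


predictions = torch.randn(256, 384)  # (batch, feature_dim), dummy values
targets = torch.randn(256, 384)
print("prediction std:", mean_std(predictions), "target std:", mean_std(targets))

# Linear SVM probe on frozen features (random placeholders instead of real encodings).
train_feats = torch.randn(1024, 384).numpy()
train_labels = torch.randint(0, 40, (1024,)).numpy()
val_feats = torch.randn(256, 384).numpy()
val_labels = torch.randint(0, 40, (256,)).numpy()

probe = make_pipeline(StandardScaler(), LinearSVC(C=0.01))
probe.fit(train_feats, train_labels)
print("linear SVM accuracy:", probe.score(val_feats, val_labels))
```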

To proceed, I recommend trying a few learning rates (`--model.learning_rate`) and numbers of epochs (`--trainer.max_epochs`), then evaluating on your target task. A good starting point for the LR warm-up is around 10% of the total epochs (`--model.lr_scheduler_linear_warmup_epochs`), while the EMA warm-up can be set to around 25% of the total epochs (`--model.d2v_ema_tau_epochs`).
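Purely as an illustration of that arithmetic (the epoch count and learning rates below are example values, not recommendations for your dataset):

```python
# Illustrative only: derive the suggested warm-up lengths from the chosen number of
# pre-training epochs and enumerate a small learning-rate sweep.

max_epochs = 800                             # value passed to --trainer.max_epochs (example)
lr_warmup_epochs = round(0.10 * max_epochs)  # --model.lr_scheduler_linear_warmup_epochs (~10%)
ema_tau_epochs = round(0.25 * max_epochs)    # --model.d2v_ema_tau_epochs (~25%)

for lr in (5e-4, 1e-3, 2e-3):                # candidate values for --model.learning_rate
    print(f"lr={lr}: LR warm-up {lr_warmup_epochs} epochs, EMA warm-up {ema_tau_epochs} epochs")
```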