yuvalkirstain / PickScore


Question about training curves #19

Open vishaal27 opened 9 months ago

vishaal27 commented 9 months ago

Hey, sorry for repeatedly bothering you with questions; hopefully this is one of the last few.

I am training PickScore with exactly the same configuration as you provided (grad-accumulation is finally working!). However, during training I notice a weird staircase-like behaviour in my loss curves:

[Image: Screenshot 2024-02-15 at 7 21 38 PM]

The sharp dips in loss occur exactly at the beginning of each epoch. I am unsure why this happens, and I have ruled out any wandb logging issues as the main cause. One justification is that the model is beginning to memorise the training samples, perhaps because it is very high capacity (980M params). These threads have similar arguments as to why this might be happening:

I am wondering if you faced similar issues while training your original PickScore model on Pick-a-pic-v1? Would it be possible for you to share your training loss curves if they are available? I want to ensure that the model is not fully overfitting to the train set. I did check that the validation accuracy stays stagnant rather than dropping, so perhaps it is fine?

Would be great to hear your thoughts on this, thanks again!

yuvalkirstain commented 9 months ago

If I recall correctly (read the paper for full details, I might be wrong), we did not train it for a full epoch. Either way, it is normal for the loss to drop after a full epoch, probably due to memorisation, as you mentioned :)
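
To confirm that the dips really line up with epoch boundaries, one option is to compare the mean training loss just before and just after each boundary. The snippet below is only a minimal sketch, not code from the PickScore repo: it assumes the per-step losses have been exported as a plain Python list, and `epoch_boundary_drops`, `steps_per_epoch` and `window` are hypothetical names chosen for illustration. The loss values are synthetic, not real training results.

```python
from statistics import mean


def epoch_boundary_drops(losses, steps_per_epoch, window=50):
    """Return (boundary_step, mean_before, mean_after) for each epoch boundary.

    losses: per-step training losses, in logging order (assumed input format).
    steps_per_epoch: number of optimizer steps in one pass over the data.
    window: how many steps on each side of the boundary to average over.
    """
    drops = []
    # Boundaries sit at multiples of steps_per_epoch, excluding step 0.
    for boundary in range(steps_per_epoch, len(losses), steps_per_epoch):
        before = losses[max(0, boundary - window):boundary]
        after = losses[boundary:boundary + window]
        if before and after:
            drops.append((boundary, mean(before), mean(after)))
    return drops


# Synthetic staircase: the loss is flat within an epoch and drops at each boundary.
losses = [0.7] * 100 + [0.5] * 100 + [0.35] * 100

for step, before, after in epoch_boundary_drops(losses, steps_per_epoch=100, window=20):
    print(f"step {step}: mean loss {before:.3f} -> {after:.3f}")
```

If the drop shows up only at the boundaries while validation accuracy stays flat, that is consistent with the memorisation explanation above, since the model starts seeing training examples for the second time.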