realTaki opened this issue 2 years ago
Hi, sure -- you can see some runs for MSR-VTT here: https://app.neptune.ai/m-bain/frozen/experiments?split=tbl&dash=charts&viewId=95e7e8f0-79f1-48a4-9bd5-e1017c21309b
Yeah, a smaller batch size will take longer to converge -- and intuitively I would think it gives worse performance due to the n^2 comparisons.
However, I find that for these small datasets a small batch size does really well if you tune the learning rate accordingly, maybe since it's like more augmentation. All my best results are with batch size 8-16. I think during pretraining bigger is better, just because training otherwise struggles to converge. Let me know how you get on :)
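For illustration, one common heuristic for "tuning the learning rate accordingly" is to scale it roughly linearly with batch size. This is just a rule of thumb, not the repo's actual schedule, and the `base_lr` value below is a placeholder:

```python
# Illustrative only: linear LR-scaling heuristic, not this repo's actual schedule.
base_lr = 1e-4          # hypothetical LR tuned at the reference batch size
base_batch_size = 96    # reference 1-frame pretraining batch size mentioned above

def scaled_lr(batch_size: int) -> float:
    """Scale the learning rate linearly with batch size."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(16))   # smaller batch -> proportionally smaller LR (~1.7e-5)
print(scaled_lr(512))  # larger batch  -> proportionally larger LR (~5.3e-4)
```

In practice you'd still sweep around the scaled value, since small-batch training can tolerate (or even prefer) a different operating point.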
For the sake of sharing results, I have reproduced the pre-training on CC3M+WebVid with 1-frame batch size 512 (instead of 96) and 4-frame batch size 128 (instead of 24). On MSR-VTT (1k-A split) zero-shot I got ~2% absolute improvement in R@1, R@5, and R@10. On MSR-VTT fine-tuning (1k-A split, can't remember the batch size but probably 128), I got +2% in R@1, while R@5 and R@10 were essentially the same.
Can you share some records of your experiments, like graphs on neptune.ai or other logs tracking how the performance/loss changes over training steps?
I would like to compare the effects of some configurations (e.g. batch size) on training convergence in depth. I think this model uses a contrastive loss that depends on a similarity matrix, so it may be affected by batch size and converge more slowly with a smaller batch size. In your experiments you were not using very large batch sizes, so the best performance may not have been reached yet. I think I want to try something haha~
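To make concrete what I mean by the batch-size dependence, here is a rough sketch of a symmetric InfoNCE-style loss over a B x B text-video similarity matrix (my own simplification, not necessarily the exact loss in this repo; the temperature value is just illustrative). Each sample is contrasted against B-1 in-batch negatives, so the number of comparisons grows with B^2:

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE-style loss over a (B, B) similarity matrix.

    video_emb, text_emb: (B, D) embeddings. Each row/column has 1 positive
    and B-1 in-batch negatives, so a larger batch gives more negatives.
    """
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sim = video_emb @ text_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_v2t = F.cross_entropy(sim, targets)          # video -> text
    loss_t2v = F.cross_entropy(sim.t(), targets)      # text -> video
    return 0.5 * (loss_v2t + loss_t2v)

# With batch size 16, each clip is contrasted against 15 in-batch captions.
v = torch.randn(16, 256)
t = torch.randn(16, 256)
print(symmetric_contrastive_loss(v, t))
```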