Some more details about the pretrained models

heydarshahi commented 7 months ago

Following #5, I noticed my training times are very different from yours. E.g., each epoch takes ~8 hours on the full data with twice your batch size (128) and the default parameters on an A100 GPU.

Given that, my only guess is that the pretrained models were trained on cropped images. Is that correct?
Also, I see that the training loss stays at ~60 for around 15k steps and then suddenly drops to ~2. Do you remember seeing the same behavior? I tried different learning rates and the behavior is more or less the same.

Thanks! Amin

zc-alexfan commented 7 months ago

Hi,

The training was on the cropped images, which is a lot faster. The bottleneck is the hard drive speed, so it will be much faster if you have an SSD.
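A rough way to check whether data loading really is the bottleneck is to time how long the loop waits for each batch versus how long the training step takes. The sketch below is only an illustration and is not part of the arctic codebase: `time_loading_vs_compute`, `batches`, and `step` are hypothetical names, and it assumes `step` blocks until the work is done (for a GPU step you would need to synchronize inside it, e.g. with `torch.cuda.synchronize()`).

```python
import time
from typing import Callable, Iterable, Tuple


def time_loading_vs_compute(
    batches: Iterable,
    step: Callable[[object], None],
    max_batches: int = 50,
) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent in the step).

    `batches` is any iterable of batches (hypothetical; e.g. a data loader).
    `step` is a hypothetical callable that runs one training step and must
    block until that step has finished.
    """
    load_s = 0.0
    step_s = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is disk I/O + preprocessing
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # time spent here is the model's forward/backward pass
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s
```

If the first number dominates the second, the loop is mostly waiting on storage or preprocessing, and a faster disk or smaller (cropped) images should help more than a faster GPU.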

For the losses, yes, I also observed similar behavior. It is mainly because the camera parameters are quite difficult to estimate.

zc-alexfan commented 7 months ago

I suggest taking a look at the joint transformer in our latest paper: https://arxiv.org/abs/2403.16428

It has state-of-the-art performance, and the code has been released.

heydarshahi commented 7 months ago

Thanks a lot for the quick response!

zc-alexfan / arctic, issue #41