microsoft / satclip

PyTorch implementation of SatCLIP
MIT License

Longer training time than expected #15

Open PlekhanovaElena opened 1 month ago

PlekhanovaElena commented 1 month ago

Hi there,

I'm trying to reproduce the pre-training of SatCLIP on the S100 dataset. In default.yaml, I changed the following:

I'm also using a single A100 GPU, 11 cores, and up to 256GB RAM.

The problem I'm facing is that one epoch takes a really long time (probably because of loading all the images). My data is stored on an SSD with a decent connection to the A100 tower. One epoch takes approximately 36 minutes, which is about 6 times longer than what is indicated in the paper (2 days for 500 epochs on a single A100 GPU, i.e. roughly 6 minutes per epoch). Do you know what might be the problem? May I ask which parameters and machines you used for training with moco_resnet50?
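
For reference, the comparison above can be checked with a few lines of arithmetic. This is only a sketch based on the figures quoted in this post (36 minutes per epoch observed; 2 days for 500 epochs from the paper); the variable names are illustrative and do not come from the SatCLIP code.

```python
# Figures quoted in the post; names are illustrative, not from the SatCLIP repo.
observed_min_per_epoch = 36   # measured on my setup
paper_total_days = 2          # training time reported in the paper
paper_epochs = 500            # number of epochs reported in the paper

# Per-epoch time implied by the paper, in minutes.
paper_min_per_epoch = paper_total_days * 24 * 60 / paper_epochs

# How much slower the observed run is compared to the paper.
slowdown = observed_min_per_epoch / paper_min_per_epoch

# Total wall-clock time for all epochs at the observed rate, in days.
projected_days = observed_min_per_epoch * paper_epochs / (24 * 60)

print(f"paper: {paper_min_per_epoch:.2f} min/epoch")
print(f"slowdown: {slowdown:.2f}x")
print(f"projected: {projected_days:.1f} days for {paper_epochs} epochs")
```

At the observed rate, the full 500 epochs would take about 12.5 days instead of 2.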

Kind regards, Elena

konstantinklemmer commented 1 month ago

Yes, dataloading is a bottleneck in SatCLIP training. Some general advice: