yz93 / LAVT-RIS

GNU General Public License v3.0
185 stars 14 forks

Why does training take so long? #22

Closed Huntersxsx closed 1 year ago

Huntersxsx commented 1 year ago

Hello, I am using 8 2080 Ti GPUs with Swin Transformer-Tiny as the backbone to train the network. It takes almost 60 hours to complete 40 training epochs. Is this normal? If so, what is the main reason for the slow training speed? I tried discarding the PWAM, but training is still slow. I think 60 hours is a long time for a single training run. Looking forward to your reply.

yz93 commented 1 year ago

Hi,

I think 60 hours on 2080 Ti GPUs is not abnormal per se.

Maybe check the GPU utilization (e.g., with `nvidia-smi`). If it is low, then I recommend finding out whether the bottleneck is caused by the infrastructure (data loading, storage, interconnect) or by the code. Then improve it!
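One common cause of low GPU utilization is the input pipeline: the GPU sits idle while batches are loaded and augmented. A quick way to tell is to time the data-wait and the compute step separately in the training loop. The sketch below is a generic timing harness, not code from this repository; `slow_batches` and the `sum` step are dummy stand-ins for the real dataloader and forward/backward pass.

```python
import time

def profile_loop(batches, step_fn, warmup=1):
    """Split wall time per iteration into data-wait vs. compute.

    Returns (data_time, step_time) totals in seconds,
    skipping the first `warmup` iterations."""
    data_time = step_time = 0.0
    it = iter(batches)
    i = 0
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)      # time spent waiting on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)            # stands in for forward/backward/step
        t2 = time.perf_counter()
        if i >= warmup:
            data_time += t1 - t0
            step_time += t2 - t1
        i += 1
    return data_time, step_time

def slow_batches(n=5):
    """Dummy loader that simulates slow disk I/O / augmentation."""
    for _ in range(n):
        time.sleep(0.02)
        yield list(range(8))

if __name__ == "__main__":
    d, s = profile_loop(slow_batches(), step_fn=sum)
    print(f"data wait: {d:.3f}s  compute: {s:.3f}s")
```

If the data-wait total dominates, the GPUs are being starved by the input pipeline; with PyTorch, raising the `DataLoader` `num_workers` and enabling `pin_memory` are typical first things to try. If the compute step dominates instead, the bottleneck is in the model code itself.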