I only got 73.7 AP. I used 4 GPUs for training and kept the other configs unchanged.
In my experience, the performance of TransPose-R models is very sensitive to the initial learning rate. I did not train TransPose-R-A4 on 4 or 8 GPUs. I suggest increasing the initial learning rate a little under such conditions (with a larger batch size).
Please let me know the results if you try such experiments.
@yangsenius @douyh
FYI, I have trained TransPose-R-A4 on 4 GPUs. The initial and final learning rates were set to `5e-4` and `5e-5`, respectively. Other configs were kept unchanged.
I got 75.3 AP (+0.2 AP compared to the README).
Thanks for sharing the results! Happy to see that this brings a performance improvement @EckoTan0804
Larger batch sizes with more GPUs empirically bring performance improvements. The learning rate setting of DeiT, lr = 0.0005 × batchsize / constant, may also work well.
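In case it helps others tune the learning rate for multi-GPU runs, here is a minimal sketch of that linear scaling rule. The per-GPU batch size of 32 and the reference batch of 512 (the constant used in the DeiT paper) are assumptions here, not values taken from the TransPose configs, so adjust them to your own setup.

```python
# Sketch of a DeiT-style linear learning-rate scaling rule:
# scaled_lr = base_lr * total_batch_size / reference_batch_size.
# The per-GPU batch size (32) and reference batch (512) are assumptions.

def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
              reference_batch: int = 512) -> float:
    """Scale the base learning rate linearly with the total batch size."""
    total_batch = batch_size_per_gpu * num_gpus
    return base_lr * total_batch / reference_batch


if __name__ == "__main__":
    for gpus in (1, 4, 8):
        lr = scaled_lr(base_lr=5e-4, batch_size_per_gpu=32, num_gpus=gpus)
        print(f"{gpus} GPU(s): total batch = {32 * gpus}, lr = {lr:.2e}")
```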
I noticed that only 1 GPU is used to train TransPose-R-A4, with lr = 0.0001. Should I change the lr if I want to use 4 or 8 GPUs, or just keep it the same? Thanks for your reply.