I only got 73.7 AP. I used 4 GPUs for training and kept the other configs unchanged.
In my experience, the performance of TransPose-R models is very sensitive to the initial learning rate. I did not train TransPose-R-A4 on 4 or 8 GPUs. I suggest increasing the initial learning rate a little under such conditions (with a larger batch size).
Please let me know the results if you try such experiments.
@yangsenius @douyh
FYI, I have trained TransPose-R-A4 on 4 GPUs. The initial and final learning rates were set to `5e-4` and `5e-5`, respectively. Other configs were kept unchanged.
I got 75.3 AP (+0.2 AP compared to the README).
Thanks for sharing the results! Happy to see that this brings a performance improvement @EckoTan0804
Larger batch sizes with more GPUs empirically bring performance improvements. The learning rate setting of DeiT, lr = 0.0005 × batchsize / constant, may also work well.
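In case it helps others tune the learning rate for multi-GPU runs, here is a minimal sketch of that linear scaling rule. The per-GPU batch size of 32 and the reference batch of 512 (the constant used in the DeiT paper) are assumptions here, not values taken from the TransPose configs, so adjust them to your own setup.

```python
# Sketch of a DeiT-style linear learning-rate scaling rule:
# scaled_lr = base_lr * total_batch_size / reference_batch_size.
# The per-GPU batch size (32) and reference batch (512) are assumptions.

def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
              reference_batch: int = 512) -> float:
    """Scale the base learning rate linearly with the total batch size."""
    total_batch = batch_size_per_gpu * num_gpus
    return base_lr * total_batch / reference_batch


if __name__ == "__main__":
    for gpus in (1, 4, 8):
        lr = scaled_lr(base_lr=5e-4, batch_size_per_gpu=32, num_gpus=gpus)
        print(f"{gpus} GPU(s): total batch = {32 * gpus}, lr = {lr:.2e}")
```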
I noticed that only 1 GPU is used to train TransPose-R-A4, with lr = 0.0001. Should I change the lr if I want to use 4 or 8 GPUs, or just keep it the same? Thanks for your reply.