I also tried training TransPose with the same schedule as the HRNet codebase, i.e. MultiStepLR decay. However, model performance turned out to be sensitive to the initial learning rate and the milestone epochs, and some models could not be trained to work well at all. I chose the cosine annealing schedule because, with the same initial learning rate, some models perform better than with MultiStepLR decay, and all of the models achieve relatively good performance under this single schedule. Note, however, that this schedule may not be optimal; you can also train with other schedules, such as the one used for DETR.
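For concreteness, here is a minimal PyTorch sketch contrasting the two schedules being discussed. The optimizer, total epoch count, milestones, gamma, and eta_min below are illustrative placeholders, not the exact values from the TransPose or HRNet configs.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR

# Illustrative model and optimizer; the actual TransPose settings may differ.
model = torch.nn.Linear(10, 17)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

total_epochs = 230  # hypothetical training length

# Cosine annealing: the LR decays smoothly from its initial value toward eta_min,
# so training does not depend on hand-picked milestone epochs.
cosine_scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs, eta_min=1e-6)

# MultiStepLR (HRNet-style): the LR drops by `gamma` at fixed milestone epochs,
# which is where the sensitivity to the initial LR and milestones comes from.
multistep_scheduler = MultiStepLR(optimizer, milestones=[170, 200], gamma=0.1)

for epoch in range(total_epochs):
    # ... run one training epoch ...
    cosine_scheduler.step()  # or multistep_scheduler.step()
```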
I also used this schedule to train SimpleBaseline-Res50 with DarkPose post-processing; it reaches 72.1 AP on the COCO val set (a +0.1 improvement). You can see this in Section 4.1 of the updated paper.
Hi, thanks for sharing your study. While I was reading your paper, I wondered why you were using the cosine annealing scheduler.
I'm just asking because this scheduler is unfamiliar to me in the human pose estimation domain.