Closed — leeyegy closed this issue 3 years ago
Sorry, @leeyegy, we did not train TransPose models at the 384×288 input resolution.
Because the computational complexity of the self-attention layer is quadratic in the input sequence length (H×W), training at 384×288 consumes much more GPU memory (roughly 5× that of 256×192). We may run experiments at different resolutions and report the results in the future.
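The ~5× figure can be sanity-checked with a quick back-of-envelope calculation. This is a minimal sketch, assuming (hypothetically) that the transformer attends over a stride-4 feature map, so the sequence length is L = (H/4)·(W/4) and the attention matrix has L² entries:

```python
# Back-of-envelope check of the ~5x self-attention memory claim.
# Assumption (hypothetical): attention runs on a stride-4 feature map,
# so sequence length L = (H // 4) * (W // 4).

def attn_matrix_entries(h, w, stride=4):
    """Number of entries in the L x L self-attention matrix for an HxW input."""
    seq_len = (h // stride) * (w // stride)
    return seq_len * seq_len

small = attn_matrix_entries(256, 192)  # L = 3072
large = attn_matrix_entries(384, 288)  # L = 6912
print(large / small)  # -> 5.0625, i.e. roughly the ~5x memory quoted above
```

The ratio is ((384·288)/(256·192))² = 2.25² ≈ 5.06 regardless of the stride, which matches the ~5× estimate.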
In your paper, only results on 256×192 inputs are reported, and TransPose-H-A6 outperforms HRNet-W32+DarkPose on the COCO validation set (75.8 AP vs. 75.6 AP).
Hence, I am curious: does it still outperform HRNet if the input resolution is increased (e.g., to 384×288)?
It would be appreciated if you could share experimental results at different resolutions.