Hrnet slow compared to simple baseline

leoxiaobin / deep-high-resolution-net.pytorch

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

https://jingdongwang2017.github.io/Projects/HRNet/PoseEstimation.html

MIT License

4.31k stars, 908 forks

Hrnet slow compared to simple baseline #26

Open ghost opened 5 years ago

ghost commented 5 years ago

Hello,

Thank you very much for the implementation and the trained models.

I compared the pose-estimation run time of pose_hrnet_w32_256x192.pth and pose_resnet_50_256x192.pth on the same large dataset of my own (~5700 images with ~2 people per image). In both runs I start measuring after 100 iterations to account for GPU warm-up. I measure run time without the detection step, but including the pre-processing (cropping and resizing the detection crops), which is done in the same way for both networks.
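A minimal sketch of the timing protocol described above (discard the warm-up iterations, then time the rest), assuming a generic `step` callable that does the pre-processing plus the forward pass. The names `measure_fps`, `step` and `sync` are hypothetical and not from this repository. On a GPU you would pass something like `torch.cuda.synchronize` as `sync`, because CUDA kernels run asynchronously and the timer could otherwise stop before the work has finished.

```python
import time
from typing import Any, Callable, Iterable, Optional


def measure_fps(
    step: Callable[[Any], None],
    batches: Iterable[Any],
    warmup: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return calls to `step` per second, ignoring the first `warmup` calls.

    `step` processes one batch (hypothetical: pre-processing plus forward pass).
    `sync` should block until queued device work has finished, e.g.
    torch.cuda.synchronize, so asynchronous GPU work is not left untimed.
    """
    start: Optional[float] = None
    timed = 0
    for i, batch in enumerate(batches):
        if i == warmup:
            # Warm-up is over: drain pending work, then start the clock.
            if sync is not None:
                sync()
            start = time.perf_counter()
        step(batch)
        if start is not None:
            timed += 1
    if start is None:
        raise ValueError("need more batches than warm-up iterations")
    if sync is not None:
        sync()  # make sure the last timed batch has really finished
    return timed / (time.perf_counter() - start)
```

For example, `measure_fps(run_pose_net, crop_batches, warmup=100, sync=torch.cuda.synchronize)`, where `run_pose_net` and `crop_batches` are placeholders for your own code. Using the same harness for both models keeps the comparison fair.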

I get ~101 fps for pose_resnet_50_256x192, compared to ~82 fps for pose_hrnet_w32_256x192, on a single RTX 2080 Ti GPU.

Could it be that HRNet is slower than SimpleBaseline, or am I missing something?

YinRui1991 commented 5 years ago

@conference-anonymous Hi, I think your speeds for the pose_resnet_50_256x192 and pose_hrnet_w32_256x192 models are right. I tested COCO val2017 with these two models, and the time cost was 550 s (ResNet) and 1200 s (HRNet), even though pose_hrnet_w32_256x192 needs fewer GFLOPs than pose_resnet_50_256x192, as reported in the paper.
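For scale, the numbers reported in this thread can be turned into per-item latencies and slowdown ratios. This sketch only reuses those figures; the thread does not say exactly what the COCO val2017 timing covers (for example, whether it includes data loading or evaluation), so the two ratios need not agree.

```python
# Figures reported in this thread (not re-measured here).
fps_resnet50, fps_hrnet_w32 = 101.0, 82.0               # single RTX 2080 Ti
coco_val_s_resnet, coco_val_s_hrnet = 550.0, 1200.0     # COCO val2017 wall-clock


def per_item_ms(fps: float) -> float:
    """Convert throughput (items per second) to latency per item in ms."""
    return 1000.0 / fps


# How much slower HRNet-W32 is than SimpleBaseline ResNet-50 in each measurement.
slowdown_fps = fps_resnet50 / fps_hrnet_w32
slowdown_coco = coco_val_s_hrnet / coco_val_s_resnet

print(f"OP benchmark: {per_item_ms(fps_resnet50):.2f} ms vs "
      f"{per_item_ms(fps_hrnet_w32):.2f} ms -> HRNet {slowdown_fps:.2f}x slower")
print(f"COCO val2017: HRNet {slowdown_coco:.2f}x slower")
# OP benchmark: 9.90 ms vs 12.20 ms -> HRNet 1.23x slower
# COCO val2017: HRNet 2.18x slower
```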

sunke123 commented 5 years ago

@conference-anonymous @YinRui1991 Hi all, thanks for your attention. In our paper, we want to point out that our method is superior to previous works while using fewer parameters and GFLOPs, rather than relying on a larger model. However, GFLOPs do not directly correspond to runtime, because runtime also depends on the implementation.

In the PyTorch implementation, the convolutional layers are executed in series, even though the different branches are connected in parallel. As a result, HRNet runs slower than SimpleBaseline. Training and inference speed for HRNet could be improved if PyTorch supported executing the parallel convolutions concurrently.
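A toy cost model of the point above, under a strong simplifying assumption: each branch of a multi-resolution stage has a fixed latency, and "parallel" means ideal overlap. Run in series, the branch latencies add up; run concurrently, the slowest branch dominates, while the FLOP count is the same in both cases. The function names and per-branch numbers are made up for illustration. PyTorch does expose CUDA streams (`torch.cuda.Stream`) that can in principle overlap independent work, but nobody in this thread reports trying that.

```python
from typing import Sequence


def serial_latency_ms(branch_ms: Sequence[float]) -> float:
    """Branches run one after another: their latencies add up."""
    return sum(branch_ms)


def concurrent_latency_ms(branch_ms: Sequence[float]) -> float:
    """Idealised concurrent execution: the slowest branch sets the latency.

    Ignores contention for GPU compute units and memory bandwidth,
    so a real speed-up would be smaller than this model suggests.
    """
    return max(branch_ms)


# Hypothetical per-branch latencies (ms) for one four-branch stage; made-up numbers.
stage_ms = [0.40, 0.25, 0.15, 0.10]
print(f"serial: {serial_latency_ms(stage_ms):.2f} ms")               # serial: 0.90 ms
print(f"concurrent (ideal): {concurrent_latency_ms(stage_ms):.2f} ms")  # concurrent (ideal): 0.40 ms
```

The gap between the two numbers is one way to picture why fewer GFLOPs do not automatically mean lower latency when many small branches are launched one after another.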

H19012 commented 4 years ago


Is the 82 fps measured including the post-processing that converts heatmaps to keypoint coordinates?
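The question concerns the step that turns each joint's heatmap into a keypoint location. A minimal pure-Python sketch of a common decoding scheme is shown below: take the argmax, shift it a quarter pixel toward the larger neighbour, and scale from heatmap resolution back to the input crop. It assumes a heatmap stride of 4 (e.g. a 64x48 heatmap for a 256x192 input) and omits the mapping from the crop back to the original image. `decode_heatmap` is a hypothetical name, and this sketch does not claim to reproduce the repository's own post-processing code. Because this kind of step typically runs on the CPU per joint, including or excluding it can shift fps numbers.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows (y) x columns (x)


def decode_heatmap(hm: Heatmap, stride: float = 4.0) -> Tuple[float, float, float]:
    """Return (x, y, score) in input-crop pixels for one joint's heatmap."""
    h, w = len(hm), len(hm[0])
    # 1. First location of the maximum value (like numpy.argmax).
    score, py, px = hm[0][0], 0, 0
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > score:
                score, py, px = v, y, x
    fx, fy = float(px), float(py)
    # 2. Quarter-pixel shift toward the larger neighbour (skipped at borders).
    if 0 < px < w - 1:
        diff = hm[py][px + 1] - hm[py][px - 1]
        fx += 0.25 * ((diff > 0) - (diff < 0))
    if 0 < py < h - 1:
        diff = hm[py + 1][px] - hm[py - 1][px]
        fy += 0.25 * ((diff > 0) - (diff < 0))
    # 3. Heatmap pixels -> input-crop pixels (assumed stride).
    return fx * stride, fy * stride, score
```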