stefanopini / simple-HRNet

Multi-person Human Pose Estimation with HRNet in PyTorch
GNU General Public License v3.0

Question: Performance against other 2D detectors (OpenPose) #6

Closed timtensor closed 4 years ago

timtensor commented 5 years ago

Hello, I was playing around with different videos and clips to check how well the pretrained models work for both HRNet and OpenPose. OpenPose seems to have better accuracy. Have you tried it? What is your opinion?

bpeck81 commented 5 years ago

One thing I notice is that this implementation of hrnet applies the model to the cropped portion of the person returned from yolo whereas, from what I can tell, the model in the original paper is applied to the entire image. Losing the background context when predicting may affect the performance.

timtensor commented 5 years ago

@bpeck81 Thanks for the info. Actually, when I tried disabling the YOLOv3 detector, performance seemed worse, even for single-person detection. I think it also has some limitations with multi-person detection, but I am not sure. Even with a set of optimized pretrained weights, I could only manage detection of 2 persons with decent performance. Do you think we can improve the performance by changing the frame rate?

stefanopini commented 5 years ago

According to the paper, HRNet should perform considerably better than OpenPose when trained and tested on COCO. However, the OpenPose authors state:

In addition, our paper numbers are not based on the current models that have been released. We released our best model at the time but later found a better one.

therefore it may have better performance than HRNet. In my limited experience, the performance of the two networks is similar.

@bpeck81 In the HRNet paper, the authors state:

This paper is interested in single-person pose estimation

and

We extend the human detection box in height or width to a fixed aspect ratio: height:width = 4:3, and then crop the box from the image, which is resized to a fixed size, 256×192 or 384×288.

and

The two-stage top-down paradigm similar as [47, 11, 72] is used: detect the person instance using a person detector, and then predict detection keypoints.

Therefore, I added a YOLOv3 detector to find person instances, which are then analyzed with HRNet. With the singleperson option, the person detector is disabled and the whole image is analyzed directly by HRNet.
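The box-extension step quoted from the paper can be sketched as follows. This is only a minimal illustration, not the code used in this repository or by the paper's authors: the function name `extend_box_to_aspect` and its parameters are hypothetical, and it assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates in pixels.

```python
def extend_box_to_aspect(x1, y1, x2, y2, aspect_h=4, aspect_w=3):
    """Grow a box around its center until height:width == aspect_h:aspect_w.

    Hypothetical helper for illustration. Only one side is ever enlarged,
    so the original detection always stays inside the returned box.
    """
    w = x2 - x1
    h = y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("box must have positive width and height")

    cx = x1 + w / 2
    cy = y1 + h / 2

    if h * aspect_w < w * aspect_h:
        # Box is too wide for the target ratio: extend the height.
        h = w * aspect_h / aspect_w
    else:
        # Box is too tall (or already matches): extend the width.
        w = h * aspect_w / aspect_h

    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 30x20 (width x height) detection becomes 30x40, i.e. height:width = 4:3.
print(extend_box_to_aspect(0, 0, 30, 20))
```

The extended box can fall partly outside the image, so a real pipeline would still need to clip or pad before cropping and resizing to 256×192 or 384×288.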

timtensor commented 5 years ago

Thank you for answering the queries. As you said, for multi-person it does not work so well. Perhaps they will release new pretrained weights with better performance. I tried it on a multi-person video and was wondering how to tell which array corresponds to which person. For example, if there are two persons, the output array has shape (2x17x2). Is there an ID associated with each person? Perhaps this belongs in a separate question.

stefanopini commented 5 years ago

At the moment, there is no ID associated with each person because I didn't implement any person tracking functionality. Therefore, the order of the output is the same as the order of the YOLO detections.
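Until tracking is available, IDs can be approximated outside the library by matching people between consecutive frames. The sketch below is not part of this repository: it is a simple greedy nearest-centroid matcher, the names `centroid` and `assign_ids` are hypothetical, and the `max_dist` threshold of 50 pixels is an arbitrary assumption that would need tuning. It assumes each frame's output is a sequence of people, each a sequence of keypoints whose first two values are image coordinates, as in the (2x17x2) array mentioned above.

```python
import math


def centroid(person):
    """Mean position of one person's keypoints (first two values of each)."""
    xs = [float(p[0]) for p in person]
    ys = [float(p[1]) for p in person]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def assign_ids(prev_tracks, poses, next_id, max_dist=50.0):
    """Greedily match this frame's poses to the previous frame's tracks.

    prev_tracks: dict mapping track ID -> centroid from the previous frame.
    poses: keypoints for the current frame, one entry per person.
    Returns (ids, new_tracks, next_id), where ids[i] is the ID of poses[i].
    """
    centroids = [centroid(p) for p in poses]

    # Every (distance, pose index, track ID) candidate, closest first.
    pairs = sorted(
        (math.dist(c, pc), i, tid)
        for i, c in enumerate(centroids)
        for tid, pc in prev_tracks.items()
    )

    ids = [None] * len(poses)
    used = set()
    for dist, i, tid in pairs:
        if dist > max_dist:
            break  # remaining candidates are even farther away
        if ids[i] is None and tid not in used:
            ids[i] = tid
            used.add(tid)

    # Poses without a close enough match start new tracks.
    for i in range(len(poses)):
        if ids[i] is None:
            ids[i] = next_id
            next_id += 1

    new_tracks = {ids[i]: centroids[i] for i in range(len(poses))}
    return ids, new_tracks, next_id


# Two people whose detection order swaps between frames.
frame1 = [[(10, 10)] * 17, [(100, 100)] * 17]
frame2 = [[(102, 101)] * 17, [(11, 12)] * 17]

ids1, tracks, next_id = assign_ids({}, frame1, next_id=0)
ids2, tracks, next_id = assign_ids(tracks, frame2, next_id)
print(ids1, ids2)
```

Greedy centroid matching breaks down when people cross paths or are occluded for several frames; it is only meant to show how the detection order can be decoupled from identity.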

stefanopini commented 4 years ago

@timtensor Could you please check the performance with the latest version of the code? I have implemented the idea proposed in #14 and, from my (limited) tests, accuracy is now considerably higher in the multi-person setting.

timtensor commented 4 years ago

@stefanopini I will try to test it in the coming days!

timtensor commented 4 years ago

Yes, I notice much better performance, and it is quite stable as well.