leoxiaobin / deep-high-resolution-net.pytorch

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
https://jingdongwang2017.github.io/Projects/HRNet/PoseEstimation.html

How to get person detection? #41

Open FrancescoPiemontese opened 5 years ago

FrancescoPiemontese commented 5 years ago

First of all, thank you for your excellent work. I have a question regarding person detection. In your paper it is mentioned that you use a person detector before feeding its output to HRNet. Am I supposed to download this separately and then feed its output to HRNet? If so, what do the dataloaders in train.py and test.py do? Would it be possible for you to tell me which person detector was used?

njustczr commented 5 years ago

I think the authors used the detection information from the dataset (the 'center' and 'scale' fields in the MPII dataset JSON files).
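
For illustration, a minimal sketch of reading those fields, assuming the MPII annotation format used by this repo (a JSON list of records with 'image', 'center', and 'scale', where scale is relative to a 200-pixel person height); the file path is a placeholder:

```python
import json
import numpy as np

# Load the MPII-style annotation file (path is a placeholder)
with open('data/mpii/annot/valid.json') as f:
    annots = json.load(f)

rec = annots[0]
center = np.array(rec['center'], dtype=np.float32)  # person center (x, y) in the image
height_px = rec['scale'] * 200.0                     # approximate person height in pixels
print(rec['image'], center, height_px)
```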

lxy5513 commented 5 years ago

@FrancescoPiemontese maybe you can refer to my hrnet repo; I have integrated YOLO human detection.

FrancescoPiemontese commented 5 years ago

Thank you! I will try

leoxiaobin commented 5 years ago

@lxy5513, would you consider making a PR to this repo?

lxy5513 commented 5 years ago

@leoxiaobin yes, soon. I will add several human detectors, such as R-FCN and RetinaNet, then make a PR with a speed comparison.

lxy5513 commented 5 years ago

@leoxiaobin I plan to implement this tracking following the description in your Simple Baselines paper:

For processing frames in videos, the boxes from a human detector and boxes generated by propagating joints from previous frames using optical flow are unified using a bounding box Non-Maximum Suppression (NMS) operation.

I have two groups of boxes, but I don't know how to do NMS, because the boxes generated by FlowNet2-S have no confidence scores. Can I simply assume their scores are the scores of the previous frame's boxes? Could you advise me on this problem? Thank you in advance.

leoxiaobin commented 5 years ago

We actually use the OKS score for NMS.
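
For reference, a minimal NumPy sketch of greedy OKS-based NMS (the repo ships its own implementation; the sigmas below are the standard COCO per-keypoint constants, and all function and variable names here are illustrative, not the repo's code):

```python
import numpy as np

# Standard COCO per-keypoint sigmas (17 keypoints)
COCO_SIGMAS = np.array([
    .26, .25, .25, .35, .35, .79, .79, .72, .72,
    .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0

def compute_oks(kept, cand, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between a kept pose and a candidate pose.

    Both poses are (K, 3) arrays of (x, y, visibility); `area` is the person's box area.
    """
    var = (sigmas * 2) ** 2
    d2 = (cand[:, 0] - kept[:, 0]) ** 2 + (cand[:, 1] - kept[:, 1]) ** 2
    e = d2 / var / (area + np.spacing(1)) / 2
    vis = kept[:, 2] > 0
    return np.exp(-e[vis]).mean() if vis.any() else 0.0

def oks_nms(poses, scores, areas, thresh=0.9):
    """Greedily keep the highest-scoring poses, suppressing near-duplicates by OKS."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        oks = np.array([compute_oks(poses[i], poses[j], areas[i]) for j in order[1:]])
        order = order[1:][oks <= thresh]
    return keep
```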

lxy5513 commented 5 years ago

Thanks

lxy5513 commented 5 years ago

@leoxiaobin Hi, I made a PR for YOLOv3-HRNet, however something is weird. I used two approaches.


ONE: I get dt_boxes from YOLO, then run `python tools/test.py TEST.USE_GT_BBOX False TEST.FLIP_TEST False` with OKS NMS removed, and get the following result:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|------|----|-------|--------|--------|--------|----|-------|--------|--------|--------|
| pose_hrnet | 0.702 | 0.859 | 0.770 | 0.653 | 0.779 | 0.736 | 0.878 | 0.794 | 0.683 | 0.813 |

TWO: I run the two models end-to-end (the same models as in ONE), get the keypoints, save them into a JSON file, and finally get the result from the official cocoEval.evaluate(), as follows:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|------|----|-------|--------|--------|--------|----|-------|--------|--------|--------|
| hrnet | 0.594 | 0.811 | 0.656 | 0.564 | 0.651 | 0.647 | 0.834 | 0.704 | 0.601 | 0.713 |

Could you please tell me why the two results are so different? Thank you in advance.
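
For reference, the evaluation step in approach TWO can be reproduced with the official pycocotools API roughly as follows (file paths are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('person_keypoints_val2017.json')           # ground-truth annotations
coco_dt = coco_gt.loadRes('hrnet_keypoint_results.json')  # detections in COCO results format
coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR numbers like those reported above
```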

lxy5513 commented 5 years ago

This is my script that generates the keypoints JSON file: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py. By the way, my YOLOv3 threshold is 0.1.

leoxiaobin commented 5 years ago

I had a very quick look through your code. I have two questions.

  1. It seems that you do not convert the image's channels to RGB. OpenCV reads images in BGR order, while our model is trained on RGB images, so you first need to convert your image data to RGB, as is done at line 131 of https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/blob/master/lib/dataset/JointsDataset.py#L131 (see the sketch after this list).

  2. Are the thresholds the same for both methods?
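
For illustration, a minimal sketch of the BGR-to-RGB conversion described in point 1 (the file name and variable names are placeholders, not the repo's own preprocessing code):

```python
import cv2

# OpenCV loads images in BGR channel order
img_bgr = cv2.imread('person_crop.jpg')
# Convert to RGB before feeding a model trained on RGB inputs,
# analogous to the cv2.cvtColor call referenced in JointsDataset.py above
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
```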

lxy5513 commented 5 years ago

I greatly appreciate your attention. This is my channel-conversion code: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py#L142.

This is the relevant threshold code; it is the same for both methods: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py#L159

lxy5513 commented 5 years ago

By the way, I used YOLOv3 + the simple-baseline pose model to test the PR, and the result seems normal:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|------|----|-------|--------|--------|--------|----|-------|--------|--------|--------|
| Simple-baseline | 0.648 | 0.856 | 0.708 | 0.617 | 0.706 | 0.697 | 0.880 | 0.750 | 0.652 | 0.763 |

alex9311 commented 4 years ago

I would say this issue can be closed with #161 being merged

zhanghao5201 commented 3 years ago

I also get 0.702, but the reported result for w32_256x192 is 0.744. Why? I just ran the implementation code with the trained model pose_hrnet_w32_256x192.pth. Can you help me?