mks0601 / 3DMPPE_ROOTNET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

Root Result with BBox Issue. #23

Closed: YangJae96 closed this issue 4 years ago

YangJae96 commented 4 years ago

Hello.

Thank you for your great work.

I tried demo.py in RootNet with your "input.jpg". You provided the following bbox values: bbox_list = [[139.41, 102.25, 222.39, 241.57], [287.17, 61.52, 74.88, 165.61], [540.04, 48.81, 99.96, 223.36], [372.58, 170.84, 266.63, 217.19], [0.5, 43.74, 90.1, 220.09]], and these seem to give good root results.

But there is a problem when I use the bbox results that I obtained from Detectron2: bbox_list = [[367.4241, 177.2487, 636.8704, 389.0179], [169.7263, 103.9277, 365.6891, 341.9860], [1.0619, 43.5408, 88.0144, 266.1570], [537.6930, 51.6290, 639.7495, 265.5615], [292.3224, 59.8680, 364.7124, 224.1436]]

[Images: output, output_root_2d_4, output_root_2d_1]

The RootNet results are bad. I have also tested other images, such as one with 2 people, but I keep getting bad results even though the humans are detected. I have tried all the pretrained Detectron2 Mask R-CNN and Faster R-CNN models, and I used the pretrained RootNet model "snapshot18.pth" that you provided.

Are the bboxes you provided ground truth? I can't figure out what the issue is. Could you give me some advice, please?

Thank you.

mks0601 commented 4 years ago

Hi,

I checked the bboxes you gave, but they seem to be wrong. The images below are crops from input.jpg using your bboxes. As you can see, some of them are not cropped properly.

[Images: crops 0, 1, 2, 3, 4]

The images below are RootNet results. As you can see, if the bbox is correct, it produces good results.

[Images: output_root_2d_0, output_root_2d_1, output_root_2d_2, output_root_2d_3, output_root_2d_4]

Similar experimental results in my paper (Table 2) show that human detection performance only marginally affects human root joint localization and 3D root-relative pose estimation.

YangJae96 commented 4 years ago

Thank you for your quick reply! The problem was that the bbox coordinates I took from Detectron2 Mask R-CNN were in (xmin, ymin, xmax, ymax) format, i.e. the top-left and bottom-right corners, which did not match your bbox format of (xmin, ymin, width, height).
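A minimal sketch of the conversion this fix implies, using the Detectron2 boxes quoted earlier in the thread. The helper name `xyxy_to_xywh` is hypothetical and not part of the RootNet or Detectron2 code.

```python
def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [xmin, ymin, width, height]."""
    xmin, ymin, xmax, ymax = box
    if xmax < xmin or ymax < ymin:
        raise ValueError(f"not a valid corner-format box: {box}")
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# Corner-format boxes as reported in this thread.
detectron2_boxes = [
    [367.4241, 177.2487, 636.8704, 389.0179],
    [169.7263, 103.9277, 365.6891, 341.9860],
    [1.0619, 43.5408, 88.0144, 266.1570],
    [537.6930, 51.6290, 639.7495, 265.5615],
    [292.3224, 59.8680, 364.7124, 224.1436],
]

# The [xmin, ymin, width, height] list that demo.py expects, per this thread.
bbox_list = [xyxy_to_xywh(b) for b in detectron2_boxes]

for b in bbox_list:
    print([round(v, 2) for v in b])
```

After conversion, the first box has a width of about 269 and a height of about 212, which is close to the corresponding demo box [372.58, 170.84, 266.63, 217.19].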

Thank you again for great work!

mks0601 commented 4 years ago

Thanks!

vigneshrk29 commented 1 year ago

Hi,

Sorry to reopen this issue, but would width and height be equal to xmax-xmin and ymax-ymin?

Thanks

mks0601 commented 1 year ago

Which width and height do you mean?

vigneshrk29 commented 1 year ago

Thanks for the quick reply. The earlier comments mention that the bbox values should be [xmin, ymin, width, height] instead of [xmin, ymin, xmax, ymax]. I wanted to know whether width and height are the differences between the max and min values.

vigneshrk29 commented 1 year ago

I ran PoseNet and RootNet on an image that is from neither of the datasets. I got the bbox values from YOLOv5. This is the output I get:

[Images: root, 3dPose]

The root looks good, but the 3D pose isn't great, so I am not sure where I am going wrong.

mks0601 commented 1 year ago

You're correct. The width and height represent xmax-xmin and ymax-ymin, respectively. Although that pose seems easy, contact between different body parts, like the contact between the hand and the body in that image, is one of the biggest challenges in the 3D human pose community because of depth ambiguity. As you may have noticed, the 2D-projected 3D pose seems reasonable.

vigneshrk29 commented 1 year ago

Yes, the 2D-projected pose is reasonable. Thank you. Brilliant work, by the way.