mkocabas / VIBE

Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
https://arxiv.org/abs/1912.05656

questions about preprocessing "insta variety dataset" #116

Closed: bragilee closed this issue 3 years ago

bragilee commented 3 years ago

Hi authors,

Thanks for your great work and code. I have a few questions about preprocessing the "insta variety" dataset. The code I am looking at is here: https://github.com/mkocabas/VIBE/blob/a859e45a907379aa2fba65a7b620b4a2d65dcf1b/lib/data_utils/insta_utils.py#L249

  1. I did not see any part that extracts and stores "bbox". Did I miss something elsewhere in the repo? The dataset dict is defined as: dataset = { 'vid_name': [], 'frame_id': [], 'joints2D': [], 'features': [] }, with a comment on 'joints2D' saying it "should contain openpose keypoints only".
  2. How should the "scale_factor" provided in the "insta" dataset be handled? Here the scale is defined as a single number, but in the official documentation the "scale_factors" seem to be two numbers, so I am a bit confused by the code here. https://github.com/mkocabas/VIBE/blob/a859e45a907379aa2fba65a7b620b4a2d65dcf1b/lib/data_utils/insta_utils.py#L119

Thanks, and I look forward to your reply. :)

bragilee commented 3 years ago

Dear authors,

Just to add one more question: did you also find the "Insta variety" dataset to be quite noisy? I visualized the images you use as input to the VIBE model and overlaid the 2D keypoints on them. I found that some keypoints are off and some are wrong (for example, keypoints annotated on people who are just standing beside the subject), etc. I filtered out those cases, but I am wondering whether you had the same problems and whether you have any suggestions.

Thanks!
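
The filtering mentioned above is not shown in the thread. A minimal sketch of one possible per-frame check follows, assuming each frame's keypoints are a NumPy array of shape (J, 3) holding (x, y, confidence) in pixel coordinates of a square crop. The function name `keep_frame`, the 224-pixel crop size, and both thresholds are hypothetical choices for illustration, not values taken from VIBE. It catches low-confidence or out-of-crop joints, but not keypoints that sit confidently on a bystander.

```python
import numpy as np

def keep_frame(joints2d, img_size=224, min_conf=0.3, min_visible=0.5):
    """Return True if enough joints are confident and lie inside the crop.

    joints2d: array of shape (J, 3) with columns (x, y, confidence).
    All default values are illustrative assumptions.
    """
    xy, conf = joints2d[:, :2], joints2d[:, 2]
    # A joint counts as usable if it falls inside the image and is confident.
    inside = np.all((xy >= 0) & (xy < img_size), axis=1)
    usable = inside & (conf >= min_conf)
    # Keep the frame only if the usable fraction reaches the threshold.
    return bool(usable.mean() >= min_visible)

# Example: 25 confident joints in the middle of the crop pass the check.
clean = np.tile([112.0, 112.0, 0.9], (25, 1))
print(keep_frame(clean))
```

A sequence could then be filtered with a list comprehension over its frames, dropping frames (or whole clips) that fail the check.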

ikvision commented 3 years ago

I ran into the same ID correspondence problem in COCO when training SPIN: https://github.com/nkolot/SPIN/issues/43. Insta variety is weakly supervised, not manually annotated, so any automatic approach can only reduce those mistakes, not eliminate them. https://github.com/Arthur151/CenterHMR (code to be uploaded soon), presented at the workshop https://virtualhumans.mpi-inf.mpg.de/3DPW_Challenge/, tried to improve the matching problem by selecting the person in the center of the bounding box.
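
The center-selection idea described above can be sketched as follows. This is only an illustration of the general heuristic, not CenterHMR's code: it assumes several candidate detections per crop, each an array of shape (J, 3) with (x, y, confidence), and picks the one whose confident joints are centered closest to the middle of the crop. The function name `pick_central_person`, the crop size, and the confidence threshold are hypothetical.

```python
import numpy as np

def pick_central_person(candidates, img_size=224, min_conf=0.1):
    """Return the index of the candidate closest to the crop center, or None.

    candidates: list of (J, 3) arrays with columns (x, y, confidence).
    """
    center = np.array([img_size / 2.0, img_size / 2.0])
    best_idx, best_dist = None, np.inf
    for idx, kp in enumerate(candidates):
        visible = kp[:, 2] >= min_conf
        if not visible.any():
            continue  # nothing reliable to localize this candidate with
        # Mean position of the confident joints stands in for the person's location.
        centroid = kp[visible, :2].mean(axis=0)
        dist = np.linalg.norm(centroid - center)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Example: the second candidate sits near the crop center, so its index is chosen.
bystander = np.tile([20.0, 30.0, 0.9], (25, 1))
subject = np.tile([110.0, 115.0, 0.9], (25, 1))
print(pick_central_person([bystander, subject]))
```

As noted in the comment above, a heuristic like this can reduce mismatches but cannot eliminate them, for example when a bystander happens to stand nearer the center than the intended subject.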

bragilee commented 3 years ago

@ikvision Thanks for your reply. Actually, I did not look closely at the correspondence; I thought it was fixed during preprocessing. I think it might be a common problem when using a detector like openpose. For the Insta dataset specifically, VIBE feeds the cropped images provided by https://github.com/akanazawa/human_dynamics directly as input to get features, which is where those mismatches might come from. Thanks for your repo; I am checking the details of how it handles the matching problem.

mkocabas commented 3 years ago

Hi @bragilee,

We don't use bboxes for InstaVariety since the provided tfrecords files contain already-cropped images. And yes, InstaVariety is quite noisy, since they use a 2D joint detector (openpose) to collect pseudo-ground-truth labels.