mks0601 / I2L-MeshNet_RELEASE

Official PyTorch implementation of "I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image", ECCV 2020
MIT License

Confused about better MPJPE on stage lixel versus stage param #79

Open kyang-06 opened 2 years ago

kyang-06 commented 2 years ago

Hi, thank you for the great work and your consistent contributions to 3D human and hand estimation! I am confused about why MPJPE is better for the lixel mesh (55.83 mm) than for the param mesh (66.05 mm):

>>> Using GPU: 4,5,6,7
Stage: param
08-10 00:25:56 Creating dataset...
creating index...
index created!
Get bounding box and root from ../data/Human36M/rootnet_output/bbox_root_human36m_output.json
08-10 00:26:16 Load checkpoint from ../output/model_dump/snapshot_17.pth.tar
08-10 00:26:16 Creating graph...
100%|██████████| 35/35 [00:46<00:00,  1.09it/s]
MPJPE from lixel mesh: 55.83 mm
PA MPJPE from lixel mesh: 41.10 mm
MPJPE from param mesh: 66.05 mm
PA MPJPE from param mesh: 45.03 mm
  1. Why does stage 1 (lixel) perform better than stage 2 (param)?
  2. Does that mean stage 2 is unnecessary for better MPJPE/PA-MPJPE? As far as I can see in the code linked here, stage=param cuts off the gradient to the lixel backbone: https://github.com/mks0601/I2L-MeshNet_RELEASE/blob/754e0201e494dc891b94949098cc93eec0e37ee8/main/model.py#L54
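To make the second question concrete, here is a minimal sketch of what "cutting off the gradient" means in PyTorch. The two `nn.Linear` modules are hypothetical stand-ins for the two stages, not the repository's networks, and using `torch.no_grad()` is an assumption about the mechanism (a `.detach()` call would have the same effect on the first stage).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two stages; these are NOT the repository's modules.
lixel_net = nn.Linear(8, 4)   # "stage 1": features -> lixel-like output
param_net = nn.Linear(4, 2)   # "stage 2": lixel-like output -> parameters

x = torch.randn(3, 8)

# Assumption: in the param stage the first-stage output is computed without
# gradient tracking, so the loss cannot update the first-stage weights.
with torch.no_grad():
    lixel_out = lixel_net(x)

# Dummy loss on the second-stage output, only to trigger backpropagation.
loss = param_net(lixel_out).pow(2).mean()
loss.backward()

# Only the second stage receives gradients.
lixel_has_grad = any(p.grad is not None for p in lixel_net.parameters())
param_has_grad = all(p.grad is not None for p in param_net.parameters())
print(lixel_has_grad, param_has_grad)
```

If the sketch matches what the linked line does, the lixel-stage weights stay fixed during param-stage training, so the lixel-mesh metrics would be determined entirely by stage 1.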
mks0601 commented 2 years ago

Hi,

  1. Because the lixel representation (the output of the 1st stage) is easier to predict than SMPL parameters (the output of the 2nd stage).
  2. I can't say that is always true, but it might be (according to my experimental results).
kyang-06 commented 2 years ago

Thank you for the quick reply! It's very kind of you. Two more questions:

  1. Is MPJPE on Human3.6M computed on 14 joints, following HMR and SPIN, or on 17 joints?
  2. So, as an extreme case: if I only care about the best MPJPE, I just have to train the lixel part, right? (I know this is meaningless for body reconstruction; I raise it only to check my understanding.)
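For reference, the two metrics in the log above can be sketched in NumPy as follows. This is a generic formulation of MPJPE and Procrustes-aligned MPJPE, not the repository's evaluation code. The joint subset `subset_14` is a hypothetical placeholder that only shows how restricting the joint set changes the input; it is not the actual Human3.6M 14-joint mapping.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def rigid_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Kabsch/Umeyama: SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * (p @ R.T) + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred onto gt."""
    return mpjpe(rigid_align(pred, gt), gt)

# Synthetic example: 17 joints in mm, prediction is a rotated, scaled, shifted copy.
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3)) * 100.0
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = 1.2 * (gt @ Rz.T) + np.array([10.0, -5.0, 3.0])

subset_14 = np.arange(14)  # hypothetical index list, NOT the real H36M 14-joint mapping
print("MPJPE (17 joints):", mpjpe(pred, gt))
print("PA MPJPE (17 joints):", pa_mpjpe(pred, gt))
print("PA MPJPE (subset):", pa_mpjpe(pred[subset_14], gt[subset_14]))
```

Because the synthetic prediction differs from the ground truth only by a similarity transform, the aligned error is close to zero while the unaligned MPJPE is large. The two numbers are only comparable across papers when the same joint set is used, which is why the 14 vs. 17 question matters.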