una-dinosauria / 3d-pose-baseline

A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.
MIT License

Different joints used for input and output for Human3.6m dataset #211

Open reyrobs opened 1 year ago

reyrobs commented 1 year ago

Hi, and thank you for the paper and the work you have produced; I thoroughly enjoyed going through it. One concern I have, however, is that the set of joints used as input to the model differs slightly from the set the model outputs. I looked at the dimensions used for the 2D input and the 3D output and got the following results (the indices correspond to the joint names listed below):

dim to use 2D [ 0  1  2  3  6  7  8 12 13 15 17 18 19 25 26 27]

dim to use 3D [ 1  2  3  6  7  8 12 13 14 15 17 18 19 25 26 27]

H36M_NAMES[0] = 'Hip'
H36M_NAMES[1] = 'RHip'
H36M_NAMES[2] = 'RKnee'
H36M_NAMES[3] = 'RFoot'
H36M_NAMES[6] = 'LHip'
H36M_NAMES[7] = 'LKnee'
H36M_NAMES[8] = 'LFoot'
H36M_NAMES[12] = 'Spine'
H36M_NAMES[13] = 'Thorax'
H36M_NAMES[14] = 'Neck/Nose'
H36M_NAMES[15] = 'Head'
H36M_NAMES[17] = 'LShoulder'
H36M_NAMES[18] = 'LElbow'
H36M_NAMES[19] = 'LWrist'
H36M_NAMES[25] = 'RShoulder'
H36M_NAMES[26] = 'RElbow'
H36M_NAMES[27] = 'RWrist'

My intuition was that the same joints would be used for the 2D input and the 3D output, so that the model could learn a relationship between each pair, but that does not seem to be the case. I understand that the 3D coordinates go through a hip-centering step, so the hip is always at (0, 0, 0) in 3D and carries no useful information for the model. However, I am not sure why the 'Neck/Nose' joint is included in 3D when it is not present among the 2D joint locations. Or perhaps I have misunderstood something. It would be nice if someone could shed some light on this. Thanks for reading.
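
To make the mismatch concrete, here is a minimal sketch that compares the two index lists quoted in this post. It treats each index as a joint index, as this post does, and the variable names (`dim_to_use_2d`, `dim_to_use_3d`, `only_in_2d`, `only_in_3d`) are chosen for illustration and are not necessarily the names used in the repository.

```python
# Index lists copied from this post (treated as joint indices, as the post does).
dim_to_use_2d = [0, 1, 2, 3, 6, 7, 8, 12, 13, 15, 17, 18, 19, 25, 26, 27]
dim_to_use_3d = [1, 2, 3, 6, 7, 8, 12, 13, 14, 15, 17, 18, 19, 25, 26, 27]

# Joint names as listed in this post (only the indices that appear there).
H36M_NAMES = {
    0: 'Hip', 1: 'RHip', 2: 'RKnee', 3: 'RFoot',
    6: 'LHip', 7: 'LKnee', 8: 'LFoot',
    12: 'Spine', 13: 'Thorax', 14: 'Neck/Nose', 15: 'Head',
    17: 'LShoulder', 18: 'LElbow', 19: 'LWrist',
    25: 'RShoulder', 26: 'RElbow', 27: 'RWrist',
}

# Joints present in one list but missing from the other.
only_in_2d = sorted(set(dim_to_use_2d) - set(dim_to_use_3d))
only_in_3d = sorted(set(dim_to_use_3d) - set(dim_to_use_2d))

print("only in 2D:", [(i, H36M_NAMES[i]) for i in only_in_2d])
print("only in 3D:", [(i, H36M_NAMES[i]) for i in only_in_3d])
```

Both lists have 16 entries, and they differ in exactly one joint each: index 0 ('Hip') appears only in the 2D list, and index 14 ('Neck/Nose') appears only in the 3D list.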

Dipankar1997161 commented 1 year ago

According to the metadata.xml file, 12 is spine1 - thorax, 13 is neck, and 14 is head. Please verify this.