otaheri / GrabNet

GrabNet: A Generative model to generate realistic 3D hands grasping unseen objects (ECCV2020)
https://grab.is.tue.mpg.de

Difference / correspondence between GRAB & GrabNet Datasets #12

Closed: purvaten closed this issue 3 years ago

purvaten commented 3 years ago

Hi, I had some questions regarding the differences between the data for GRAB & GrabNet. I understand that some additional pre-processing was performed on GRAB data to obtain the data for GrabNet. However, the change in variable names and their usage is slightly confusing to me.

Q1. Right-hand parameters

For the right hand sequence data in GRAB, there are 4 parameters with the following dimensions:

  • 'global_orient' (3,)
  • 'hand_pose' (24,)
  • 'transl' (3,)
  • 'fullpose' (45,)

For GrabNet, on the other hand, there are 3 main parameters with the following dimensions:

  • 'global_orient_rhand_rotmat' (1, 3, 3)
  • 'fpose_rhand_rotmat' (15, 3, 3)
  • 'trans_rhand' (3,)

Could you please tell me how these two sets of parameters are related and what exactly they mean? I see that in GRAB/grab/grab_preprocessing.py you pass the first set of parameters to the right-hand model to obtain the right-hand vertices. Can the right-hand vertices be obtained from the second set of parameters as well?
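
For reference, here is roughly how I am passing the first set of parameters to the right-hand model myself (just a minimal sketch assuming the smplx MANO loader; the model path and the flat_hand_mean / PCA flags are my guesses, not necessarily what grab_preprocessing.py uses):

```python
import torch
import smplx

# Load the MANO right-hand model via smplx (path and flags are my assumptions).
rh_model = smplx.create('models/mano/MANO_RIGHT.pkl', model_type='mano',
                        is_rhand=True, use_pca=True, num_pca_comps=24,
                        flat_hand_mean=True, batch_size=1)

# GRAB-style parameters for one frame (dummy values here).
global_orient = torch.zeros(1, 3)   # 'global_orient' (3,)
hand_pose     = torch.zeros(1, 24)  # 'hand_pose' (24,) -- PCA coefficients
transl        = torch.zeros(1, 3)   # 'transl' (3,)

out = rh_model(global_orient=global_orient, hand_pose=hand_pose,
               transl=transl, return_verts=True)
verts = out.vertices  # (1, 778, 3) right-hand vertices
```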

Q2. Object parameters

Similarly, for object parameters:

  • For GRAB, we have: 'transl' (3,) & 'global_orient' (3,).
  • For GrabNet: 'trans_obj' (3,) & 'root_orient_obj_rotmat' (3, 3).

Could you please explain the difference between global_orient and the rotation matrix root_orient_obj_rotmat? Also, the translation parameters don't appear to be aligned: for GrabNet, all trans_obj values appear to be zeros, which is not the case for transl in GRAB.
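
My assumption is that global_orient (axis-angle) and root_orient_obj_rotmat are just two parameterizations of the same rotation, i.e. related by a Rodrigues conversion like the one below (a sketch with made-up values, not taken from the data) -- please correct me if there is an extra transform involved:

```python
import numpy as np
from scipy.spatial.transform import Rotation

global_orient = np.array([0.1, -0.3, 0.2])  # axis-angle (3,), example values only

# axis-angle -> rotation matrix (what I expect root_orient_obj_rotmat to be)
root_orient_obj_rotmat = Rotation.from_rotvec(global_orient).as_matrix()  # (3, 3)

# ... and back
recovered = Rotation.from_matrix(root_orient_obj_rotmat).as_rotvec()      # (3,)
```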

Q3. Folder naming

I noticed a discrepancy for "lift" in the GRAB dataset, which I believe is referred to as "pick_all" in GrabNet. Could you confirm whether these correspond to each other?

Thanks!

otaheri commented 3 years ago

Hi @purvaten, sorry for these differences and confusion. I didn't notice them as I was treating the datasets separately. I will try to answer your questions below.

The right-hand model (MANO) takes 3 main parameters as input, namely global_orient, hand_pose, and transl. For the joint rotations (global_orient and hand_pose), different representations can be used. The default representation for the MANO loader is axis-angle, which has dimension (N, 3), compared to the rotation-matrix representation, which has dimension (N, 3, 3). So the right-hand pose has dimension (1, 45) in the axis-angle representation (fullpose) and (15, 3, 3) in the rotation-matrix representation (fpose_rhand_rotmat). The same relationship holds between global_orient and global_orient_rhand_rotmat.

In addition, to further reduce the dimensionality, we map the 45-dimensional axis-angle representation to a lower-dimensional space (e.g. 24) using PCA. So hand_pose with dimension (1, 24) represents the same rotations as fpose_rhand_rotmat, but in the PCA space. For more details about the PCA space, please refer to the original MANO paper. You can use any of these parameter sets to obtain the hand vertices by converting between them.
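
For example, something along these lines should work to go from the GrabNet rotation-matrix parameters back to axis-angle and then to vertices (just a rough sketch; please double-check the model path and the flat_hand_mean / PCA flags against the code in the repos):

```python
import torch
import smplx
from scipy.spatial.transform import Rotation

# GrabNet-style parameters for one frame (dummy values here).
fpose_rhand_rotmat   = torch.eye(3).repeat(15, 1, 1)  # (15, 3, 3)
global_orient_rotmat = torch.eye(3).unsqueeze(0)      # (1, 3, 3)
trans_rhand          = torch.zeros(1, 3)               # (3,)

# Rotation matrices -> axis-angle (Rodrigues vectors).
fullpose      = Rotation.from_matrix(fpose_rhand_rotmat.numpy()).as_rotvec()    # (15, 3)
global_orient = Rotation.from_matrix(global_orient_rotmat.numpy()).as_rotvec()  # (1, 3)

# MANO right hand without PCA, so hand_pose is the full 45-dim axis-angle pose.
rh_model = smplx.create('models/mano/MANO_RIGHT.pkl', model_type='mano',
                        is_rhand=True, use_pca=False, flat_hand_mean=True,
                        batch_size=1)

out = rh_model(global_orient=torch.as_tensor(global_orient, dtype=torch.float32),
               hand_pose=torch.as_tensor(fullpose.reshape(1, 45), dtype=torch.float32),
               transl=trans_rhand, return_verts=True)
verts = out.vertices  # (1, 778, 3) right-hand vertices
```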

I hope these help to clarify some issues. Feel free to ask more questions if these don't answer the previous ones.

purvaten commented 3 years ago

Hi Omid, thank you so much for such a prompt and detailed reply.

I think all this is clear to me now. Closing this issue.

Congratulations on the great work, by the way - it's super exciting to play around with the data and code!