sanweiliti / RoHM

The official PyTorch code for RoHM: Robust Human Motion Reconstruction via Diffusion.
https://sanweiliti.github.io/ROHM/ROHM.html

Questions about clip_len (145 --> 144 --> 143) ? #13

Closed Xianqi-Zhang closed 3 months ago

Xianqi-Zhang commented 4 months ago

Hi, thank you for sharing. I have a question about the clip_len setting. Why does clip_len change from 145 --> 144 --> 143?

I would like to know the reason for this setting. Is it related to Eq. 10 & 11 in the paper? And why is it reduced twice?

Thanks for any reply.

Best regards.

Xianqi-Zhang commented 3 months ago

The culprit is get_repr_smplx(), defined in motion_representation.py. get_repr_smplx() computes velocity-related variables from pairs of consecutive frames, so the temporal dimension of these variables is clip_len - 1 (i.e., the input has clip_len frames, the output has clip_len - 1).
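
A minimal sketch (not the actual RoHM code) of why a frame-difference velocity shortens the time axis by one frame:

```python
import torch

def frame_difference_velocity(joints):
    # joints: [clip_len, num_joints, 3] joint positions per frame.
    # The velocity at step t is frame t+1 minus frame t, so the output
    # has clip_len - 1 entries along the time axis.
    return joints[1:] - joints[:-1]

joints = torch.randn(145, 22, 3)        # clip_len = 145
vel = frame_difference_velocity(joints)
print(vel.shape)                        # torch.Size([144, 22, 3])
```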

  1. [clip_len --> clip_len - 1] During dataset generation, DatasetAMASS.create_body_repr() calls get_repr_smplx() to build repr_dict, which is then used to generate item_dict['motion_repr_clean'] and item_dict['motion_repr_noisy'] for training and testing.
  2. [clip_len - 1 --> clip_len - 2] During inference, the output of trajnet/traj_diffusion is used to obtain traj_rec_full by calling get_repr_smplx() and other related functions again. traj_rec_full then serves as pose['cond'] for the posenet/pose_diffusion inference. Since the temporal dimension of traj_rec_full has now become clip_len - 2, the variables in the posenet input (batch_pose) also have to be reduced to clip_len - 2, i.e., xxx = xxx[:, 0:-1] in the code (see the sketch below).
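
A hedged illustration of the second trim (tensor shapes and dict keys here are placeholders, not the repository's actual variables): once traj_rec_full has clip_len - 2 frames, the pose-branch inputs are sliced so everything lines up:

```python
import torch

clip_len = 145
# After the first get_repr_smplx() pass the pose inputs have clip_len - 1 frames;
# the batch and feature sizes below are made up for illustration.
batch_pose = {'repr': torch.randn(4, clip_len - 1, 263)}
# The reconstructed trajectory has gone through a second frame-difference step,
# so its time axis is clip_len - 2.
traj_rec_full = torch.randn(4, clip_len - 2, 13)

# Drop the last frame of every pose input so it matches traj_rec_full,
# i.e., the xxx = xxx[:, 0:-1] slicing mentioned above.
batch_pose = {k: v[:, 0:-1] for k, v in batch_pose.items()}
assert batch_pose['repr'].shape[1] == traj_rec_full.shape[1]  # both 143 == clip_len - 2
```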