sanweiliti / RoHM

The official PyTorch code for RoHM: Robust Human Motion Reconstruction via Diffusion.
https://sanweiliti.github.io/ROHM/ROHM.html

Questions about clip_len (145 --> 144 --> 143) ? #13

Closed Xianqi-Zhang closed 3 months ago

Xianqi-Zhang commented 4 months ago

Hi, thank you for sharing. I have a question about the clip_len setting. Why does clip_len change from 145 --> 144 --> 143?

I would like to know the reason for this setting. Is it related to Eq. 10 & 11 in the paper? And why is it reduced twice?

Thanks for any reply.

Best regards.

Xianqi-Zhang commented 3 months ago

The culprit is get_repr_smplx(), defined in motion_representation.py. get_repr_smplx() computes velocity-related variables from pairs of consecutive frames, so the temporal dimension of these variables is clip_len - 1 (i.e., the input has clip_len frames, the output has clip_len - 1).
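
A minimal sketch (not the actual RoHM code) of why a frame-difference velocity shortens the time axis by one frame:

```python
import torch

def frame_difference_velocity(joints):
    # joints: [clip_len, num_joints, 3] joint positions per frame.
    # The velocity at step t is frame t+1 minus frame t, so the output
    # has clip_len - 1 entries along the time axis.
    return joints[1:] - joints[:-1]

joints = torch.randn(145, 22, 3)        # clip_len = 145
vel = frame_difference_velocity(joints)
print(vel.shape)                        # torch.Size([144, 22, 3])
```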

  1. [clip_len --> clip_len - 1] During dataset generation, DatasetAMASS.create_body_repr() calls get_repr_smplx() to build repr_dict, which is then used to generate item_dict['motion_repr_clean'] and item_dict['motion_repr_noisy'] for training and testing.
  2. [clip_len - 1 --> clip_len - 2] During inference, the output of trajnet/traj_diffusion is used to obtain traj_rec_full by calling get_repr_smplx() and other related functions again. traj_rec_full then serves as pose['cond'] for the posenet/pose_diffusion inference. Since the temporal dimension of traj_rec_full has now become clip_len - 2, the variables in the posenet input (batch_pose) also have to be reduced to clip_len - 2, i.e., xxx = xxx[:, 0:-1] in the code (see the sketch below).
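
A hedged illustration of the second trim (tensor shapes and dict keys here are placeholders, not the repository's actual variables): once traj_rec_full has clip_len - 2 frames, the pose-branch inputs are sliced so everything lines up:

```python
import torch

clip_len = 145
# After the first get_repr_smplx() pass the pose inputs have clip_len - 1 frames;
# the batch and feature sizes below are made up for illustration.
batch_pose = {'repr': torch.randn(4, clip_len - 1, 263)}
# The reconstructed trajectory has gone through a second frame-difference step,
# so its time axis is clip_len - 2.
traj_rec_full = torch.randn(4, clip_len - 2, 13)

# Drop the last frame of every pose input so it matches traj_rec_full,
# i.e., the xxx = xxx[:, 0:-1] slicing mentioned above.
batch_pose = {k: v[:, 0:-1] for k, v in batch_pose.items()}
assert batch_pose['repr'].shape[1] == traj_rec_full.shape[1]  # both 143 == clip_len - 2
```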