vye16 / slahmr

MIT License

Output Explanation #57

Closed arnavbalaji closed 1 week ago

arnavbalaji commented 3 weeks ago

Hello!

Could someone provide some details about the outputs from slahmr? I'm currently loading the world_results.npz files and printing every array along with its shape. Here is what I get:

world_scale: (1, 1)
joints_vel: (1, 1, 22, 3)
trans_vel: (1, 1, 3)
hand_pose: (1, 146, 90)
latent_motion: (1, 145, 48)
floor_plane: (1, 3)
floor_idcs: (1,)
trans: (1, 146, 3)
root_orient: (1, 146, 3)
betas: (1, 16)
latent_pose: (1, 1, 32)
root_orient_vel: (1, 1, 3)
pose_body: (1, 146, 63)
cam_R: (1, 146, 3, 3)
cam_t: (1, 146, 3)
intrins: (4,)
pose_hand: (1, 146, 90)
track_mask: (1, 146)
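
For reference, this is roughly how the shapes above can be dumped. It is a minimal sketch, not code from the slahmr repo: the helper name `summarize_npz` is made up for illustration, and the file it reads here is a synthetic stand-in filled with zeros that reuses only a few of the key names and shapes listed above.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {key: shape} for every array stored in an .npz archive."""
    with np.load(path) as data:
        return {key: data[key].shape for key in data.files}


# Synthetic stand-in for a world_results.npz file (zeros, NOT real SLAHMR output).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_world_results.npz")
np.savez(
    demo_path,
    trans=np.zeros((1, 146, 3)),
    root_orient=np.zeros((1, 146, 3)),
    pose_body=np.zeros((1, 146, 63)),
    betas=np.zeros((1, 16)),
)

# Print each key with its shape, in the same "name: shape" form as the list above.
shapes = summarize_npz(demo_path)
for key, shape in shapes.items():
    print(f"{key}: {shape}")
```

Pointing `summarize_npz` at an actual world_results.npz path should produce the full listing.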

I'm assuming 146 is the number of frames in the video, but correct me if I'm wrong. I have a lot of assumptions about what these mean from reading the code and the paper, but I wanted to confirm them. I am mostly curious about which outputs are positions and which are orientations, and which representation the orientations use.
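
To make the question concrete: 63 = 21 x 3 and 90 = 30 x 3, so my guess is that pose_body and pose_hand hold one 3-vector per joint, possibly in axis-angle form. That is only an assumption from the shapes, not something I have confirmed. The sketch below shows how I would split and convert them if the guess is right; `split_per_joint` and `axis_angle_to_matrix` are my own illustrative helpers, not slahmr functions, and the array is a zero-filled placeholder.

```python
import numpy as np


def split_per_joint(pose):
    """Reshape (..., J*3) into (..., J, 3), assuming 3 values per joint."""
    if pose.shape[-1] % 3 != 0:
        raise ValueError("last dimension is not a multiple of 3")
    return pose.reshape(*pose.shape[:-1], pose.shape[-1] // 3, 3)


def axis_angle_to_matrix(vec):
    """Rodrigues' formula: convert one axis-angle 3-vector to a 3x3 rotation."""
    vec = np.asarray(vec, dtype=float)
    angle = np.linalg.norm(vec)
    if angle < 1e-8:
        return np.eye(3)  # near-zero rotation
    x, y, z = vec / angle
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


# Placeholder with the same shape as pose_body above (zeros, not real output).
pose_body = np.zeros((1, 146, 63))
per_joint = split_per_joint(pose_body)
print(per_joint.shape)

# ASSUMPTION: each 3-vector is axis-angle; convert the first joint of frame 0.
first_joint_rotation = axis_angle_to_matrix(per_joint[0, 0, 0])
```

If the vectors turn out to be something other than axis-angle, only `axis_angle_to_matrix` would need to change; the reshape depends only on the 3-values-per-joint guess.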

I'm especially interested in the outputs hand_pose, trans, trans_vel, pose_body, root_orient, root_orient_vel, joints_vel, and latent_motion. Thanks a lot for the help!