zju3dv / animatable_nerf

Code for "Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos" TPAMI 2024, ICCV 2021

Some questions about the zju_mocap dataset #48

Closed: EAST-J closed this issue 1 year ago

EAST-J commented 1 year ago

Hi, thanks for your great work. I have some questions about the dataset:

  1. The npy files in the new_vertices folder contain the SMPL meshes in world coordinates, right? What unit do they use, meters or millimeters?
  2. Are the provided smpl_param pose parameters defined in the SMPL coordinate system or in world coordinates? In other words, if I use smplx to get a mesh from the provided params, e.g. smpl_layer(pose, shape, th), will I get the same result as the vertices folder?
dendenxu commented 1 year ago

Hi! Thanks for the feedback.

  1. The npy files are in world coordinates, and they are in meters. However, there is one caveat about the dataset: the camera extrinsic T is stored in millimeters in annots.npy. All other coordinates in the dataset (including the camera T stored in extri.yml) are in meters.
  2. The pose parameters are defined so that we can separate three coordinate systems: 1) world space, 2) pose space, and 3) tpose space (also called bigpose space or canonical space). Unlike the common SMPL convention, we define the pose parameter to transform a tpose SMPL into pose space (without any global translation or rotation), whereas SMPL stores the global rotation as the first row of the pose parameter. We store the global rotation R and translation T separately from pose. This is because we use EasyMocap for motion capture and what I just described is their convention (or ours, since we are from the same lab). So, to use the vanilla smpl_layer(pose, shape, th) formulation, you would need to convert our R into the first row of pose, or simply transform the result of the vanilla smpl_layer(pose, shape, th) with our R to match the results in the vertices folder (see the sketch below).
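If it helps, here is a minimal sketch of the second option, i.e. transforming the vanilla SMPL output into world coordinates. The helper name to_world is mine, and it assumes Rh is stored as an axis-angle vector and Th in meters; adapt it to however your loader reads the params.

```python
import cv2
import numpy as np

def to_world(verts, Rh, Th):
    """Transform vanilla-SMPL output (posed with zero global rotation) into world space.

    verts: (6890, 3) vertices from smpl_layer(pose, shape, th) with pose[0] left as zeros.
    Rh:    (3,) axis-angle global rotation stored with the dataset (assumed format).
    Th:    (3,) global translation stored with the dataset (assumed, in meters).
    """
    # Axis-angle -> 3x3 rotation matrix.
    R = cv2.Rodrigues(np.asarray(Rh, dtype=np.float64).reshape(3, 1))[0]
    # Rotate then translate every vertex: v_world = R @ v + Th (row-vector form).
    return verts @ R.T + np.asarray(Th, dtype=np.float64).reshape(1, 3)
```
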
EAST-J commented 1 year ago

Thank you for your prompt reply. You said "you would need to convert our R to the first row of pose": does this mean I only need to concatenate pose and Rh, e.g. new_pose = np.concatenate((pose, Rh), axis=-1), or should I do something else to change the pose params?

dendenxu commented 1 year ago

R comes as a 3x3 rotation matrix, so you first need to convert the matrix notation to angle-axis (Rh); consider cv2.Rodrigues for this. The first row of our pose is filled with zeros, so you need to fill that row with Rh instead of concatenating: pose[0] = Rh.
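
For concreteness, a minimal sketch of that conversion; the helper name fill_global_rotation and the (24, 3) pose layout are my assumptions, not the repo's API.

```python
import cv2
import numpy as np

def fill_global_rotation(pose, R):
    """Write the global rotation into the first row of an EasyMocap-style pose.

    pose: (72,) or (24, 3) pose parameters whose first row is zeros.
    R:    (3, 3) global rotation matrix.
    """
    pose = np.asarray(pose, dtype=np.float64).reshape(24, 3).copy()
    Rh = cv2.Rodrigues(np.asarray(R, dtype=np.float64))[0].reshape(3)  # matrix -> axis-angle
    pose[0] = Rh  # fill the first row, do not concatenate a new one
    return pose.reshape(72)
```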

EAST-J commented 1 year ago

Hi, sorry to bother you again. I am a little confused about the appearance code $l_i$. As stated in the paper, the latent code encodes the state of the human appearance in frame $i$, and, inspired by DeepSDF, this embedding can be optimized during training in the auto-decoder fashion. But how does $l_i$ work at inference time?

dendenxu commented 1 year ago

For the original ICCV paper, we used the appearance code of the training pose closest to the novel pose being rendered. This is sometimes simplified to the appearance code of the last training frame when rendering continuously (e.g. when we train on the first 60 frames and render frames 60 to 120).

In the extended version, we replace the appearance code with the pose vector, which is trivially extensible to novel poses.
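
For intuition, here is a minimal sketch of the "closest pose" selection described above; the function and the plain Euclidean pose distance are illustrative assumptions, not the repo's exact implementation.

```python
import numpy as np

def pick_appearance_code(novel_pose, train_poses, appearance_codes):
    """Return the appearance code of the training frame whose pose is closest
    to the novel pose being rendered.

    novel_pose:       (72,) pose of the frame to render.
    train_poses:      (N, 72) poses of the N training frames.
    appearance_codes: (N, D) per-frame latent codes learned during training.
    """
    dists = np.linalg.norm(train_poses - novel_pose[None], axis=-1)
    return appearance_codes[np.argmin(dists)]
```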

EAST-J commented 1 year ago

Regarding "trivially extensible to novel poses": when we train on 60 frames with something like nn.Embedding(60, 128) and then render the next 30 frames, how should the latent index be set?

dendenxu commented 1 year ago

@EAST-J We only use the pose vector as the latent embedding in the extended version. By nn.Embedding I believe you are referring to the original implementation; the use of those latent embeddings in the original implementation is explained in my previous comment.

Sorry for the confusion between "the original paper" and "the original implementation": this repo contains implementations of both the original paper and the extended version.