zhengyuf / PointAvatar

Official Repository for CVPR 2023 paper PointAvatar: Deformable Point-based Head Avatars from Videos.

How to use PointAvatar dataset on NeRF-based Method #8

Open AndyWangZH opened 1 year ago

AndyWangZH commented 1 year ago

Hello, PointAvatar is really nice work! I am currently following your work, but I am confused about how to use your dataset with a NeRF-based method (e.g. NerFACE). In particular, how should the camera parameters and the near and far bounds be set? Right now I set these parameters myself according to your provided xml and always get some strange artifacts. Could you please release a detailed description of this? Looking forward to your reply, thank you very much!

zhengyuf commented 1 year ago

Hi,

It's indeed a bit complicated to transform to NerFace format.

First, we need to put the head transformation into the camera pose. Code for this (with some functions from /code/flame/lbs.py):

def deca_pose_to_nerf_transform(flame, shape, exp, pose, world_mat):

    batch_size = shape.size(0)
    dtype = torch.float32

    # reconstruct shape (needed for joint locations)
    v_template = flame.v_template.unsqueeze(0).expand(batch_size, -1, -1)
    betas = torch.cat([shape, exp], dim=1)

    v_shaped = v_template + blend_shapes(betas, flame.shapedirs)

    # get joints
    # NxJx3 array
    J = vertices2joints(flame.J_regressor, v_shaped)

    rot_mats = batch_rodrigues(
        pose.view(-1, 3), dtype=dtype).view([batch_size, -1, 3, 3])

    J_transformed, A = batch_rigid_transform(rot_mats, J, flame.parents, dtype=dtype)

    # and head transformation is:
    head_transform = A.view(batch_size, 5, 4, 4)[:, 1, :, :]

    # IMavatar's scaling factor of 4
    head_transform[:, :3, 3] *= 4

    last_row = torch.Tensor([0, 0, 0, 1]).float().unsqueeze(0).unsqueeze(0).expand(batch_size, 1, 4).to(head_transform.device)
    w2c_nerface = torch.matmul(torch.cat([world_mat, last_row], dim=1), head_transform)

    # world_mat = w2c, but nerface expects c2w
    c2w_nerface = torch.inverse(w2c_nerface)

    return c2w_nerface

After this, we normalize the cameras as NerFace's authors suggested: https://github.com/gafniguy/4D-Facial-Avatars/issues/3

After these steps, we could get it to work with the default near/far bounds (0.2, 0.8).
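
For reference, a minimal usage sketch of the function above (the tensor shapes and variable names here are assumptions about the IMavatar tracking output, not something fixed by this function):

# Sketch only: assumes shape [B, 100], exp [B, 50], pose [B, 15] and world_mat [B, 3, 4]
# have been loaded from the IMavatar tracking output, and `flame` is the FLAME layer
# used in this codebase.
c2w = deca_pose_to_nerf_transform(flame, shape, exp, pose, world_mat)  # [B, 4, 4]
# c2w then goes through the NerFace camera normalization (gafniguy/4D-Facial-Avatars#3)
# before being written out in NerFace's transforms_*.json format.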

zhangqianhui commented 1 year ago

@zhengyuf Hi Yufeng, what about the expression? Do you use the FLAME expression for NerFACE?

zhengyuf commented 1 year ago

I use FLAME expression + FLAME pose (except global head rotations), concatenated.

zhangqianhui commented 1 year ago

Thanks!

zhangqianhui commented 1 year ago

@zhengyuf Hi Yufeng, can you share more details about the FLAME expression + FLAME pose? Is the final expression dimension 50 + 12 or 50 + 9 (jaw, left and right eyes)? Do you use the original pose vector, or have you converted it into another format?

zhangqianhui commented 1 year ago

@zhengyuf I am confused about this line in your function:

# and head transformation is:
head_transform = A.view(batch_size, 5, 4, 4)[:, 1, :, :]

Why use index 1, which corresponds to the neck transformation? I observed that the neck is not fixed in the mesh tracked with DECA.

zhengyuf commented 1 year ago

Hi,

So the head pose is 15-dimensional, covering global, neck, jaw, and the left and right eyes. I use 50 + 12 (neck, jaw, left and right eyes) as NerFace's expression parameters. Index 1 selects the head transformation, which you get after applying the neck transformation.
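
If it helps, a sketch of how that conditioning vector could be assembled (assuming the 15-dim pose is laid out as [global, neck, jaw, left eye, right eye], 3 axis-angle values each; the variable names are only illustrative):

# pose: [B, 15] = global(3) | neck(3) | jaw(3) | left_eye(3) | right_eye(3)
# exp:  [B, 50] FLAME expression coefficients
exp_nerface = torch.cat([exp, pose[:, 3:]], dim=1)  # [B, 62]: expression + neck/jaw/eyes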

zhangqianhui commented 1 year ago

Hi,

Sorry for asking more questions, but I still haven't reproduced the same results following your code.

I'm not sure whether index 1 (neck) is a transformation relative to the shoulder. If so, why not use both the global transformation (index 0) and the neck transformation (index 1) as the head transformation? And is the shoulder then fixed? I also found the variable GLOBAL_POSE in your optimize.py; is it True or False in your setting?

zhengyuf commented 1 year ago

No, I believe that the transformations are not relative, but absolute.
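
One way to check this (a sketch reusing the J, A and J_transformed tensors inside deca_pose_to_nerf_transform above, placed right after the batch_rigid_transform call and before the in-place *= 4 scaling): A[:, 1] already maps the rest-pose neck joint to its posed location on its own, without composing A[:, 0] first.

# Sanity check: A[:, 1] is already the composed root -> neck transform.
J_neck_rest = torch.cat([J[:, 1], torch.ones_like(J[:, 1, :1])], dim=1)  # [B, 4] homogeneous
J_neck_posed = torch.einsum('bij,bj->bi', A[:, 1], J_neck_rest)[:, :3]
assert torch.allclose(J_neck_posed, J_transformed[:, 1], atol=1e-5)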

07hyx06 commented 1 year ago

> (Quoting @zhengyuf's reply above: the deca_pose_to_nerf_transform code, the NerFace camera normalization from gafniguy/4D-Facial-Avatars#3, and the default near/far bounds (0.2, 0.8).)

@zhengyuf Hi Yufeng, what is the meaning of the c2w here? Does it describe the transformation of the FLAME mesh from the FLAME canonical space to the world space? If so, does a camera exist at [0, 0, 0] in that world space? And what is the convention of this camera (e.g. OpenCV format: x to the right, y down, z along the view direction)?

Chuan-10 commented 1 year ago

> (Quoting @zhengyuf's reply and @07hyx06's follow-up question above.)

@07hyx06 Hi, have you found answers to your questions yet? I am confused too.

zhengyuf commented 1 year ago

Hey guys,

With w2c_nerface, we get the world-to-camera transformation for NerFace. Here we've put the head rotation into the camera transformation, i.e., the FLAME head rotation is now zero, which is what NerFace expects.

Since NerFace needs a camera-to-world matrix, we also invert w2c_nerface to get c2w_nerface.
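
Written out (a sketch; x_canonical below is a hypothetical batch of homogeneous FLAME points with the head rotation zeroed): since w2c_nerface = [world_mat; 0 0 0 1] @ head_transform, viewing the un-rotated head through the new camera gives exactly the same camera-space points as viewing the posed head through the original world_mat.

# x_canonical: [B, N, 4] homogeneous FLAME points with zero head rotation (hypothetical)
x_posed = torch.einsum('bij,bnj->bni', head_transform, x_canonical)   # tracked, rotated head

w2c_full = torch.cat([world_mat, last_row], dim=1)                    # original 4x4 w2c
cam_old = torch.einsum('bij,bnj->bni', w2c_full, x_posed)             # posed head, original camera
cam_new = torch.einsum('bij,bnj->bni', w2c_nerface, x_canonical)      # static head, new camera
# cam_old == cam_new, so NerFace can keep the head fixed and move the camera instead.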

Yufeng

Chuan-10 commented 1 year ago

Hi Yufeng, thank you for your reply! I am trying to run NerFACE with IMAvatar's data, and I am now stuck on the matrix transformation.

I ran IMAvatar's preprocessing scripts on NerFACE's person1 video, so I got the DECA matrices for person1. As you said, I first use the function deca_pose_to_nerf_transform to get the c2w_nerface matrix, then I normalize the cameras. For example, the DECA world_mat for 1.png of the train set is:

"world_mat": 
[1.0, 0.0, 0.0, -0.03322279825806618], 
[0.0, 1.0, 0.0, -0.029262954369187355], 
[0.0, 0.0, 1.0, -3.589096784591675].

After functuion deca_pose_to_nerf_transform I got:

[0.996785044670105, -0.016208326444029808, -0.07846513390541077, -0.3470061123371124], 
[0.016453539952635765, 0.9998615384101868, 0.002479573478922248, 0.03734137490391731], 
[0.07841409742832184, -0.003762617474421859, 0.9969136714935303, 3.5924389362335205], 
[6.053171297537574e-10, -1.3091271122700476e-11, -6.334325064472068e-09, 1.0].

And after normalization I got:

[0.996785044670105, -0.016208326444029808,-0.07846513390541077, -0.054800982790935565], 
[0.016453539952635765, 0.9998615384101868, 0.002479573478922248, 0.005897141205142365],
[0.07841409742832184, -0.003762617474421859, 0.9969136714935303, 0.5673363589940569], 
[6.053171297537574e-10, -1.3091271122700476e-11, -6.334325064472068e-9, 1 ].

It is totally different from the matrix provided by NerFACE person1 dataset:

[0.975096, -0.0213444, -0.220754,-0.12505983864469886],
[0.0591482, 0.984335, 0.166091, 0.09199261821349274],
[ 0.213751, -0.175011, 0.961083, 0.5170857417970655],
[-0.0, 0.0, 0.0, 1.0].

Could you please help me figure this out? Thank you for your time and kindness!

JInChuan

JasonW-00 commented 1 year ago

> (Quoting @Chuan-10's question above about the mismatch with NerFACE's person1 matrices.)

After passing the NerFACE data through the IMAvatar preprocessing pipeline and then running it through the PointAvatar pipeline, did you encounter this warning: "/home/pytorch3d/pytorch3d/transforms/transform3d.py:800: UserWarning: R is not a valid rotation matrix warnings.warn(msg)"?

zhengyuf commented 1 year ago

In the normalization step, we used

all_rigids[:,:,0] *= -1  # change the coordinate system coming from our face tracker to match the PyRender one
all_rigids[:,:,1] *= -1  # instead of all_rigids[:,:,2] *= -1 in the original NerFace script

And if it helps at all, here is what we get when we convert the IMavatar format to the NerFace format for yufeng.zip: https://drive.google.com/file/d/1qLOJpSzuEvYImZIT4O1cBUNkz-ziKi2E/view?usp=sharing
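
In case the scaling step is unclear, a placeholder sketch of the translation rescaling that, together with the flips above, brings the cameras into the default near/far bounds (0.2, 0.8); the exact scale factor used by NerFace's normalization script (gafniguy/4D-Facial-Avatars#3) may differ:

# all_rigids: [N, 4, 4] torch tensor stacking the c2w_nerface matrices for the sequence,
# after the two axis flips above.
# Placeholder only: pull the camera centers toward the origin so the head lies roughly
# halfway between the default near/far bounds (0.2, 0.8).
scale = 0.5 / all_rigids[:, :3, 3].norm(dim=-1).mean()
all_rigids[:, :3, 3] *= scale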

zydmu123 commented 1 year ago

Hello Yufeng, sorry to bother you. I tested the transforms_train.json in NerFace format that you provided above using render_debug_camera_matrix() in NeRFACE, but the scale of the overlay seems to be much smaller than in the normal setting. Even if I enlarge the scale to 0.5, the results still do not match very well. Did I miss something important? Thanks a lot!

[screenshot: camera overlay from render_debug_camera_matrix()]

zhengyuf commented 1 year ago

Hi,

Could you try to train with this configuration? NerFace does some scale adjustment, and I think it's normal that the scale doesn't match. The method also doesn't require alignment, since no supervision from the 3DMM is needed.

Nvatarer commented 11 months ago

Hi Yufeng, thanks for your great open-source work! I used your data configuration above to train NeRFace, but it seems to have failed and I can't get normal results. Did you make any other special adjustments to the NeRFace code? Thanks a lot!

YingjunShang commented 4 months ago

@zhengyuf Hello! I'm reproducing the PointAvatar code, following the steps outlined in README.md. I wanted to ask why I'm encountering this error:

Traceback (most recent call last):
  File "/home/xietong/PointAvatar/code/scripts/exp_runner.py", line 35, in <module>
    runner.run()
  File "/home/xietong/PointAvatar/code/../code/scripts/train.py", line 325, in run
    model_outputs = self.model(model_input)
  File "/home/xietong/miniconda3/envs/point-avatar/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xietong/PointAvatar/code/../code/model/point_avatar_model.py", line 182, in forward
    images = self._render(transformed_point_cloud, cameras)
  File "/home/xietong/PointAvatar/code/../code/model/point_avatar_model.py", line 102, in _render
    images, weights = self.compositor(
ValueError: too many values to unpack (expected 2)

The self.compositor is the custom AlphaCompositor class in your pytorch3d library:

class AlphaCompositor(nn.Module):
    def __init__(
        self, background_color: Optional[Union[Tuple, List, torch.Tensor]] = None
    ) -> None:
        super().__init__()
        self.background_color = background_color

    def forward(self, fragments, alphas, ptclds, **kwargs) -> torch.Tensor:
        background_color = kwargs.get("background_color", self.background_color)
        images = alpha_composite(fragments, alphas, ptclds)

        # images are of shape (N, C, H, W)
        # check for background color & feature size C (C=4 indicates rgba)
        if background_color is not None:
            return _add_background_color_to_images(fragments, images, background_color)
        return images

I also tried printing the compositor output, and it is all zeros (truncated):

tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.]],
         ...]])

zhengyuf commented 4 months ago

Try checking out the point-avatar branch of the customized pytorch3d repo and re-installing it. Also, for new questions, please start a new issue.

YingjunShang commented 4 months ago

Ok, thank you very much! Have a nice day!