AndyWangZH opened this issue 1 year ago
Hi,
It's indeed a bit complicated to transform to NerFace format.
First, we need to put the head transformation into the camera pose. Code for this (with some functions from /code/flame/lbs.py):
import torch
# helper functions from /code/flame/lbs.py (import path assumes running from the code/ directory)
from flame.lbs import blend_shapes, vertices2joints, batch_rodrigues, batch_rigid_transform

def deca_pose_to_nerf_transform(flame, shape, exp, pose, world_mat):
    batch_size = shape.size(0)
    dtype = torch.float32
    # reconstruct shape (needed for joint locations)
    v_template = flame.v_template.unsqueeze(0).expand(batch_size, -1, -1)
    betas = torch.cat([shape, exp], dim=1)
    v_shaped = v_template + blend_shapes(betas, flame.shapedirs)
    # get joints
    # NxJx3 array
    J = vertices2joints(flame.J_regressor, v_shaped)
    rot_mats = batch_rodrigues(
        pose.view(-1, 3), dtype=dtype).view([batch_size, -1, 3, 3])
    J_transformed, A = batch_rigid_transform(rot_mats, J, flame.parents, dtype=dtype)
    # and head transformation is:
    head_transform = A.view(batch_size, 5, 4, 4)[:, 1, :, :]
    # IMavatar's scaling factor of 4
    head_transform[:, :3, 3] *= 4
    last_row = torch.Tensor([0, 0, 0, 1]).float().unsqueeze(0).unsqueeze(0).expand(batch_size, 1, 4).to(head_transform.device)
    w2c_nerface = torch.matmul(torch.cat([world_mat, last_row], dim=1), head_transform)
    # world_mat = w2c, but nerface expects c2w
    c2w_nerface = torch.inverse(w2c_nerface)
    return c2w_nerface
After this, we normalize the cameras as NerFace's authors suggested: https://github.com/gafniguy/4D-Facial-Avatars/issues/3
After these steps, we could get it to work using the default near far bounding values (0.2, 0.8).
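For reference, a minimal usage sketch of the function above (the tensor shapes are my assumptions based on the FLAME/DECA conventions used here, and flame is assumed to be the FLAME layer from /code/flame; this is only an illustration, not part of the original recipe):

```python
import torch

# flame: the FLAME layer used by /code/flame (provides v_template, shapedirs,
# J_regressor, parents) -- assumed to be constructed elsewhere.
shape = torch.zeros(1, 100)                 # FLAME shape parameters (assumed 100-dim)
exp = torch.zeros(1, 50)                    # FLAME expression parameters for the frame
pose = torch.zeros(1, 15)                   # global + neck + jaw + two eyes, axis-angle
world_mat = torch.eye(4)[:3].unsqueeze(0)   # 1x3x4 world_mat (w2c) from the tracker

c2w = deca_pose_to_nerf_transform(flame, shape, exp, pose, world_mat)
print(c2w.shape)  # (1, 4, 4) camera-to-world matrix in NerFace convention
```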
@zhengyuf Hi Yufeng, what about the expression? Do you use the FLAME expression for NerFACE?
I use FLAME expression + FLAME pose (except global head rotations), concatenated.
Thanks!
@zhengyuf Hi Yufeng, can you give more details about the FLAME expression + FLAME pose? Is the final expression dimension 50 + 12 or 50 + 9 (jaw, left and right eyes)? Do you use the original pose vector? Have you converted it into other formats?
@zhengyuf I am confused about this line in your function:
# and head transformation is:
head_transform = A.view(batch_size, 5, 4, 4)[:, 1, :, :]
Why use index 1, which corresponds to the neck transformation? I observed that the neck is not fixed in the tracked mesh produced by DECA.
Hi,
So the head pose is 15 dimensions, including global, neck, jaw, and left and right eyes. I use 50+12 (neck, jaw, left and right eyes) as NerFace's expression parameters. Index 1 is for selecting the head transformation, which you get after applying the neck transformation.
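For concreteness, a minimal sketch of that 50 + 12 concatenation (variable names and shapes are assumptions; pose is the 15-dim FLAME pose vector from the tracker):

```python
import torch

exp = torch.zeros(1, 50)    # FLAME expression coefficients for one frame (assumed shape)
pose = torch.zeros(1, 15)   # FLAME pose: [global(3), neck(3), jaw(3), left eye(3), right eye(3)]

# Drop the global rotation (it is absorbed into the camera) and keep the rest.
nerface_expression = torch.cat([exp, pose[:, 3:]], dim=1)
print(nerface_expression.shape)  # torch.Size([1, 62])  -> 50 + 12
```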
Hi,
I still haven't reproduced the same results following your code. I am sorry for asking more questions.
I'm not sure whether index 1 (neck) is a transformation relative to the shoulder. If so, why not use both the global transformation (index 0) and the neck transformation (index 1) as the head transformation? Also, is the shoulder fixed? I found a variable GLOBAL_POSE in your optimize.py file; is it true or false in your setting?
No, I believe that the transformations are not relative, but absolute.
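A quick way to check this for yourself (a hedged sketch, assuming the FLAME layer used in the function above and FLAME's usual joint ordering 0=global, 1=neck, 2=jaw, 3/4=eyes):

```python
# batch_rigid_transform chains each joint's local transform with its parent's,
# so A[:, 1] (the joint selected as the head transformation) is an absolute
# transform that already contains the global rotation at index 0.
print(flame.parents)
# expected something like tensor([-1, 0, 1, 1, 1]):
# neck is a child of global; jaw and both eyes are children of neck.
```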
@zhengyuf Hi Yufeng, what is the meaning of the c2w here? Does it describe the transformation of the FLAME mesh from the FLAME canonical space to the world space? If that's true, does there exist a camera located at [0,0,0] in the world space? What's the convention of this camera (e.g., OpenCV format: x to the right, y down, z along the view direction)?
@07hyx06 Hi, have you found answers to your questions yet? I am confused too.
Hey guys,
With w2c_nerface, we get the world-to-camera transformation for NerFace. Here we've put the head rotation into the camera transformation, i.e., the FLAME head rotation is 0 now, which is what NerFace expects.
Since NerFace needs a camera-to-world matrix, we also invert w2c_nerface to get c2w_nerface.
Yufeng
Hi Yufeng, thank you for your reply! I am trying to run NerFACE with IMAvatar's data, and now I am stuck at the transformation of the matrix.
I ran IMAvatar's preprocessing scripts on NerFACE's person1 video, so I got the DECA matrices of person1.
Like you said, I first use the function deca_pose_to_nerf_transform to get the c2w_nerface matrix. Then, I normalize the cameras.
For example, the DECA matrix of 1.png of the train set is:
"world_mat":
[1.0, 0.0, 0.0, -0.03322279825806618],
[0.0, 1.0, 0.0, -0.029262954369187355],
[0.0, 0.0, 1.0, -3.589096784591675].
After the function deca_pose_to_nerf_transform, I got:
[0.996785044670105, -0.016208326444029808, -0.07846513390541077, -0.3470061123371124],
[0.016453539952635765, 0.9998615384101868, 0.002479573478922248, 0.03734137490391731],
[0.07841409742832184, -0.003762617474421859, 0.9969136714935303, 3.5924389362335205],
[6.053171297537574e-10, -1.3091271122700476e-11, -6.334325064472068e-09, 1.0].
And after normalization I got:
[0.996785044670105, -0.016208326444029808,-0.07846513390541077, -0.054800982790935565],
[0.016453539952635765, 0.9998615384101868, 0.002479573478922248, 0.005897141205142365],
[0.07841409742832184, -0.003762617474421859, 0.9969136714935303, 0.5673363589940569],
[6.053171297537574e-10, -1.3091271122700476e-11, -6.334325064472068e-9, 1 ].
It is totally different from the matrix provided by the NerFACE person1 dataset:
[0.975096, -0.0213444, -0.220754,-0.12505983864469886],
[0.0591482, 0.984335, 0.166091, 0.09199261821349274],
[ 0.213751, -0.175011, 0.961083, 0.5170857417970655],
[-0.0, 0.0, 0.0, 1.0].
Could you please help me figure this out? Thank you for your time and kindness!
JInChuan
After passing the NerFace data through the IMavatar preprocessing pipeline and then running it on the PointAvatar pipeline, did you encounter this problem: "/home/pytorch3d/pytorch3d/transforms/transform3d.py:800: UserWarning: R is not a valid rotation matrix warnings.warn(msg)"?
In the normalization step, we used
all_rigids[:,:,0] *= -1 # this was to change the coordinate system coming from our face tracker to match PyRender one.
all_rigids[:,:,1] *= -1 # instead of all_rigids[:,:,2] *= -1 # this was to change the coordinate system coming from our face tracker to match PyRender one.
and if it helps at all, here is what we get when we convert IMavatar format to NerFace format for yufeng.zip: https://drive.google.com/file/d/1qLOJpSzuEvYImZIT4O1cBUNkz-ziKi2E/view?usp=sharing
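Putting those two flips together with a translation rescale, a minimal sketch could look like this (the 0.5 target distance is an assumption for illustration; the authors' exact normalization is described in the NerFace issue linked above):

```python
import torch

def normalize_cameras(all_rigids):
    """all_rigids: (N, 4, 4) stack of c2w matrices from deca_pose_to_nerf_transform."""
    # The two axis flips quoted above (face tracker -> PyRender convention):
    all_rigids[:, :, 0] *= -1
    all_rigids[:, :, 1] *= -1  # instead of all_rigids[:, :, 2] *= -1
    # Rescale camera translations so they end up roughly 0.5 units from the origin,
    # in line with the default near/far bounds of (0.2, 0.8).  The 0.5 target is an
    # assumption for illustration, not the authors' exact recipe.
    mean_dist = all_rigids[:, :3, 3].norm(dim=-1).mean()
    all_rigids[:, :3, 3] *= 0.5 / mean_dist
    return all_rigids
```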
Hello Yufeng, sorry to bother you. I tested the transforms_train.json in NerFace format that you provided above using render_debug_camera_matrix() in NeRFACE, but the scale of the overlay seems to be much smaller than in the normal condition. Even if I enlarge the scale to 0.5, the results still do not match very well. Did I miss something important? Thanks a lot!
Hi,
Could you try to train with this configuration? NerFace does some scale adjustment, and I think it's normal that the scale doesn't match. The method also doesn't require alignment, since no supervision from the 3DMM is needed.
Hi Yufeng, thanks for your great open-source work! I used your data configuration above to train NeRFace, but it seems to have failed; I can't get normal results. Is there any other special adjustment you have made to the NeRFace code? Thanks a lot!
@zhengyuf Hello! I'm replicating the PointAvatar code, following the steps outlined in README.md. I wanted to ask why I'm encountering this error:
Traceback (most recent call last):
  File "/home/xietong/PointAvatar/code/scripts/exp_runner.py", line 35, in <module>
    runner.run()
  File "/home/xietong/PointAvatar/code/../code/scripts/train.py", line 325, in run
    model_outputs = self.model(model_input)
  File "/home/xietong/miniconda3/envs/point-avatar/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xietong/PointAvatar/code/../code/model/point_avatar_model.py", line 182, in forward
    images = self._render(transformed_point_cloud, cameras)
  File "/home/xietong/PointAvatar/code/../code/model/point_avatar_model.py", line 102, in _render
    images, weights = self.compositor(
ValueError: too many values to unpack (expected 2)
The self.compositor is the custom AlphaCompositor class in your pytorch3d library:
```python
class AlphaCompositor(nn.Module):
    def __init__(
        self, background_color: Optional[Union[Tuple, List, torch.Tensor]] = None
    ) -> None:
        super().__init__()
        self.background_color = background_color

    def forward(self, fragments, alphas, ptclds, **kwargs) -> torch.Tensor:
        background_color = kwargs.get("background_color", self.background_color)
        images = alpha_composite(fragments, alphas, ptclds)

        # images are of shape (N, C, H, W)
        # check for background color & feature size C (C=4 indicates rgba)
        if background_color is not None:
            return _add_background_color_to_images(fragments, images, background_color)
        return images
```
And I tried to print the compositor, and it is:
```
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          ...,
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.]],

         [[0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          ...,
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.]],

         [[0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          ...,
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.],
          [0., 0., 0., ..., 0., 0., 0.]],
         ...,
```
Try checking out the point-avatar branch in the customized pytorch3d repo and re-installing. Also, for new questions, please start a new issue.
Ok, thank you very much! Have a nice day!
Hello, PointAvatar is really nice work! Currently, I am following your work, but I am confused about how to use your dataset with a NeRF-based method (e.g., NerFACE). In particular, how should the camera parameters and the near/far bounds be set? Right now I set these parameters myself according to your provided xml and always get some strange artifacts. Could you please release a detailed description of these? Looking forward to your reply! Thank you very much!