zhuhao-nju / facescape

FaceScape (PAMI2023 & CVPR2020)

Calculate the 6 dof head pose #80

Closed: Bornblack closed this issue 1 year ago

Bornblack commented 2 years ago

@zhuhao-nju Hi, referring to #39 and #41, I aligned the raw mesh (.ply file) to the TU model to make it full-size, and now I am trying to calculate the 6 DoF head pose in physical space. Is that possible? The rendering result of the following code is correct, but the 6 DoF pose seems unreasonable.

import cv2, json
import numpy as np
import src.renderer as renderer
import src.utility as util
import trimesh
from scipy.spatial.transform import Rotation as R

# test_num is the camera index
def projection_test(test_num, scale=1.0):

    # read params
    with open("../samples/sample_mview_data/4_anger/params.json", 'r') as f:
        params = json.load(f)

    # extract KRt dist
    cam_k = np.array(params['%d_K' % test_num])
    cam_Rt = np.array(params['%d_Rt' % test_num])
    cam_dist = np.array(params['%d_distortion' % test_num], dtype=np.float64)
    h_src = params['%d_height' % test_num]
    w_src = params['%d_width' % test_num]

    # scale h and w
    h, w = int(h_src * scale), int(w_src * scale)
    cam_k[:2,:] = cam_k[:2,:] * scale

    # read image
    src_img = cv2.imread("../samples/sample_mview_data/4_anger/%d.jpg" % test_num)
    src_img = cv2.resize(src_img, (w, h))

    # undistort image
    undist_img = cv2.undistort(src_img, cam_k, cam_dist)

    # read and render mesh
    mesh_dirname = "../samples/sample_mview_data/4_anger.ply"

    # read model params
    id_idx = 212
    exp_idx = 4

    with open("./predef/Rt_scale_dict.json", 'r') as f:
        Rt_scale_dict = json.load(f)
        model_scale = Rt_scale_dict['%d'%id_idx]['%d'%exp_idx][0]
        model_Rt = np.array(Rt_scale_dict['%d'%id_idx]['%d'%exp_idx][1])

    # align the raw scan to the TU model
    mesh = trimesh.load(mesh_dirname, process=False)
    mesh.vertices *= model_scale
    mesh.vertices = np.tensordot(model_Rt[:3,:3], mesh.vertices.T, 1).T + model_Rt[:3, 3]

    # calculate the transformation from camera coordinate to world coordinate
    cam2world = np.eye(4)
    cam2world[:3, :3] = cam_Rt[:3, :3].T
    cam2world[:3, 3] = -cam_Rt[:3, :3].T.dot(cam_Rt[:, 3])

    # calculate the transformation from world coordinate to the full-size model coordinate
    scale_matrix = np.eye(4)
    scale_matrix[:3, :3] *= model_scale

    world2model = np.r_[model_Rt, np.array([[0, 0, 0, 1]])].dot(scale_matrix)

    # calculate the transformation from camera coordinate to model coordinate
    cam2model = world2model.dot(cam2world)

    # the original camera coordinate system has an arbitrary SfM scale, so a scale operation is needed to bring it to physical size
    model2cam = scale_matrix.dot(np.linalg.inv(cam2model))
    cam2model = np.linalg.inv(model2cam)

    r = R.from_matrix(model2cam[:3, :3])
    print("model2cam euler xyz (deg):", r.as_euler('xyz', degrees=True))
    print("model2cam translation:", model2cam[:3, 3])

    r = R.from_matrix(cam2model[:3, :3])
    print("cam2model euler xyz (deg):", r.as_euler('xyz', degrees=True))
    print("cam2model translation:", cam2model[:3, 3])

    _, rend_img = renderer.render_cvcam(mesh, cam_k, model2cam[:3,:], rend_size=(h, w))

    # project and show
    mix_img = cv2.addWeighted(rend_img, 0.5, undist_img, 0.5, 0)
    concat_img = np.concatenate((undist_img, mix_img, rend_img), axis = 1)

    return concat_img

util.show_img_arr(projection_test(49, 0.05), bgr_mode = True)

After the transformation, the unit of the camera coordinate system is supposed to be mm, but in this case the head ends up about 1.5 m away from the camera, which seems unreasonable. Can you help?
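For reference, here is my own derivation of what the code composes (a sketch, not taken from the repo): write cam_Rt as [R | t] mapping world to camera in SfM units, and let s, [R_m | t_m] be model_scale and model_Rt mapping the scaled world to the canonical model, so that

    x_cam = R x_world + t
    x_model = R_m (s x_world) + t_m

Eliminating x_world and rescaling camera coordinates by s to make them metric gives

    x_cam_mm = R R_m^T (x_model - t_m) + s t

so the rotation part of model2cam stays orthonormal and its translation s t - R R_m^T t_m is in mm; the norm of that translation is the physical head-to-camera distance.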

Bornblack commented 2 years ago

@zhuhao-nju Hi, is the transformation right? Recently I have tried some optimization-based approaches to reconstruct the 3D face from RGB images using the real camera intrinsics. As real camera intrinsics are used, the reconstructed faces are supposed to be in camera space with the actual face size. I would then like to calculate the ground-truth 6 DoF head pose for the FaceScape images, so that I can evaluate the reconstruction results directly on FaceScape data without any registration process (see the error-metric sketch below). Can you help me out?
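A minimal sketch of such a direct evaluation (my own illustration, assuming 3x4 or 4x4 model-to-camera pose matrices like the model2cam computed above; the function name and metrics are hypothetical):

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    def pose_error(Rt_pred, Rt_gt):
        # geodesic rotation error in degrees between the two rotation parts
        dR = Rt_pred[:3, :3].T.dot(Rt_gt[:3, :3])
        rot_err = np.degrees(np.linalg.norm(R.from_matrix(dR).as_rotvec()))
        # translation error in mm, assuming both poses are in the metric space
        trans_err = np.linalg.norm(Rt_pred[:3, 3] - Rt_gt[:3, 3])
        return rot_err, trans_err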

zhuhao-nju commented 1 year ago

Hi, sorry for the late reply.

I believe the scale in issue #39 should solve the problem: it transforms the model from the unscaled SfM coordinate system to a canonical coordinate system with a uniform scale, and after the transformation the scale is close to the real scale in mm. We have validated the intrinsic and extrinsic parameters and the Rt_scale_dict.json file several times, and we can confirm that they are correct.

As for the follow-up question: the 'real camera intrinsics' are not at actual scale, so you should still compensate for the scale as mentioned in #39 and the Rt_scale_dict.json file. A sketch of that compensation follows.
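A minimal sketch of that compensation, assuming the same id/expression (212, 4) and camera index (49) as the code above; folding the scale into the extrinsics this way is one reading of the advice, not an official snippet:

    import json
    import numpy as np

    # camera extrinsics for view 49: world -> camera, in SfM units (3x4 [R | t])
    with open("../samples/sample_mview_data/4_anger/params.json", 'r') as f:
        params = json.load(f)
    cam_Rt = np.array(params['49_Rt'])

    # per-scan alignment to the canonical (mm-scale) model space
    with open("./predef/Rt_scale_dict.json", 'r') as f:
        Rt_scale_dict = json.load(f)
    model_scale = Rt_scale_dict['212']['4'][0]
    model_Rt = np.array(Rt_scale_dict['212']['4'][1])

    R_w, t_w = cam_Rt[:3, :3], cam_Rt[:3, 3]
    R_m, t_m = model_Rt[:3, :3], model_Rt[:3, 3]

    # metric model-to-camera pose: x_cam_mm = R_mc.dot(x_model) + t_mc, t_mc in mm
    R_mc = R_w.dot(R_m.T)
    t_mc = model_scale * t_w - R_mc.dot(t_m)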