theEricMa / OTAvatar

This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023].

question about data.utils.process_camera_inv #22

Closed: Orig1n closed this issue 6 months ago

Orig1n commented 11 months ago

I'm confused by this function. Do operations like `trans[2] += -10`, `c *= 0.27`, `c[1] += 0.015`, `c[2] += 0.161`, `K[0,0] = 2985.29/700 * focal / 1050`, `K[1,1] = 2985.29/700 * focal / 1050`, and `pose[:3, 3] = pose[:3, 3]/4.0 * 2.7` have any special meaning?

import numpy as np

def process_camera_inv(translation, Rs, focals): #crop_params):

    c_list = []

    N = len(translation)
    # for trans, R, crop_param in zip(translation,Rs, crop_params):
    for idx, (trans, R, focal) in enumerate(zip(translation, Rs, focals)):

        # average the pose parameters over a short sliding window
        # (frames idx-1 .. idx+1 in the interior) for temporal smoothing
        idx_prev = max(idx - 1, 0)
        idx_last = min(idx + 2, N - 1)

        trans = np.mean(translation[idx_prev: idx_last], axis=0)
        R = np.mean(Rs[idx_prev: idx_last], axis=0)

        # why: offset the translation by -10 along z, then convert the
        # world-to-camera translation into a camera center c = -R @ t
        trans[2] += -10
        c = -np.dot(R, trans)

        # # no why
        # c = trans

        pose = np.eye(4)
        pose[:3, :3] = R

        # why: hand-tuned scale and offsets that map the camera center
        # into the world space the generator expects
        c *= 0.27
        c[1] += 0.015
        c[2] += 0.161
        # c[2] += 0.050  # 0.160

        pose[0, 3] = c[0]
        pose[1, 3] = c[1]
        pose[2, 3] = c[2]

        # focal = 2985.29
        w = 1024  # 224
        h = 1024  # 224

        # pixel-space intrinsics with the principal point at the image center
        K = np.eye(3)
        K[0, 0] = focal
        K[1, 1] = focal
        K[0, 2] = w / 2.0
        K[1, 2] = h / 2.0

        # flip the y and z axes of the camera frame: right-multiplying by
        # diag(1, -1, -1) converts between y-up/z-back and y-down/z-forward
        # camera conventions without moving the camera
        Rot = np.eye(3)
        Rot[0, 0] = 1
        Rot[1, 1] = -1
        Rot[2, 2] = -1
        pose[:3, :3] = np.dot(pose[:3, :3], Rot)

        # fix intrinsics: switch to EG3D-style normalized intrinsics, i.e.
        # focal length relative to image size (2985.29/700 ≈ 4.2647 is the
        # value EG3D uses for FFHQ) and the principal point at 0.5 in
        # [0, 1] image coordinates
        K[0, 0] = 2985.29 / 700 * focal / 1050
        K[1, 1] = 2985.29 / 700 * focal / 1050
        K[0, 2] = 1 / 2
        K[1, 2] = 1 / 2
        assert K[0, 1] == 0
        assert K[2, 2] == 1
        assert K[1, 0] == 0
        assert K[2, 0] == 0
        assert K[2, 1] == 0

        # fix_pose_orig
        pose = np.array(pose).copy()

        # why: hand-tuned rescaling of the camera distance toward the
        # radius EG3D was trained with
        pose[:3, 3] = pose[:3, 3] / 4.0 * 2.7
        # # no why
        # t_1 = np.array([-1.3651,  4.5466,  6.2646])
        # s_1 = np.array([-2.3178, -2.3715, -1.9653]) + 1
        # t_2 = np.array([-2.0536,  6.4069,  4.2269])
        # pose[:3, 3] = (pose[:3, 3] + t_1) * s_1 + t_2

        # EG3D-style 25-dim camera label: flattened 4x4 pose + 3x3 intrinsics
        c = np.concatenate([pose.reshape(-1), K.reshape(-1)])
        c_list.append(c.astype(np.float32))

    return c_list
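
For context, here is a minimal usage sketch of the function above with dummy inputs (the shapes and values are my assumptions, not from the repo). Each returned entry is a 25-dim vector: the flattened 4x4 camera-to-world pose followed by the flattened 3x3 normalized intrinsics.

    import numpy as np

    # dummy inputs (illustrative only): 5 frames with identity rotations,
    # a camera one unit along z, and a fixed pixel focal length
    N = 5
    Rs = [np.eye(3) for _ in range(N)]
    translation = [np.array([0.0, 0.0, 1.0]) for _ in range(N)]
    focals = [1050.0] * N

    c_list = process_camera_inv(translation, Rs, focals)

    c = c_list[0]
    pose = c[:16].reshape(4, 4)  # camera-to-world extrinsics
    K = c[16:].reshape(3, 3)     # normalized intrinsics
    print(pose.shape, K.shape)   # (4, 4) (3, 3)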
theEricMa commented 10 months ago

Great question! This was indeed a challenge during our research. Although the EG3D authors released their code, they did not explain the manual adjustments they made to the camera poses. This was particularly problematic because our face alignment process is completely different: for the talking-face task we crop each video using only the bounding box computed on the first frame, so subsequent frames are not aligned. Given that, working out how to model the rotation and translation was complex, especially since the EG3D camera convention can be misleading. We had no option but to tune these parameters manually for the talking-face setting.
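
To make the convention point concrete, here is a small illustration (my own sketch, not code from the repository) of the axis flip that `process_camera_inv` applies: right-multiplying a rotation by diag(1, -1, -1) flips the camera's y and z axes, converting between a y-up/z-back (OpenGL-style) and a y-down/z-forward (OpenCV-style) camera frame without moving the camera.

    import numpy as np

    flip = np.diag([1.0, -1.0, -1.0])

    # a camera with y up that looks along -z (OpenGL-style)
    R_gl = np.eye(3)

    # the same physical camera expressed with y down and z forward
    R_cv = R_gl @ flip

    # both describe a camera looking in the same world direction
    look_gl = -R_gl[:, 2]  # OpenGL cameras look along their -z axis
    look_cv = R_cv[:, 2]   # OpenCV cameras look along their +z axis
    assert np.allclose(look_gl, look_cv)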