gulvarol / surreal

Learning from Synthetic Humans, CVPR 2017
http://www.di.ens.fr/willow/research/surreal

Strange values after converting 3D joint positions from world to camera coordinates #29

Open GianMassimiani opened 5 years ago

GianMassimiani commented 5 years ago

Hi, thanks for the great dataset! I used some of the code you provided (e.g. for retrieving the camera extrinsic matrix) to convert 3D joint positions from world to camera coordinates. However, the values of joint positions in camera coordinates seem a bit strange to me. Here is the code:

import numpy as np
import scipy.io

def get_extrinsic_matrix(T):
    # Return the extrinsic camera matrix for SURREAL images.
    # Script based on:
    # https://blender.stackexchange.com/questions/38009/3x4-camera-matrix-from-blender-camera
    # Take the first 3 columns of the matrix_world in Blender and transpose.
    # This is hard-coded since all images in SURREAL use the same camera rotation.
    R_world2bcam = np.array([[0, 0, 1], [0, -1, 0], [-1, 0, 0]]).transpose()
    # *cam_ob.matrix_world = Matrix(((0., 0., 1, params['camera_distance']),
    #                               (0., -1, 0., -1.0),
    #                               (-1., 0., 0., 0.),
    #                               (0.0, 0.0, 0.0, 1.0)))

    # Convert camera location to translation vector used 
    # in coordinate changes
    T_world2bcam = -1 * np.dot(R_world2bcam, T)

    # Following is needed to convert Blender camera to 
    # computer vision camera
    R_bcam2cv = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]])

    # Build the coordinate transform matrix from world to 
    # computer vision camera
    R_world2cv = np.dot(R_bcam2cv, R_world2bcam)
    T_world2cv = np.dot(R_bcam2cv, T_world2bcam)

    # Put into 3x4 matrix
    RT = np.concatenate([R_world2cv, T_world2cv], axis=1)
    return RT, R_world2cv, T_world2cv
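A quick probe of what this returns (an editor's sketch with a made-up camera location, not part of the original post): the 3x3 block is orthonormal and the camera center maps to the camera-frame origin, but note that the determinant is -1, i.e. SURREAL's hard-coded matrix includes a reflection. That handedness flip may be related to the left/right confusion discussed further down.

# Hypothetical camera location for illustration; in practice this is
# mat['camLoc'] from an info.mat file.
T = np.array([[8.0], [0.0], [1.0]])  # shape (3, 1)
RT, R, t = get_extrinsic_matrix(T)

print(np.allclose(R @ R.T, np.eye(3)))  # True: rows are orthonormal
print(np.linalg.det(R))                 # -1.0: the map includes a reflection
print(R @ T + t)                        # ~zero: camera center -> origin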

def world_to_camera(RT, p_w):
    # Take a set of n points in world coordinates (p_w) and
    # convert them to camera coordinates (p_c).
    # Args:
    #   p_w (numpy array): points in world coordinates, shape (n, 3, 1)
    #   RT  (numpy array): 3x4 camera extrinsic matrix [R | t], shape (3, 4)
    n_points = p_w.shape[0]
    ones = np.ones([n_points, 1, 1])
    p_w = np.concatenate((p_w, ones), axis=1)  # homogeneous coords, (n, 4, 1)

    p_c = np.dot(RT, p_w[0])
    for p in p_w[1:]:
        p_c = np.concatenate((p_c, np.dot(RT, p)))

    return p_c.reshape(n_points, 3, 1)
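For reference, the per-point loop can be collapsed into a single broadcasted matrix product; a sketch of an equivalent vectorized version (same inputs, same output shape):

def world_to_camera_vectorized(RT, p_w):
    # Same contract as world_to_camera above, but without the Python loop.
    n_points = p_w.shape[0]
    ones = np.ones([n_points, 1, 1])
    p_h = np.concatenate((p_w, ones), axis=1)  # homogeneous coords, (n, 4, 1)
    return RT @ p_h  # (3, 4) @ (n, 4, 1) broadcasts to (n, 3, 1)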

# Read annotation file
mat = scipy.io.loadmat("./01_06_c0001_info.mat")

# Get camera position in world coordinates
camera_pos = mat['camLoc']

# Get camera extrinsic matrix
extrinsic, _, _ = get_extrinsic_matrix(camera_pos)

# Frame number
frame_id = 10

# Get joints positions in world coordinates
joints3d = mat['joints3D'][:, :, frame_id].T # shape (24, 3)
joints3d = joints3d.reshape([
                    joints3d.shape[0],
                    joints3d.shape[1],
                    1]
            ) # shape (24, 3, 1)

# Convert 3D joint positions from world to camera coords
joints3d_cam = world_to_camera(extrinsic, joints3d)

# Drop the trailing singleton dimension to restore shape (24, 3), then transpose
joints3d_cam = joints3d_cam.reshape(joints3d_cam.shape[0], 
                        joints3d_cam.shape[1]) # shape (24, 3)
joints3d_cam = np.moveaxis(joints3d_cam, 0, 1) # shape (3, 24)

# Swap Y and Z axes
joints3d_cam[[0,1,2]] = joints3d_cam[[0,2,1]]
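One diagnostic worth running at this point (a small check added in editing, reusing the variables from the script above): the extrinsic map is an orthogonal transform plus a translation, so it preserves distances, and the distance of each joint from camLoc in world coordinates should match its distance from the origin in camera coordinates. If the two agree, a ~6 m depth is genuinely in the data rather than an artifact of the conversion.

# Distance of each joint from the camera, measured in world coordinates
dist_world = np.linalg.norm(joints3d[:, :, 0] - camera_pos.reshape(1, 3), axis=1)

# Distance of each joint from the origin in camera coordinates
# (the row swap above does not change norms)
dist_cam = np.linalg.norm(joints3d_cam, axis=0)

print(np.abs(dist_world - dist_cam).max())  # should be ~0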

When plotting the joints in 3D I get strange depth values (see the Y axis in the figure below). For example, in the image below the subject appears very close to the camera; however, its position on the Y axis (computed with the above code) is about 6 meters, which seems quite unrealistic to me:

[screenshot: 3D plot of the joints, with the subject at roughly 6 m on the Y axis]

Do you have any idea why this is happening? Thanks

GianMassimiani commented 5 years ago

@gulvarol Hi Gül, I am still confused about the values of the 3D joint data: 1) There seems to be a left/right inconsistency, e.g. the joint with index 4, which is supposed to be the left knee, appears to be the right knee instead. Therefore, a left/right swap is needed when plotting the 3D joints.

2) In this answer you say that the reference point for the 3D joint data is the camera location. This would mean there is no need to multiply the joint positions by the camera extrinsic matrix (as I did in the code above). However, I tried to plot the 3D joint data as stored in the _info.mat files, and they do not seem to be expressed in camera coordinates. See these sample plots:

[four sample plots omitted: test1, test2, test3, test4]
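One way to settle point 2 directly (an editor's sketch, not from the thread): project the camera-space joints to pixels and compare them with the stored 2D annotations. The intrinsic values below (fx = fy = 600, principal point (160, 120) for the 320x240 SURREAL frames) follow the intrinsics used in the SURREAL demo code, but treat them as an assumption to verify against the repository; the layout of joints2D as (2, 24, n_frames) is likewise assumed.

import numpy as np

# Assumed SURREAL intrinsics for 320x240 frames (verify against the repo)
K = np.array([[600.0,   0.0, 160.0],
              [  0.0, 600.0, 120.0],
              [  0.0,   0.0,   1.0]])

def project_to_pixels(K, p_c):
    # p_c: joints in camera coordinates, shape (3, n),
    # BEFORE the Y/Z swap applied at the end of the first script
    uv = K @ p_c           # (3, n)
    return uv[:2] / uv[2]  # perspective divide -> (2, n) pixel coordinates

# If these land close to mat['joints2D'][:, :, frame_id], the extrinsic
# conversion is correct and the ~6 m depth is simply the true camera distance:
# pixels = project_to_pixels(K, joints3d_cam_noswap)  # hypothetical name for
#                                                     # the pre-swap (3, 24) array
# print(np.abs(pixels - mat['joints2D'][:, :, frame_id]).max())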

anas-zafar commented 2 years ago

@GianMassimiani were you able to solve the problem?