Converting camera matrices from OpenCV / Pytorch3D

benjiebob commented 2 years ago

Hi there, excellent library. I've been banging my head against the wall for a few days now with this problem so thought it might be a good time to beg for help! :)

I have a camera_matrix and run OpenCV's cv2.solvePnP(points3d, points2d, camera_matrix, distCoeffs=None) to obtain extrinsics R, t. Using similar code to this, I can render my SMPL mesh vertices and it looks fine (see Fig A)

For debugging purposes, I've loaded these parameters into Pytorch3D using their cameras_from_opencv_projection(R, t, camera_matrix, image_size) method, and am able to correctly render the mesh (see Fig B).

Now is where the fun begins...

I've been trying for a few days to figure out how to render this mesh correctly using Pyrender. I've constructed a rendering function (similar to this one):

def render(image, vertices, faces, camera_pose, camera_matrix):
    material = pyrender.MetallicRoughnessMaterial(
        metallicFactor=0.2, alphaMode="OPAQUE", baseColorFactor=(0.8, 0.3, 0.3, 1.0)
    )

    mesh = trimesh.Trimesh(vertices, faces, process=False)

    mesh = pyrender.Mesh.from_trimesh(mesh, material=material)
    scene = pyrender.Scene(ambient_light=(0.5, 0.5, 0.5))

    scene.add(mesh, "mesh")

    fx, fy, cx, cy = camera_matrix[0,0], camera_matrix[1,1], camera_matrix[0,2], camera_matrix[1,2]

    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
    scene.add(camera, pose=camera_pose)

    # There is some lighting stuff here that I'll omit for space.

    color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA)
    color = color.astype(np.float32) / 255.0
    valid_mask = (rend_depth > 0)[:, :, None]
    output_img = color[:, :, :3] * valid_mask + (1 - valid_mask) * image

    return output_img

Based on this answer, I've tried:

Negating rows 1 & 2 (the y and z rows, counting from 0) of R & t (see Fig C).
Next I wondered if the problem was due to post- vs. pre-multiplying, so I also tried transposing R (see Fig D).

As you can see, the Pyrender mesh still doesn't line up with the others.

From here, I've tried a whole bunch of things, including fiddling with the principal point and starting from the PyTorch3D matrix and rotating it 180° about the X axis (as suggested here), but nothing has worked.

I'd be super grateful if someone could help me solve this. In case it's helpful, see the following descriptions of the camera setup in each of the libraries:

Thanks! Ben

AndersonDaniel commented 1 year ago

A bit late to the party, but what worked for me was inverting rows 1 & 2 then inverting the whole camera_pose (which indeed transposes R like you tried, but it also corrects t for the new R).

In short -

camera_pose[[1, 2]] *= -1
camera_pose = np.linalg.inv(camera_pose)
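
For anyone starting from separate R and t rather than an assembled 4x4 matrix, the two lines above could be wrapped as in the sketch below. This is only an illustration: the function name `opencv_to_pyrender_pose` is hypothetical, and it assumes R is already a 3x3 rotation matrix (for example, the rvec returned by cv2.solvePnP converted with cv2.Rodrigues) and t is the matching world-to-camera translation.

```python
import numpy as np


def opencv_to_pyrender_pose(R, t):
    """Build a pyrender-style camera pose from OpenCV world-to-camera extrinsics.

    Hypothetical helper for illustration only.
    R: (3, 3) rotation matrix, t: (3,) or (3, 1) translation vector.
    """
    world_to_cam = np.eye(4)
    world_to_cam[:3, :3] = np.asarray(R, dtype=float)
    world_to_cam[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    world_to_cam[[1, 2]] *= -1           # negate the y and z rows
    return np.linalg.inv(world_to_cam)   # camera-to-world pose


# Identity rotation, world origin 2 units in front of the OpenCV camera.
pose = opencv_to_pyrender_pose(np.eye(3), [0.0, 0.0, 2.0])
```

In this toy case the translation column of `pose` is the camera position in world coordinates, (0, 0, -2), and the result would be passed as `camera_pose` to `scene.add(camera, pose=camera_pose)`.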

oneThousand1000 commented 1 year ago

Hi! I also spent a few days trying to get the correct pyrender camera pose. I think the problem is caused by the flipped y and z axes in pyrender, so you should flip the translation and the rotation angles on the y and z axes. I posted an example here: https://github.com/mmatl/pyrender/issues/249

conallwang commented 1 year ago

> A bit late to the party, but what worked for me was inverting rows 1 & 2 then inverting the whole camera_pose (which indeed transposes R like you tried, but it also corrects t for the new R).
>
> In short -
>
> camera_pose[[1, 2]] *= -1
> camera_pose = np.linalg.inv(camera_pose)

Thanks! This solution also works for me.

Chris10M commented 4 months ago

Hi,

The transformation works. Can I please know the intuition behind the inversion of the camera_pose?

camera_pose[[1, 2]] *= -1
camera_pose = np.linalg.inv(camera_pose)

AndersonDaniel commented 4 months ago

Hey @Chris10M, I found this through a bit of trial and error. The general intuition is that OP mentioned transposing R because of pre- vs. post-multiplication of the transformation matrix, and I felt that t ought to be adjusted to match the transposed rotation matrix.
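
As a side note, the standard closed form for the inverse of a rigid transform shows why t has to change along with R. Assuming R is a proper rotation matrix (so $R^{-1} = R^\top$), and writing the 4x4 matrix in block form:

$$
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}^{-1}
=
\begin{bmatrix} R^\top & -R^\top t \\ 0 & 1 \end{bmatrix}
$$

So transposing R alone leaves the translation wrong; the full inverse replaces $t$ with $-R^\top t$, which is what np.linalg.inv computes numerically.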

hua-zi commented 4 months ago

pyrender Creating Cameras

My guess is this:

pyrender uses OpenGL camera coordinates to specify camera poses: the camera z-axis points away from the scene, the x-axis points to the right in image space, and the y-axis points up in image space.
OpenCV's camera z-axis points toward the scene, the x-axis points to the right in image space, and the y-axis points down in image space.
The y-axis and z-axis of the pyrender camera and the OpenCV camera point in opposite directions, so we need to negate rows 1 & 2 (the y and z rows, counting from 0) of $RT_{opencv}$ to get $RT_{pyrender}$.
pyrender poses transform from camera to world coordinates, while $RT_{pyrender}$ transforms from world to camera, so we need to invert $RT_{pyrender}$.
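```
import numpy as np

# RT_opencv: 4x4 world-to-camera matrix [R | t; 0 0 0 1]
RT_pyrender = RT_opencv.copy()  # copy so RT_opencv is not modified in place
RT_pyrender[[1, 2]] *= -1       # negate the y and z rows
camera_pose = np.linalg.inv(RT_pyrender)  # camera-to-world pose
```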
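
One way to sanity-check this reasoning without rendering anything is to project a single world point under both conventions and compare the pixels. The sketch below uses made-up intrinsics and extrinsics purely for illustration, and it only mirrors the plain pinhole model; it does not reproduce pyrender's internal projection matrix.

```python
import numpy as np

# Hypothetical intrinsics and extrinsics, chosen only for illustration.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
RT_opencv = np.eye(4)
RT_opencv[:3, 3] = [0.1, -0.2, 3.0]  # world-to-camera translation

RT_pyrender = RT_opencv.copy()
RT_pyrender[[1, 2]] *= -1
camera_pose = np.linalg.inv(RT_pyrender)

X_world = np.array([0.3, 0.4, 1.0, 1.0])  # homogeneous world point

# OpenCV convention: +z forward, +y down.
x_cv, y_cv, z_cv = (RT_opencv @ X_world)[:3]
pixel_cv = np.array([fx * x_cv / z_cv + cx, fy * y_cv / z_cv + cy])

# OpenGL-style convention: -z forward, +y up, so depth is -z and image y is -y.
x_gl, y_gl, z_gl = (np.linalg.inv(camera_pose) @ X_world)[:3]
pixel_gl = np.array([fx * x_gl / (-z_gl) + cx, fy * (-y_gl) / (-z_gl) + cy])

print(pixel_cv, pixel_gl)
```

Both projections land on the same pixel, and `z_gl` is negative, which is consistent with the point lying in front of a camera that looks down its own -z axis.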

mmatl / pyrender

Converting camera matrices from OpenCV / Pytorch3D #228