Camera convention for D-NeRF

Hello,

Thank you for the very cool work! I have a question about the camera convention you use to load the D-NeRF dataset. This is the code that converts from the Blender convention to the COLMAP convention:

    matrix = np.linalg.inv(np.array(frame["transform_matrix"]))
    R = -np.transpose(matrix[:3,:3])
    R[:,0] = -R[:,0]
    T = -matrix[:3, 3]

I compared it with the one from 3DGS:

    c2w = np.array(frame["transform_matrix"])
    c2w[:3, 1:3] *= -1
    w2c = np.linalg.inv(c2w)
    R = np.transpose(w2c[:3,:3])
    T = w2c[:3, 3]

I noticed that both formulations actually give the same rotation matrix R, but the translation vector T has the first element flipped. I was wondering if you could provide some explanations as to why that is?
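
For reference, here is a minimal sketch that reproduces the observation on a synthetic pose. The function names `pose_4dgs` and `pose_3dgs` and the random test pose are illustrative assumptions of mine, not code from either repository; only the bodies mirror the two snippets above.

```python
import numpy as np

def pose_4dgs(c2w):
    # Mirrors the first snippet (the D-NeRF loader in question).
    matrix = np.linalg.inv(c2w)
    R = -np.transpose(matrix[:3, :3])  # negation returns a new array
    R[:, 0] = -R[:, 0]
    T = -matrix[:3, 3]
    return R, T

def pose_3dgs(c2w):
    # Mirrors the second snippet (the 3DGS Blender loader).
    c2w = c2w.copy()                   # avoid mutating the caller's matrix
    c2w[:3, 1:3] *= -1                 # flip the Y and Z camera axes
    w2c = np.linalg.inv(c2w)
    R = np.transpose(w2c[:3, :3])
    T = w2c[:3, 3]
    return R, T

# Hypothetical test pose: a random proper rotation plus a random translation.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                      # make it a proper rotation (det = +1)
c2w = np.eye(4)
c2w[:3, :3] = Q
c2w[:3, 3] = rng.normal(size=3)

R_a, T_a = pose_4dgs(c2w)
R_b, T_b = pose_3dgs(c2w)

same_R = np.allclose(R_a, R_b)
# T differs only by the sign of its first component.
same_T_up_to_x_flip = np.allclose(T_a, T_b * np.array([-1.0, 1.0, 1.0]))
print(same_R, same_T_up_to_x_flip)
```

Working through the algebra with `c2w = [Rc | t]` and `D = diag(1, -1, -1)`, both versions give `R = Rc @ D`, while the first gives `T = Rc.T @ t` and the second gives `T = -D @ Rc.T @ t`, which is where the sign flip on the first element comes from.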

hustvl / 4DGaussians

Camera convention for D-NeRF #191