zhihao-lin / neurmips

Pytorch implementation of paper: "NeurMiPs: Neural Mixture of Planar Experts for View Synthesis"
MIT License
113 stars 11 forks

Weird Camera System: mismatched pointcloud and poses #8

Closed OceanYing closed 1 year ago

OceanYing commented 1 year ago

Hi, thanks a lot for releasing the code of this impressive work! I got relatively good results on the data provided by the authors. However, on my self-prepared data it fails to reach reasonable quality: the rendered images are significantly distorted and blurry. In the following image, the left two are GT and the right two are NeurMiPs results. Combined Stacks-1

After a period of struggling, I found that the coordinate systems of the NSVF camera poses and the input point cloud (both provided by the authors) are not consistent. In the NSVF camera convention, the camera pose matrix corresponds to (right, down, front), yet the point cloud appears upside down when projected onto the images.

Screenshot from 2023-08-22 20-11-57

This means there must be a transformation between the NSVF poses and the point cloud. I found that the y-axis and z-axis of the points should be flipped:

pts[:, [1, 2]] *= -1
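For reference, the same fix can also be expressed on the pose side instead of the point cloud. This is just a sketch of the equivalence, not code from the repo: flipping the y- and z-axes of the points is the same as right-multiplying every world-to-camera matrix by a flip matrix.

import numpy as np

# Sketch: bake the axis flip into the poses instead of the points.
# For a homogeneous point p: (w2c @ S) @ p == w2c @ (S @ p)
S = np.diag([1.0, -1.0, -1.0, 1.0])  # flips the world y- and z-axes

def flip_pose_instead_of_points(w2c):
    # w2c: any 4x4 world-to-camera matrix
    return w2c @ S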

Screenshot from 2023-08-22 20-13-38

Actually, I didn't find this transformation applied explicitly anywhere in the codebase. My solution is simply to add this axis conversion to my point cloud in the dataloader, and then it works!
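In case it helps, this is roughly what my change looks like; the helper name here is mine and the actual neurmips dataloader differs, so treat it as a sketch:

import numpy as np

def load_points_nsvf_aligned(path):  # hypothetical helper, not from the repo
    pts = np.load(path)       # (N, 3) array from points3D.npy
    pts[:, [1, 2]] *= -1      # align the point cloud with the NSVF poses
    return pts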

Combined Stacks-2

This phenomenon can be checked with the following code:

import numpy as np
import os
import glob
from matplotlib import pyplot as plt
import imageio

def convert_pose_nsvf_to_pytorch3d(transform_matrix):
    # Transform from camera2world to world2camera
    w2c = np.linalg.inv(transform_matrix).astype(np.float32)
    # KiloNeRF processing
    w2c[:3, 1:3] = -w2c[:3, 1:3]
    # Ours (neurmips) convention
    R = w2c[:3,:3]
    T = w2c[:3,3]
    R[:2,:] = -R[:2,:]
    R = np.transpose(R)
    T[:2] = -T[:2]
    return R, T

def convert_pose_pytorch3d_to_nsvf(r, t):
    # Inverse of the conversion above: undo the neurmips convention,
    # then the KiloNeRF flip, and return an NSVF camera2world matrix
    T = t.copy()
    R = r.copy()

    T[:2] = -T[:2]
    R = np.transpose(R)
    R[:2,:] = -R[:2,:]
    transform_matrix = np.eye(4)
    transform_matrix[:3,3] = T
    transform_matrix[:3,:3] = R
    transform_matrix[:3, 1:3] = -transform_matrix[:3, 1:3]
    transform_matrix = np.linalg.inv(transform_matrix).astype(np.float32)
    return transform_matrix

def get_camera_intrinsic(path):
    # Parse image size, focal lengths, and principal point from cameras.txt
    with open(path, 'r') as file:
        lines = file.readlines()
        lines = lines[3].split(' ')
        W, H = int(lines[2]), int(lines[3])
        fx, fy = float(lines[4]), float(lines[5])
        px, py = float(lines[6]), float(lines[7])
    return [W, H, fx, fy, px, py]

root_path = "NeurMips/TanksAndTemple/TanksAndTemple/Barn/train"

intrinsic = get_camera_intrinsic(os.path.join(root_path, 'cameras.txt'))
print(intrinsic)

intrinsic_mat = np.array([
    [intrinsic[2], 0, intrinsic[4]],
    [0, intrinsic[3], intrinsic[5]],
    [0,0,1],
])
print(intrinsic_mat)

W = int(intrinsic[0])
H = int(intrinsic[1])

# Load all training images into a single uint8 array
img_paths = sorted(glob.glob(os.path.join(root_path, "images", "*.png")))
print(len(img_paths))
imgs = np.zeros([len(img_paths), H, W, 3]).astype(np.uint8)
for i in range(len(img_paths)):
    imgs[i, :, :, :] = imageio.v2.imread(img_paths[i])

R = np.load(os.path.join(root_path, 'R.npy'))
T = np.load(os.path.join(root_path, 'T.npy'))

# Recover NSVF camera-to-world poses from the stored Pytorch3D (R, T),
# then invert to world-to-camera for projection
poses_orig = np.zeros([R.shape[0], 4, 4])
for i in range(R.shape[0]):
    poses_orig[i, :, :] = convert_pose_pytorch3d_to_nsvf(R[i, :, :], T[i, :])
    poses_orig[i, :, :] = np.linalg.inv(poses_orig[i, :, :])

print(poses_orig.shape)

idx = 100  # index of the view to inspect

pts = np.load(os.path.join(root_path, "points3D.npy"))

pts = pts[::50, :]  # subsample the point cloud for a lighter scatter plot
print(pts.shape)

# --- Adding the transformation to align the pointcloud to NSVF poses --- #
pts[:, [1, 2]] *= -1
# ----------------------------------------------------------------------- #

# World -> camera: homogeneous points times the world-to-camera matrix
pts_cam = np.concatenate([pts, np.ones([pts.shape[0], 1])], axis=-1) @ poses_orig[idx, :, :].T
print(pts_cam.shape)

# Perspective projection: divide by depth, then apply the intrinsics
pts_scr = (pts_cam[:, :3] / pts_cam[:, [2]]) @ intrinsic_mat.T
print(pts_scr.shape)

# Keep only points in front of the camera and inside the image bounds
valid_mask = (pts_cam[:, 2] > 0) & (pts_scr[:, 0] >= 0) & (pts_scr[:, 0] < W) & (pts_scr[:, 1] >= 0) & (pts_scr[:, 1] < H)
pts_2d = pts_scr[valid_mask, :2]
print(pts_2d.shape)

plt.figure()
plt.subplot(121)
plt.imshow(imgs[idx, :, :, :])
plt.subplot(122)
plt.imshow(imgs[idx, :, :, :])
plt.scatter(pts_2d[:, 0], pts_2d[:, 1], s=0.01)
plt.show()
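With the flip applied, the scattered points land on the corresponding structures in the image (as in the second screenshot above); if the marked line is removed, they appear vertically mirrored (as in the first screenshot).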

I hope this issue helps others run the algorithm more easily on their own datasets.

zhihao-lin commented 1 year ago

Thanks for sharing this! It could definitely help others resolve camera convention issues. In NeurMiPs, we follow the Pytorch3D convention; more details can be found here: https://github.com/zhihao-lin/neurmips/blob/main/doc/dataset.md. So if your dataset provides camera poses in another convention (e.g. OpenCV, OpenGL), please make sure to transform both the camera poses and the point cloud into the same, correct coordinate system.
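For example, converting an OpenCV-convention camera-to-world pose into the Pytorch3D (R, T) could look roughly like the sketch below (the helper name is illustrative, not repo code):

import numpy as np

# Sketch only: OpenCV camera axes are (+x right, +y down, +z forward);
# Pytorch3D uses (+x left, +y up, +z forward) with the row-vector
# convention X_cam = X_world @ R + T.
def opencv_c2w_to_pytorch3d(c2w):
    w2c = np.linalg.inv(c2w).astype(np.float32)
    R = w2c[:3, :3].copy()
    T = w2c[:3, 3].copy()
    R[:2, :] = -R[:2, :]   # flip the camera x- and y-axes
    T[:2] = -T[:2]
    return np.transpose(R), T  # transpose for the row-vector convention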