zhihao-lin / neurmips

Pytorch implementation of paper: "NeurMiPs: Neural Mixture of Planar Experts for View Synthesis"
MIT License
113 stars 11 forks

Weird Camera System: mismatched pointcloud and poses #8

Closed OceanYing closed 1 year ago

OceanYing commented 1 year ago

Hi, thanks a lot for releasing the code of this impressive work! I got relatively good results on the data provided by the authors. However, on my self-prepared data it fails to reach reasonable quality: the rendered images are significantly distorted and blurry. In the following image, the left two are GT and the right two are NeurMiPs results. Combined Stacks-1

After a period of struggling, I found that the coordinate systems of the NSVF camera poses and the input point cloud (both provided by the authors) are not consistent. In the NSVF camera convention, the camera pose matrix corresponds to (right, down, front), yet the point cloud appears upside down when projected onto the images.

Screenshot from 2023-08-22 20-11-57

This means there must be a transformation between the NSVF poses and the point cloud. I found that the y-axis and z-axis of the points should be flipped:

pts[:, [1, 2]] *= -1
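For reference, the same fix can also be expressed on the pose side instead of the point cloud. This is just a sketch of the equivalence, not code from the repo: flipping the y- and z-axes of the points is the same as right-multiplying every world-to-camera matrix by a flip matrix.

import numpy as np

# Sketch: bake the axis flip into the poses instead of the points.
# For a homogeneous point p: (w2c @ S) @ p == w2c @ (S @ p)
S = np.diag([1.0, -1.0, -1.0, 1.0])  # flips the world y- and z-axes

def flip_pose_instead_of_points(w2c):
    # w2c: any 4x4 world-to-camera matrix
    return w2c @ S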

Screenshot from 2023-08-22 20-13-38

Actually, I didn't find this transformation applied explicitly anywhere in the codebase. My solution is simply to add this axis conversion to my point cloud in the dataloader, and then it works!
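In case it helps, this is roughly what my change looks like; the helper name here is mine and the actual neurmips dataloader differs, so treat it as a sketch:

import numpy as np

def load_points_nsvf_aligned(path):  # hypothetical helper, not from the repo
    pts = np.load(path)       # (N, 3) array from points3D.npy
    pts[:, [1, 2]] *= -1      # align the point cloud with the NSVF poses
    return pts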

Combined Stacks-2

This phenomenon can be checked with the following code:

import numpy as np
import os
import glob
from matplotlib import pyplot as plt
import imageio

def convert_pose_nsvf_to_pytorch3d(transform_matrix):
    # Transform from camera2world to world2camera
    w2c = np.linalg.inv(transform_matrix).astype(np.float32)
    # KiloNeRF processing
    w2c[:3, 1:3] = -w2c[:3, 1:3]
    # Ours (neurmips) convention
    R = w2c[:3,:3]
    T = w2c[:3,3]
    R[:2,:] = -R[:2,:]
    R = np.transpose(R)
    T[:2] = -T[:2]
    return R, T

def convert_pose_pytorch3d_to_nsvf(r, t):
    # Inverse of the conversion above: undo the neurmips convention,
    # then the KiloNeRF flip, and return an NSVF camera2world matrix
    T = t.copy()
    R = r.copy()

    T[:2] = -T[:2]
    R = np.transpose(R)
    R[:2,:] = -R[:2,:]
    transform_matrix = np.eye(4)
    transform_matrix[:3,3] = T
    transform_matrix[:3,:3] = R
    transform_matrix[:3, 1:3] = -transform_matrix[:3, 1:3]
    transform_matrix = np.linalg.inv(transform_matrix).astype(np.float32)
    return transform_matrix

def get_camera_intrinsic(path):
    # Parse image size, focal lengths, and principal point from cameras.txt
    with open(path, 'r') as file:
        lines = file.readlines()
        lines = lines[3].split(' ')
        W, H = int(lines[2]), int(lines[3])
        fx, fy = float(lines[4]), float(lines[5])
        px, py = float(lines[6]), float(lines[7])
    return [W, H, fx, fy, px, py]

root_path = "NeurMips/TanksAndTemple/TanksAndTemple/Barn/train"

intrinsic = get_camera_intrinsic(os.path.join(root_path, 'cameras.txt'))
print(intrinsic)

intrinsic_mat = np.array([
    [intrinsic[2], 0, intrinsic[4]],
    [0, intrinsic[3], intrinsic[5]],
    [0,0,1],
])
print(intrinsic_mat)

W = int(intrinsic[0])
H = int(intrinsic[1])

# Load all training images into a single uint8 array
img_paths = sorted(glob.glob(os.path.join(root_path, "images", "*.png")))
print(len(img_paths))
imgs = np.zeros([len(img_paths), H, W, 3]).astype(np.uint8)
for i in range(len(img_paths)):
    imgs[i, :, :, :] = imageio.v2.imread(img_paths[i])

R = np.load(os.path.join(root_path, 'R.npy'))
T = np.load(os.path.join(root_path, 'T.npy'))

# Recover NSVF camera-to-world poses from the stored Pytorch3D (R, T),
# then invert to world-to-camera for projection
poses_orig = np.zeros([R.shape[0], 4, 4])
for i in range(R.shape[0]):
    poses_orig[i, :, :] = convert_pose_pytorch3d_to_nsvf(R[i, :, :], T[i, :])
    poses_orig[i, :, :] = np.linalg.inv(poses_orig[i, :, :])

print(poses_orig.shape)

idx = 100  # index of the view to inspect

pts = np.load(os.path.join(root_path, "points3D.npy"))

pts = pts[::50, :]  # subsample the point cloud for a lighter scatter plot
print(pts.shape)

# --- Adding the transformation to align the pointcloud to NSVF poses --- #
pts[:, [1, 2]] *= -1
# ----------------------------------------------------------------------- #

# World -> camera: homogeneous points times the world-to-camera matrix
pts_cam = np.concatenate([pts, np.ones([pts.shape[0], 1])], axis=-1) @ poses_orig[idx, :, :].T
print(pts_cam.shape)

# Perspective projection: divide by depth, then apply the intrinsics
pts_scr = (pts_cam[:, :3] / pts_cam[:, [2]]) @ intrinsic_mat.T
print(pts_scr.shape)

# Keep only points in front of the camera and inside the image bounds
valid_mask = (pts_cam[:, 2] > 0) & (pts_scr[:, 0] >= 0) & (pts_scr[:, 0] < W) & (pts_scr[:, 1] >= 0) & (pts_scr[:, 1] < H)
pts_2d = pts_scr[valid_mask, :2]
print(pts_2d.shape)

plt.figure()
plt.subplot(121)
plt.imshow(imgs[idx, :, :, :])
plt.subplot(122)
plt.imshow(imgs[idx, :, :, :])
plt.scatter(pts_2d[:, 0], pts_2d[:, 1], s=0.01)
plt.show()
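With the flip applied, the scattered points land on the corresponding structures in the image (as in the second screenshot above); if the marked line is removed, they appear vertically mirrored (as in the first screenshot).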

I hope this issue helps others run the algorithm more easily on their own datasets.

zhihao-lin commented 1 year ago

Thanks for sharing this! It could definitely help others resolve camera convention issues. In NeurMiPs, we follow the Pytorch3D convention; more details can be found here: https://github.com/zhihao-lin/neurmips/blob/main/doc/dataset.md. So if your dataset provides camera poses in another convention (e.g. OpenCV, OpenGL), please make sure to transform both the camera poses and the point cloud into the same, correct coordinate system.
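For example, converting an OpenCV-convention camera-to-world pose into the Pytorch3D (R, T) could look roughly like the sketch below (the helper name is illustrative, not repo code):

import numpy as np

# Sketch only: OpenCV camera axes are (+x right, +y down, +z forward);
# Pytorch3D uses (+x left, +y up, +z forward) with the row-vector
# convention X_cam = X_world @ R + T.
def opencv_c2w_to_pytorch3d(c2w):
    w2c = np.linalg.inv(c2w).astype(np.float32)
    R = w2c[:3, :3].copy()
    T = w2c[:3, 3].copy()
    R[:2, :] = -R[:2, :]   # flip the camera x- and y-axes
    T[:2] = -T[:2]
    return np.transpose(R), T  # transpose for the row-vector convention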