Question about camera poses

Loydian commented 1 year ago

Thanks for your extraordinary work and for releasing the code. I found that you transform the camera poses from LLFF format to NeRF format with the following code,

  poses = np.concatenate(
      [poses[:, 1:2, :], -poses[:, 0:1, :], poses[:, 2:, :]], 1
  )

but then invert the y and z axes again in the parse_llff_pose function, as follows. What is this for?

def parse_llff_pose(pose):
  """convert llff format pose to 4x4 matrix of intrinsics and extrinsics."""

  h, w, f = pose[:3, -1]
  c2w = pose[:3, :4]
  c2w_4x4 = np.eye(4)
  c2w_4x4[:3] = c2w
  c2w_4x4[:, 1:3] *= -1
  intrinsics = np.array(
      [[f, 0, w / 2.0, 0], [0, f, h / 2.0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
  )
  return intrinsics, c2w_4x4

This really confuses me.

Loydian commented 1 year ago

It seems that you transform the poses into OpenCV format. What is this for?

zhengqili commented 1 year ago

This is for historical/compatibility reasons, to keep the code the same as the original NeRF and IBRNet data formats and codebases. Basically, the script loads the COLMAP format, then converts it and saves it in LLFF format. In the end, the dataloader transforms it back into COLMAP format. So basically, it is OpenCV in and OpenCV out for training and inference.
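
The round trip described above can be checked numerically with a minimal sketch for a single 3x4 camera-to-world matrix. The function names here are hypothetical and are not taken from the repository. The OpenCV/COLMAP-to-LLFF step is an assumption: it follows the usual LLFF column convention (down, right, backward), which this thread does not show. The other two steps mirror the two snippets quoted in the question.

```python
import numpy as np


def opencv_to_llff(c2w):
    """ASSUMED step (not quoted in this thread): reorder the rotation
    columns [x, y, z] -> [y, x, -z], i.e. (right, down, forward) ->
    (down, right, backward), the usual LLFF convention."""
    return np.concatenate(
        [c2w[:, 1:2], c2w[:, 0:1], -c2w[:, 2:3], c2w[:, 3:4]], axis=1
    )


def llff_to_nerf(c2w):
    """Single-pose version of the first quoted snippet:
    columns [a, b, c, t] -> [b, -a, c, t]."""
    return np.concatenate([c2w[:, 1:2], -c2w[:, 0:1], c2w[:, 2:]], axis=1)


def nerf_to_opencv(c2w):
    """Same flip as in parse_llff_pose: negate the y and z columns."""
    c2w_4x4 = np.eye(4)
    c2w_4x4[:3] = c2w
    c2w_4x4[:, 1:3] *= -1
    return c2w_4x4


# An arbitrary OpenCV-style pose: rotation about z plus a translation.
theta = 0.3
c2w_cv = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0, 1.0],
        [np.sin(theta), np.cos(theta), 0.0, 2.0],
        [0.0, 0.0, 1.0, 3.0],
    ]
)

c2w_nerf = llff_to_nerf(opencv_to_llff(c2w_cv))
round_trip = nerf_to_opencv(c2w_nerf)

# The three conversions cancel out: OpenCV in, OpenCV out.
assert np.allclose(round_trip[:3], c2w_cv)
```

Under these assumptions, the intermediate NeRF-style pose differs from the OpenCV one only by the sign of its y and z columns, which is why the extra flip in parse_llff_pose recovers the original pose.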

google / dynibar

Question about camera poses #23