kwea123 / nerf_pl

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
https://www.youtube.com/playlist?list=PLDV2CyUo4q-K02pNEyDr7DYpTQuka3mbV
MIT License

Rendering forward driving scenes #106

Open zgojcic opened 2 years ago

zgojcic commented 2 years ago

Hi Kwea123,

thanks for your great re-implementation of NeRF. In several issues I have seen that you have successfully used NeRF to fit forward driving scenes (e.g. from KITTI) while using the NDC representation. Could you maybe provide a bit more information on this? Specifically,

1) Did you recenter the poses based on the first camera or on the "average camera", as is the default for forward-facing scenes? Did you use any scale factor (sc, bd_factor)?
2) Did you have to adapt the sampling of t due to the large depth range in such scenes? How did you define the near and far bounds?
3) Did the rendered depth make sense geometrically? Did you try to convert it to point clouds?

I would also be happy to discuss this offline if you are interested.

Best,
Zan

mertkiray commented 2 years ago

Hello @zgojcic

Did you manage to find answers to these questions? Also, how did you integrate the poses from KITTI into NeRF?

kwea123 commented 2 years ago

Hi, to be specific, I successfully trained on static forward-driving scenes (no or little turning). I trained on some internal data, not KITTI, which is why I didn't (and couldn't) show any results here, but I am submitting a paper to ICRA; if it gets accepted, I will share it here.

The tricks I can share with you currently (as I have posted on other threads too):

  1. Yes, I centered the poses like for llff (exactly the same code), but I also moved the z-origin to the z of the first camera so that all scene content lies behind this frustum (see the sketch after this list).
  2. I use NDC, so there is no need to specify the bounds (0-1 actually).
  3. Yes, the depth is crisp (I use multi-view in my setup, but I also tried monocular, and the depth is good too). I added another loss to explicitly encourage the sky to be infinitely far, thanks to some proxy semantic segmentation, so the total depth image looks impeccable.
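A minimal sketch of trick 1, assuming (N, 3, 4) camera-to-world matrices that are already centered llff-style (illustrative only, not the exact internal code):

```python
import numpy as np

def shift_z_to_first_camera(poses):
    """poses: (N, 3, 4) camera-to-world matrices, already centered llff-style.
    Shift the world along z so that the first camera sits at z = 0 and the
    scene content lies on one side of its frustum."""
    poses = poses.copy()
    poses[:, 2, 3] -= poses[0, 2, 3]  # subtract the first camera's z-translation
    return poses
```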

I would also like to know your setup when training on KITTI, and what problems you get. IMO, training like for llff shouldn't pose too many problems, am I wrong?

mertkiray commented 2 years ago

Hello @kwea123, I am using the KITTI-360 dataset with these training images (https://drive.google.com/drive/folders/1CE3giBoWGSqkbub74gox0XuP08fHfV-n?usp=sharing).

There are intrinsics and poses already in the dataset which I am trying to integrate into NeRF.

[image]

I followed https://github.com/Fyusion/LLFF#using-your-own-poses-without-running-colmap to generate poses_bounds.npy for loading the data.

I am currently trying to figure out the data-loading code. In llff.py you change the convention from "down right back" to "right up back".

My dataset has the following coordinates:

[image]

So in the end, I should follow this transformation and adapt it to my needs, which gives:

`poses = np.concatenate([poses[..., 0:1], -poses[..., 1:2], -poses[..., 2:3], poses[..., 3:4]], -1)`

Is my reasoning correct?
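For reference, a minimal sketch of this transform under my own assumptions (I assume the camera axes are (right, down, forward), as in OpenCV; not verified against KITTI-360's documentation). For c2w matrices, switching the camera-axis convention only permutes/negates the rotation columns; the translation column is the camera center in world coordinates and stays untouched.

```python
import numpy as np

def right_down_forward_to_right_up_back(poses):
    """poses: (N, 3, 4) c2w matrices whose camera axes are (right, down, forward).
    Returns poses in the (right, up, back) convention used by llff.py."""
    return np.concatenate([
        poses[..., 0:1],   # x: right stays right
        -poses[..., 1:2],  # y: down -> up
        -poses[..., 2:3],  # z: forward -> back
        poses[..., 3:4],   # camera center in world coordinates, unchanged
    ], axis=-1)
```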

These are the poses and the point cloud I get from the KITTI-360 dataset:

[image]

and this is from the COLMAP GUI:

[image]

Also, will NDC work in turning scenes (like a right or left turn)?

Thank you so much.

DRosemei commented 2 years ago


@kwea123 Thanks for your great work! I want to know how to "encourage the sky to be infinitely far". I convert the KITTI ground-truth poses into NDC coordinates, so how should I supervise infinite depth? Maybe 1e10, 1, or something else for infinite depth?

kwea123 commented 2 years ago

Infinity is 1 in NDC, so something like (1-depth)**2 works.
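A minimal sketch of such a sky term (illustrative; `sky_mask` and `weight` are assumed names, not from this repo): pull the rendered NDC depth of sky pixels towards 1 using a proxy segmentation mask.

```python
import torch

def sky_depth_loss(rendered_depth, sky_mask, weight=0.1):
    """rendered_depth: (N,) NDC depths from volume rendering.
    sky_mask: (N,) boolean mask of sky pixels from a proxy semantic segmentation."""
    if sky_mask.sum() == 0:
        return rendered_depth.new_zeros(())  # no sky pixels in this ray batch
    # in NDC, depth = 1 corresponds to infinity, so push sky depths towards 1
    return weight * ((1.0 - rendered_depth[sky_mask]) ** 2).mean()
```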

DRosemei commented 2 years ago

@kwea123 I have another question about using ground-truth poses from datasets like KITTI. Right now I am using poses generated by COLMAP and get "poses_bounds.npy" via LLFF. However, not all scene poses can be extracted by COLMAP, so I list some steps below. Assume we get c2w poses like COLMAP's. (By the way, are the COLMAP poses referenced to the first camera pose? In other words, are the c2w poses actually camera poses relative to the first camera?)

  1. Then we do `poses = np.concatenate([poses, np.tile(hwf[..., np.newaxis], [1, 1, poses.shape[-1]])], 1)`
  2. and then `poses = np.concatenate([poses[:, 1:2, :], poses[:, 0:1, :], -poses[:, 2:3, :], poses[:, 3:4, :], poses[:, 4:5, :]], 1)`
  3. However, GT poses do not come with bounds (COLMAP-based LLFF uses `close_depth, inf_depth = np.percentile(zs, .1), np.percentile(zs, 99.9)`), so how do we set the bounds properly?
  4. Finally, we can write out "poses_bounds.npy" for the later steps.

Are these steps right? Thanks in advance!
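For reference, a minimal sketch of what I have in mind for steps 1-4, under my own assumptions: the poses are (N, 3, 4) c2w matrices already in the axis convention that `poses_bounds.npy` expects, `hwf = [height, width, focal]` is shared by all images, and the near/far bounds are chosen by hand, since GT poses carry no depth statistics like the COLMAP percentiles above.

```python
import numpy as np

def save_poses_bounds(poses, hwf, near, far, out_path='poses_bounds.npy'):
    """poses: (N, 3, 4) c2w matrices, hwf: (3,) = [height, width, focal],
    near/far: scalar depth bounds in the same units as the poses."""
    n = poses.shape[0]
    hwf_col = np.tile(np.asarray(hwf, dtype=np.float64).reshape(1, 3, 1), (n, 1, 1))
    poses_hwf = np.concatenate([poses, hwf_col], axis=-1)               # (N, 3, 5)
    bounds = np.tile(np.array([near, far], dtype=np.float64), (n, 1))   # (N, 2)
    out = np.concatenate([poses_hwf.reshape(n, -1), bounds], axis=-1)   # (N, 17)
    np.save(out_path, out)
```

If I read llff.py correctly, the bounds mainly determine the global rescaling applied before switching to NDC, so with GT poses one could pick them from lidar / sparse-point depth percentiles or simply by hand.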