kwea123 / nerf_pl

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
https://www.youtube.com/playlist?list=PLDV2CyUo4q-K02pNEyDr7DYpTQuka3mbV
MIT License

Training NERF using real-captured data #16

Closed · phongnhhn92 closed this issue 4 years ago

phongnhhn92 commented 4 years ago

Hello, I have followed your example to train NeRF on my own data. I have seen that you and others have had some success with single-object scenes (the silica model). How about real scenes (like the fern or orchids dataset)?

I have captured a video of my office (link). However, I can't use COLMAP to estimate poses to train the NeRF model. Since you are more experienced than me on this project, can you give me some suggestions? It would be interesting to see if this method works on real data like this.

This is the error from COLMAP:

python imgs2poses.py ./cmvs/
Need to run COLMAP
Features extracted
Features matched
Sparse map created
Finished running COLMAP, see ./cmvs/colmap_output.txt for logs
Post-colmap
Cameras 5
Images # 6
Traceback (most recent call last):
  File "imgs2poses.py", line 18, in <module>
    gen_poses(args.scenedir, args.match_type)
  File "/home/phong/data/Work/Paper3/Code/LLFF/llff/poses/pose_utils.py", line 276, in gen_poses
    save_poses(basedir, poses, pts3d, perm)
  File "/home/phong/data/Work/Paper3/Code/LLFF/llff/poses/pose_utils.py", line 66, in save_poses
    cams[ind-1] = 1
IndexError: list assignment index out of range
kwea123 commented 4 years ago

Actually the first data I tried was a real forward-facing scene, but due to coronavirus the only thing I could think of to capture was my messy desk, so I didn't post it haha... (attached: pc) It works quite well except for some flickering frames, which may be due to the bad lighting in the room.

Concerning your data, the photos look good; one concern is that they may cover too wide a range for COLMAP to handle. I will take a look. If you are using a local PC, you can also try running COLMAP with the GUI instead of imgs2poses.py to see what the reconstruction looks like.

kwea123 commented 4 years ago

Strange, it works perfectly using the COLMAP GUI. Are you able to run the COLMAP GUI?


I can recover the poses correctly.

Edit: although it reconstructs, the poses don't seem to be correct. So I recommend two attempts:

  1. Try to use "Dense reconstruction" to see if it produces better pose estimates
  2. Try to take photos with smaller lateral range (do not rotate camera too much)
phongnhhn92 commented 4 years ago

Hi, it is weird that the imgs2poses.py script cannot estimate the poses but the COLMAP GUI is able to do it. I have tested my images using the COLMAP GUI and the sparse reconstruction works. I am trying to run dense reconstruction to see the difference.

Btw, how can you tell that the poses don't seem to be correct? I have a similar sparse reconstruction to yours, but I have no idea how to evaluate it. Can you clarify?

Actually, I am curious how NeRF works on large-scale scenes. For example, can we test it on large datasets such as ScanNet, Matterport, or DTU?

In fact, I have captured a new set of images with lateral movement (not so much camera rotation) and this is my result. As you can see, the printer looks good but the background does not. My initial thought is that this NeRF model doesn't work that well with far objects (like hallways). I don't know if there are any quick parameter changes we can make when training the model.

[GIF: rendered result of the new capture]

phongnhhn92 commented 4 years ago

Another question: if I am using the COLMAP GUI, how can I compute the poses_bounds.npy file? I guess this file is necessary for both training and testing.

kwea123 commented 4 years ago

For example, images 001 and 035 are rotated by almost 90 degrees relative to each other, but the reconstruction looks like there is no rotation... maybe I'm wrong, it's just a personal estimate.

Currently the constraint is on the world space; you can only have two kinds:

  1. 360 inward facing, such that the world space is a cube.
  2. forward facing, such that the world space is a cuboid that lies fully behind a certain plane. See here or some other issues in the original repo for an explanation.

So complex structures like Matterport won't work, since they are more like 360 outward facing, which doesn't satisfy the above constraints. At the limit, DTU still works (I tried one scene) since it satisfies constraint 2.

For your new data, I reckon the result is reasonable. As I mentioned above, the world space must lie fully behind a certain plane, so anything that does not lie behind that plane won't be correct, which explains the result on the left and right parts (it might be due to scarce data at the edges as well).

To make it work on 360 outward-facing or even more complex scenes, although I think the concept still works, it'll be a lot of work:

  1. An efficient way to encode the whole space. The reason the original NeRF has the above two constraints is that we need to confine the coordinates to [-1, 1]^3 so that we can encode them and train efficiently (see the encoding sketch below). Training on 360 outward-facing scenes might require something like a spherical coordinate system for the encoding.
  2. An efficient way to sample the points. Unlike 360 inward-facing and forward-facing scenes, where we know the region of interest lies near us, in complex scenes we need to design another way to sample the training points in 3D. Otherwise the result won't be as good, in my opinion.

These are just some thoughts. Anyway, I think it's a totally new research topic, so there's no easy way to do it.
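As a rough illustration of the frequency encoding mentioned in point 1, here is a minimal sketch assuming the input coordinates are already normalized to [-1, 1]; the function name and number of frequencies are illustrative, not the repo's exact implementation.

    import torch

    def positional_encoding(x: torch.Tensor, n_freqs: int = 10) -> torch.Tensor:
        """Map normalized coordinates in [-1, 1]^3 to sin/cos features."""
        out = [x]                                  # keep the raw coordinates
        for k in range(n_freqs):
            freq = 2.0 ** k                        # frequencies 1, 2, 4, ...
            out += [torch.sin(freq * x), torch.cos(freq * x)]
        return torch.cat(out, dim=-1)              # (N, 3 + 3 * 2 * n_freqs)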

Finally, for poses_bounds.npy, you can use the COLMAP GUI to generate the sparse reconstruction first and then call imgs2poses.py with the same argument; it will skip the COLMAP part and only extract the bounds.
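If it helps, here is a small sketch of how to inspect the file, assuming the standard LLFF layout produced by imgs2poses.py: each row is a flattened 3x5 matrix (a 3x4 camera-to-world pose plus an [H, W, focal] column) followed by the near and far bounds.

    import numpy as np

    poses_bounds = np.load('poses_bounds.npy')       # shape (N_images, 17)
    poses = poses_bounds[:, :15].reshape(-1, 3, 5)   # (N, 3, 5)
    c2w = poses[:, :, :4]                            # 3x4 camera-to-world matrices
    hwf = poses[:, :, 4]                             # image height, width, focal length
    bounds = poses_bounds[:, 15:]                    # per-image near/far depth bounds
    print(c2w.shape, hwf[0], bounds.min(), bounds.max())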

phongnhhn92 commented 4 years ago

Thanks for your clarification! It makes sense now.

sixftninja commented 4 years ago

I trained a model with 32 inward-facing images.

Any advice on what might be going wrong? I used full-resolution images for COLMAP and added the --spheric argument for training. While training, some of the epochs (1, 9, 11, 13, 18, 26) did not complete. Also, I let the model train until epoch 30, but checkpoints were saved only for epochs 17, 20, 22, 23 and 25. I used the epoch 25 checkpoint to render novel poses using eval.py.

phongnhhn92 commented 4 years ago

I am not sure why the training fails exactly, but I guess your training images have a complicated background. This NeRF model doesn't do well with cluttered backgrounds. Why don't you try putting the pan on a white table and moving the camera closer to it? I think it will work this time.

sixftninja commented 4 years ago

Yeah, I started training again with a white background. Let's see now...

kwea123 commented 4 years ago

@3ventHoriz0n What do you mean by "didn't complete training"? By default I only save the best 5 epochs; that's why you only have 5 checkpoints at the end. Every epoch should finish normally. You can change the number here: https://github.com/kwea123/nerf_pl/blob/f02913b8cec85ee1e65813064224270dfa9d60e1/train.py#L160
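For context, in pytorch-lightning the number of retained checkpoints is controlled by the save_top_k argument of ModelCheckpoint; a rough sketch (the monitored metric and other arguments in the repo's train.py may differ):

    from pytorch_lightning.callbacks import ModelCheckpoint

    # Keep the 5 checkpoints with the best monitored metric;
    # set save_top_k=-1 to keep a checkpoint for every epoch instead.
    checkpoint_callback = ModelCheckpoint(monitor='val/loss',
                                          mode='min',
                                          save_top_k=5)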

sixftninja commented 4 years ago

Oh, I missed that part.

Well, usually when an epoch finishes, the progress bar is replaced by the one for the next epoch, so at any given time I only see one progress bar on the screen. But for the epochs I mentioned, the progress bar got stuck midway and a new progress bar loaded for the next epoch. So I had 7 progress bars on the screen: 6 of them stuck midway and the last one for the current epoch. I don't know what that means, though.

kwea123 commented 4 years ago

Sometimes if you accidentally perturb the terminal (like accidentally pressing a key), it interrupts the progress bar, so a new progress bar appears and the old one is left on the terminal and looks stuck. It's just a visual bug; it doesn't affect training.

sixftninja commented 4 years ago

Ok I seriously can't figure out what I'm doing wrong.

[image: Original]

[image: Generated]

The model doesn't seem to be learning anything at all.

kwea123 commented 4 years ago

Can you share the sparse folder generated by COLMAP? Also the poses_bounds.npy file and the training log files.

sixftninja commented 4 years ago

sparse logs poses_bounds

kwea123 commented 4 years ago

This is what I see from your training log; the center image is the prediction, and I don't see anything wrong. The poses also seem correct. How did you generate that noisy image? [Screenshot from 2020-06-11 22-57-13]

sixftninja commented 4 years ago

I ran eval.py using checkpoint epoch=28 and dataset_name llff.

Can you also tell me how you are using TensorBoard to visualize the predictions?

kwea123 commented 4 years ago

Maybe you forgot to add --spheric_poses in evaluation? It is indeed not mentioned in the README; I will add that.

sixftninja commented 4 years ago

Yes, I did forget to add that. I'll try again.

kwea123 commented 4 years ago

Can you share the checkpoint?

sixftninja commented 4 years ago

epoch=28

kwea123 commented 4 years ago

You might need to modify these two lines to get a good visual result: https://github.com/kwea123/nerf_pl/blob/d41ae302dd3d186f2f12fb411d8874a1d004e00d/datasets/llff.py#L130-L131 They control where the virtual camera is placed. This part is actually hard-coded currently; I'm still looking for a way to let it adapt to various scenes. For your data I find that

        trans_t = lambda t : np.array([
            [1, 0, 0, 0],
            [0, 1, 0, -0.6*t],  # second row: height offset of the virtual camera
            [0, 0, 1, 0.7*t],   # third row: distance offset from the poses center
            [0, 0, 0, 1],
        ])

is good. This is what I get after the above modification: [attachment: test]

sixftninja commented 4 years ago

Can you please explain what exactly is happening here? Also, I uploaded all the necessary files to my Google Drive because I wanted to run the extract_mesh notebook, but when I run the cell that searches for tight bounds, the runtime restarts.

sixftninja commented 4 years ago

Adding --spheric_poses generated this GIF. It looks good, except that the top part has been cut off and there's a strange cloud of white dust in one location.

kwea123 commented 4 years ago

This is the translation with respect to the poses center; the second row controls the height offset and the third row controls the distance offset.

Yes, as I said, currently you need to tune the position manually as mentioned above, but it is only for visualization; this code has no effect on mesh extraction.

sixftninja commented 4 years ago

OK, I will try the modification now.

sixftninja commented 4 years ago

For the camera above, colorless mesh extraction worked perfectly. However, when I tried to extract a colored mesh, this was the result.

I found tight bounds at x, y: -0.4 to 0.3 and z: -1.25 to -0.55. I tried sigma threshold values from 5 to 45 in increments of 5, and occlusion threshold values from 0.05 to 0.2 in increments of 0.05.

What do you think is going wrong?
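For reference, the colorless extraction step essentially evaluates sigma on a 3D grid inside the tight bounds and runs marching cubes on it; a minimal sketch below, assuming a hypothetical query_sigma(xyz) helper that wraps the trained NeRF and the PyMCubes package (the repo's notebook may organize this differently).

    import numpy as np
    import mcubes  # PyMCubes

    N = 256
    xs = np.linspace(-0.4, 0.3, N)     # tight x bounds found above
    ys = np.linspace(-0.4, 0.3, N)     # tight y bounds
    zs = np.linspace(-1.25, -0.55, N)  # tight z bounds
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing='ij'), -1).reshape(-1, 3)

    sigma = query_sigma(grid).reshape(N, N, N)  # hypothetical NeRF density query
    # Threshold the density field and extract the isosurface.
    vertices, triangles = mcubes.marching_cubes(sigma, 20.0)  # sigma threshold ~20
    mcubes.export_obj(vertices, triangles, 'mesh.obj')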

kwea123 commented 4 years ago

It looks like the images are rotated by 90 degrees... can you try manually rotating them by +90 (or -90) degrees and then feeding them to the program?

sixftninja commented 4 years ago

Alright, will do that. I have faced this before when reading iPhone-captured images with Pillow and OpenCV, smh...
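One way to bake the EXIF orientation into the pixels before handing the images to COLMAP or the dataloader is Pillow's exif_transpose; a small sketch, with the images folder name just a placeholder.

    from pathlib import Path
    from PIL import Image, ImageOps

    for path in Path('images').glob('*.jpg'):            # placeholder input folder
        img = ImageOps.exif_transpose(Image.open(path))  # apply the EXIF orientation tag
        img.save(path)                                   # overwrite with upright pixels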

sixftninja commented 4 years ago

I fixed the EXIF data of the images and ran the experiment again. The resulting colored mesh is still not satisfactory. Any advice?

  1. Data folder (includes images and LLFF output files)
  2. eval.py output
  3. .ply file
  4. Checkpoint
  5. Colored mesh video

kwea123 commented 4 years ago

@3ventHoriz0n Sorry, I mis-updated the master code. I reverted it just now; please re-pull the code and retry extract_mesh with the same parameters. It should give good results.

sixftninja commented 4 years ago

Done. Here's the final result.

dichen-cd commented 3 years ago

Hi @kwea123, thanks for your work!

> it controls where the virtual camera is placed. This part is actually hard-coded currently, I'm still finding a way to let it adapt to various scenes.

I was wondering whether you have found a good way to generate render poses adaptively. Currently I find it quite hard to set correct poses manually, so I'm using c2ws interpolated from the training set. It works, but the camera movement is not satisfactory (shaky, jittering speed, etc.). Do you have any suggestions?
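Not the repo's code, but one common way to get a smoother path from a set of training c2w matrices is to interpolate the rotations with Slerp and the camera centers separately; a rough sketch using scipy, assuming c2ws is an (N, 3, 4) numpy array.

    import numpy as np
    from scipy.spatial.transform import Rotation, Slerp

    def smooth_render_path(c2ws: np.ndarray, n_frames: int = 120) -> np.ndarray:
        """Interpolate (N, 3, 4) camera-to-world poses into an (n_frames, 3, 4) path."""
        key_t = np.linspace(0.0, 1.0, len(c2ws))
        query_t = np.linspace(0.0, 1.0, n_frames)

        # Spherical interpolation of rotations avoids jitter between keyframes.
        slerp = Slerp(key_t, Rotation.from_matrix(c2ws[:, :, :3]))
        rots = slerp(query_t).as_matrix()                     # (n_frames, 3, 3)

        # Interpolate camera centers one axis at a time.
        trans = np.stack([np.interp(query_t, key_t, c2ws[:, i, 3])
                          for i in range(3)], axis=-1)        # (n_frames, 3)

        return np.concatenate([rots, trans[:, :, None]], axis=-1)

A cubic spline on the centers (e.g. scipy.interpolate.CubicSpline) can further even out the speed.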

astogxl commented 3 years ago

@kwea123 Hello! Have you read the pixelNeRF paper? I just can't understand the hard-coded part for generating the render poses for the DTU dataset.