mlvlab / UP-NeRF

Official Implementation (PyTorch) of "UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields", NeurIPS 2023
MIT License

Regarding the depth shown in Figure 7 in the paper #3

Closed TwiceMao closed 4 months ago

TwiceMao commented 4 months ago

Is it a depth map or an inverse depth map that you show in your paper? Why isn't it represented in pseudo color? Thanks!


TwiceMao commented 4 months ago

@cmh1027 And for an indoor scene, how many iterations do you think is appropriate? Because it takes me a long, long time to run your code. Thanks!

cmh1027 commented 4 months ago

If my memory serves me right, the visualization is inverse depth (I'm not sure what you mean by pseudo color). For the runtime, one iteration equals one randomly shuffled batch (default 2048) of rays drawn across all the images (you can refer to the dataset code). That means over 600k iterations, 600k * 2048 rays (one per sampled pixel) are used for training, and the Phototourism datasets have about 700~1500 images. Depending on the number of images in your dataset, the number of iterations can be reduced.
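The scaling argument above can be sketched as a back-of-the-envelope calculation. This is a heuristic, not anything from the repository: the function name, the `passes_per_pixel` idea, and the defaults are all assumptions; only the 2048-ray batch, the 600k-iteration budget, and the ~700-1500-image Phototourism reference come from the comment.

```python
# Heuristic iteration budget, assuming (as described above) that one
# iteration consumes `batch_size` rays shuffled across all images.
# All names and defaults here are illustrative assumptions.

def suggested_iterations(num_images, image_height, image_width,
                         batch_size=2048, passes_per_pixel=3,
                         reference_images=1000, reference_iters=600_000):
    """Scale the paper's 600k-iteration budget down for a small dataset.

    Two heuristics; return the larger:
    1. enough iterations to sample every pixel ~passes_per_pixel times;
    2. linear scaling of the Phototourism budget by image count.
    """
    total_pixels = num_images * image_height * image_width
    by_coverage = (total_pixels * passes_per_pixel) // batch_size
    by_scaling = reference_iters * num_images // reference_images
    return max(by_coverage, by_scaling)

# e.g. a 48-image indoor scene at 480x640 resolution
iters = suggested_iterations(48, 480, 640)
```

For a 48-image scene this lands well below 600k iterations, consistent with the suggestion that the budget can be reduced for small datasets.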

TwiceMao commented 4 months ago

Representing the depth map with pseudo color means mapping the original single-channel depth values to a three-channel color map.

From SPARF: Neural Radiance Fields from Sparse and Noisy Poses

TwiceMao commented 4 months ago

Our dataset is an indoor scene with 48 images. When using your method, the trained PSNR is reasonable, but the camera pose rotation error is too large. Of course, we cannot rule out that problems arose when we modified the code. But I would like to ask whether there are any parameters in your code that can be adjusted for better performance. If it is convenient, please give me some suggestions!

TwiceMao commented 4 months ago

We discovered a phenomenon when running your test code (the TTO code): it takes 33 hours to test 48 images. We felt this was a little too long; I'm not sure if we made a mistake somewhere. Moreover, you optimize one image at a time rather than all of them together. We don't have that much time. Can you give me some advice?

cmh1027 commented 4 months ago

The design that optimizes one image at a time in TTO is because we want to ensure that the full batch size (2048) is allocated to a single image at a time. If we optimized all the images in one batch, the batch size per image would have to be reduced to 2048/N (and a batch size of 2048*N is of course not feasible). Meanwhile, I think test-time optimization is unnecessary in your case: if your purpose is just estimating the camera poses of images, you can simply put them all in the training set.
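The batching trade-off described above can be made concrete with a minimal sketch. The function and its names are illustrative assumptions, not the repository's code; it only encodes the arithmetic from the comment.

```python
# Rays a single image contributes to one optimization step.
# Joint optimization splits one shared batch across all N test images;
# per-image TTO keeps the full batch on a single image.
# Illustrative sketch only, not the repository's actual API.

def rays_per_image_per_step(num_images, joint=False, batch_size=2048):
    if joint:
        # one shared batch divided across all test images
        return batch_size // num_images
    # per-image TTO: the whole batch targets one image at a time
    return batch_size
```

With 6 test images, a joint batch would give each image only 2048 // 6 = 341 rays per step, which is why the per-image design was chosen.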

cmh1027 commented 4 months ago

And the reason for the large rotation error is that our model is not good at dealing with datasets that do not have enough views, or the camera poses in your dataset may be too hard for our model. (You can tune hyperparameters like the learning rate and the scheduling parameters, but I don't think they will help in your case.)

I recommend checking the refined_pose figure uploaded to wandb to see whether each camera pose converges to its own GT position.

TwiceMao commented 4 months ago

I didn't put them all in. I extracted several images with known poses from the entire dataset as a test set; the training/test split is 48/6. I want to use the test set to render from novel views and compare performance with your method: we compare the camera-pose optimization on the training set, and then compare novel-view rendering on the test set. So do you think there is any way to reduce the running time of TTO without affecting your method's performance?

cmh1027 commented 4 months ago

I'm afraid that is not possible :'( Why don't you just run TTO for only the 6 images rather than the whole set? Evaluation of PSNR and camera poses is already done at training time.

TwiceMao commented 4 months ago

Why don't you just run TTO for only 6 images rather than whole images?

Yes, I ran TTO for the test dataset, i.e., 6 images, and it still takes many hours. Maybe I didn't express myself clearly: I divided the dataset into a training set and a test set, 48/6. I evaluate the camera poses on the training set, and test PSNR when doing novel-view synthesis on the test set. Although the test set only has 6 images, running your TTO still takes many hours.

cmh1027 commented 4 months ago

I'm not sure how much time it would save, but you can skip the appearance test-time optimization, since your dataset should be color-consistent. (TTO has two stages, one for pose and one for appearance.)
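A minimal sketch of the two-stage structure described above, with an opt-out for the appearance stage. The stage functions and the skip flag are assumptions for illustration, not the repository's actual API; only the pose-then-appearance ordering comes from the comment.

```python
# Hypothetical two-stage TTO driver: refine the pose, then (optionally)
# optimize the appearance embedding. The skip flag suits datasets that
# are already color-consistent, as suggested above.

def run_tto(image, optimize_pose, optimize_appearance, skip_appearance=False):
    state = optimize_pose(image)            # stage 1: pose refinement
    if not skip_appearance:                 # stage 2: appearance, optional
        state = optimize_appearance(state)
    return state
```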

TwiceMao commented 4 months ago

Thank you very much! We will try~