TwiceMao closed this issue 4 months ago
@cmh1027 For an indoor scene, how many iterations do you think are appropriate? It takes me a very long time to run your code. Thanks!
If my memory serves me right, the visualization is inverse depth (I'm not sure what you mean by pseudo color). As for the runtime: one iteration equals one randomly shuffled batch (default 2048) of rays drawn across all the images (you can refer to the dataset code). That means over 600k iterations, 600k * 2048 rays (one per sampled pixel) are used for training, and the phototourism datasets have about 700~1500 images. Depending on the number of images in your dataset, the number of iterations can probably be reduced.
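To make the iteration accounting concrete, here is a minimal sketch (not the repo's actual dataset code) of how one training iteration draws a batch of rays uniformly across all images; the image resolution below is a made-up example:

```python
import numpy as np

def sample_ray_batch(num_images, height, width, batch_size=2048, rng=None):
    """One training iteration's worth of rays: (image index, pixel row,
    pixel col) triples sampled uniformly across all training images."""
    rng = rng or np.random.default_rng()
    img_idx = rng.integers(0, num_images, size=batch_size)  # which image each ray comes from
    rows = rng.integers(0, height, size=batch_size)         # pixel row per ray
    cols = rng.integers(0, width, size=batch_size)          # pixel column per ray
    return img_idx, rows, cols

# With 48 images instead of 700~1500, each image is revisited far more often
# per iteration, which is why fewer total iterations may be enough.
img_idx, rows, cols = sample_ray_batch(num_images=48, height=480, width=640)
```

The point is that total supervision equals iterations * batch_size rays, spread over however many images you have, so a 48-image dataset saturates much sooner than a 1500-image one.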
Representing the depth map in pseudo color means mapping the original single-channel depth values to three color channels.
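For illustration, a minimal pseudo-color mapping can be written with plain NumPy (this is a generic blue-to-red colormap sketch, not the one used in any particular paper):

```python
import numpy as np

def depth_to_pseudocolor(depth):
    """Map a single-channel depth map (H, W) to a three-channel RGB image
    (H, W, 3) using a simple blue -> green -> red colormap."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # normalize to [0, 1]
    r = np.clip(2.0 * d - 1.0, 0.0, 1.0)   # red ramps up in the far half
    g = 1.0 - np.abs(2.0 * d - 1.0)        # green peaks at mid-range depth
    b = np.clip(1.0 - 2.0 * d, 0.0, 1.0)   # blue ramps down in the near half
    rgb = np.stack([r, g, b], axis=-1)     # one channel becomes three
    return (rgb * 255).astype(np.uint8)
```

In practice, libraries like matplotlib provide ready-made colormaps (e.g. viridis or turbo) that do the same one-to-three-channel mapping.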
From SPARF: Neural Radiance Fields from Sparse and Noisy Poses
Our dataset is an indoor scene with 48 images. When using your method, the training PSNR is reasonable, but the camera pose rotation error is too large. Of course, we can't rule out that problems arose when we modified the code. But I would like to ask: are there any parameters in your code that can be tuned for better performance? If it's convenient, please give us some suggestions!
We noticed something when running your test (TTO) code: it takes 33 hours to test 48 images. That feels a little too long, and I'm not sure if we made a mistake somewhere. Moreover, you optimize one image at a time rather than all of them together. We don't have that much time. Can you give us some advice?
TTO optimizes one image at a time by design, because we want the full batch size (2048) allocated to a single image at a time. If we optimized all the images in one batch, the batch size per image would have to drop to 2048/N (and a batch size of 2048*N is of course not feasible). Meanwhile, I think test-time optimization is unnecessary in your case: if your purpose is just estimating the camera poses of the images, you can simply put them all in the training set.
And the reason for the large rotation error is that our model is not good at handling datasets with too few views, or the camera poses in your dataset may simply be too hard for our model. (You can tune hyperparameters such as the learning rate and the scheduling parameters, but I don't think they will help in your case.)
I recommend checking the refined_pose figure uploaded to wandb to see whether each camera pose converges to its GT position.
I didn't add them all. I extracted several images with known poses from the whole dataset as a test set; the training/test split is 48/6. I want to use the test set to render from novel views and compare performance with your method: we compare camera pose optimization on the training set, and then compare novel-view rendering on the test set. So, is there any way to reduce the TTO running time without affecting your method's performance?
I'm afraid that it is not possible :'( Why don't you run TTO for only the 6 images rather than all of them? Evaluation of PSNR and camera poses is already done at training time.
Yes, I ran TTO on the test set, i.e., the 6 images, and it still takes many hours. Maybe I didn't express myself clearly: I split the dataset into a training set and a test set (48/6). I evaluate the camera poses on the training set, and PSNR for novel-view synthesis on the test set. Even though the test set has only 6 images, running your TTO still takes many hours.
I'm not sure how much time it saves, but you can skip the appearance test-time optimization, since your dataset should be color-consistent. (TTO has two stages, pose and appearance, run for each image.)
Thank you very much! We will try~
Is it the depth map or the inverse depth map that you show in your paper? Why isn't it represented in pseudo color? Thanks!