ken2576 / vision-nerf

Official PyTorch Implementation of paper "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", WACV 2023.
MIT License

Single image input for NeRF #4

Closed gemyerst closed 1 year ago

gemyerst commented 1 year ago

Hi there! Really cool model. I managed to get it working on my own input images, but I've had to resort to a bit of a workaround to get there.

I've been trying to run your model on my own data, but I can't get it to take a single image as input, as described in the paper. The only way I've found to make the model work is by duplicating the input image 100 times and adding a set of poses from the training SRN files.

This is the error I get when running the SRN, NMR, and gen_real models on a single image: [error screenshot]

I also tried preparing the data as suggested, using Pixel-NeRF's method, and managed to get detectron working but not Pixel-NeRF itself, both of which are required for that preprocessing. Would you be able to clarify the expected format of the input data for the Vision-NeRF model?

ken2576 commented 1 year ago

Hi,

I originally set up gen_real.py to be used right after you perform PixelNeRF's preprocessing. For example, if you have car1.jpg and car2.jpg, running that preprocessing gives you cropped car images car1_normalize.jpg and car2_normalize.jpg. After that, you can pass the path to the directory and the following line should pick up the images. https://github.com/ken2576/vision-nerf/blob/main/gen_real.py#L113
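As a rough sketch of what that image pickup could look like (this is an illustration, not the repo's actual code; the `collect_normalized_images` helper and the exact filename pattern are assumptions based on the naming described above):

```python
import glob
import os

def collect_normalized_images(data_dir):
    """Collect PixelNeRF-preprocessed images from a directory.

    Hypothetical helper: only the cropped ``*_normalize.jpg`` outputs
    (e.g. car1_normalize.jpg) are picked up; the original uncropped
    images are ignored.
    """
    return sorted(glob.glob(os.path.join(data_dir, "*_normalize.jpg")))
```

The actual pattern used by gen_real.py may differ, so check the linked line if your images aren't being found.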

The intrinsics are fed to the script as arguments. However, they depend on which dataset the model was trained on. E.g., for car images, it only works with the SRN training settings: focal=131.25, img_hw=[128, 128], z_near=0.8, and z_far=1.8. If you would like different focal lengths or image sizes, you need to retrain the whole network.

gemyerst commented 1 year ago

Thank you for your response, that works really well!! Can we also feed just one image into the SRN model, or could I change the expected inputs from here: https://github.com/ken2576/vision-nerf/blob/c184501fc5609382ba79937ffbcd479a16a624e3/eval.py#L159

I am planning to either retrain or do transfer learning on the network using images of buildings, and have followed the SRN dataset creation process for this.

ken2576 commented 1 year ago

Yes, you can change that part if it's easier. Just make sure the dictionary returned by `__getitem__` stays the same, so you don't have to change other parts of the code.
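A minimal single-image dataset following that advice might look like the sketch below. The key names (`"rgb"`, `"pose"`, etc.) and the class itself are placeholders, not the repo's actual schema; you would need to match whatever dictionary the existing eval dataset's `__getitem__` actually returns:

```python
class SingleImageDataset:
    """Hypothetical dataset wrapping one input image.

    All key names below are placeholder assumptions; replace them with
    the keys used by the repo's eval dataset so the rest of eval.py
    works unchanged.
    """

    def __init__(self, image, pose, focal=131.25, z_near=0.8, z_far=1.8):
        # Defaults are the SRN car settings mentioned earlier in this thread.
        self.sample = {
            "rgb": image,      # (H, W, 3) input image
            "pose": pose,      # 4x4 camera-to-world matrix
            "focal": focal,
            "z_near": z_near,
            "z_far": z_far,
        }

    def __len__(self):
        return 1  # a single input image

    def __getitem__(self, idx):
        return self.sample
```

Keeping the returned dictionary identical to the original dataset's is what lets the rest of the evaluation loop run without modification.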