ken2576 / vision-nerf

Official PyTorch Implementation of paper "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", WACV 2023.

Confusion about generalization of NeRF #14

Closed xuyaojian123 closed 4 months ago

xuyaojian123 commented 10 months ago

Thanks for your great work! I have some confusion about NeRF's generalizability.

Your paper title says that only a single input image is needed to synthesize novel views. What is the function of the pretrained weights you provide, and how did you obtain them?

Are the pretrained weights used to extract the global and local features of the single input image, which the NeRF MLP then uses to render the target view?

The original NeRF needs dozens to hundreds of images of a scene as input, and after training it can render that scene from any novel viewpoint. Although you input only a single image, you train a network on an image dataset to extract global and local features. What is the difference between this and feeding many images to the original NeRF?

Sorry, I don't fully understand the generalizability of NeRF. I'd appreciate your reply, thanks!

ken2576 commented 9 months ago

Hi,

Thanks for the question. Vanilla NeRF takes xyz coordinates and viewing directions as input to the MLP. In our case, it is more similar to PixelNeRF, where the MLP takes per-pixel image features in addition to the xyz coordinates and viewing directions. You can think of the image features as additional embeddings that tell the NeRF MLP what each 3D point should look like. Because the feature extractor is trained across many scenes, the model can render a new scene from a single image without per-scene optimization, which is what "generalizable NeRF" refers to. Please let me know if you still have other questions.
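For illustration, here is a minimal PyTorch sketch (not the repository's actual code; the class names, layer sizes, and feature dimension are hypothetical, and positional encoding is omitted) contrasting the vanilla NeRF MLP with a PixelNeRF-style MLP conditioned on per-pixel image features:

```python
import torch
import torch.nn as nn

class VanillaNeRFMLP(nn.Module):
    """Vanilla NeRF: the MLP sees only a 3D point and a view direction,
    so the weights themselves must memorize one specific scene."""
    def __init__(self, pos_dim=3, dir_dim=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pos_dim + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, xyz, view_dir):
        return self.net(torch.cat([xyz, view_dir], dim=-1))


class ConditionedNeRFMLP(nn.Module):
    """PixelNeRF-style: per-pixel image features are concatenated with
    xyz and view direction, acting as embeddings that tell the MLP what
    each 3D point should look like in this particular scene."""
    def __init__(self, pos_dim=3, dir_dim=3, feat_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pos_dim + dir_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, xyz, view_dir, img_feat):
        # img_feat would be obtained by projecting xyz into the input image
        # and sampling the encoder's feature map there (e.g., F.grid_sample).
        return self.net(torch.cat([xyz, view_dir, img_feat], dim=-1))


# Hypothetical usage on a batch of 1024 sampled points:
mlp = ConditionedNeRFMLP()
xyz = torch.randn(1024, 3)
view_dir = torch.randn(1024, 3)
img_feat = torch.randn(1024, 512)   # features from a pretrained image encoder
rgb_sigma = mlp(xyz, view_dir, img_feat)  # (1024, 4)
```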