ken2576 / vision-nerf

Official PyTorch Implementation of paper "Vision Transformer for NeRF-Based View Synthesis from a Single Input Image", WACV 2023.
MIT License

Generate Multi-level Feature Maps #16

Closed Yancy-lv closed 4 months ago

Yancy-lv commented 8 months ago

Hello, may I ask what the difference is between using a ViT encoder followed by a convolutional decoder to generate multi-level feature maps, and using a PVT encoder to generate multi-level feature maps directly?

ken2576 commented 4 months ago

Hi,

I didn't specifically test a PVT encoder, but the idea behind using a convolutional decoder is to preserve the color information from the source images. We found that using only the ViT encoder could cause blurriness in some areas.
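For readers comparing the two designs, the feature-map resolutions involved can be sketched with plain shape arithmetic. This is only an illustration, not code from the repository: the helper names are hypothetical, and the patch size of 16, the three decoder levels that each double the resolution, and the PVT stage strides of 4, 8, 16, and 32 are assumptions chosen for the example.

```python
# Shape-only sketch; no tensors are created. All names and default
# values below are illustrative assumptions, not the repo's config.

def vit_token_grid(height, width, patch_size=16):
    """Spatial grid of patch tokens from a plain, single-scale ViT."""
    return (height // patch_size, width // patch_size)


def decoder_pyramid(token_grid, num_levels=3):
    """Resolutions after a conv decoder that doubles the grid per level."""
    h, w = token_grid
    return [(h * 2 ** i, w * 2 ** i) for i in range(1, num_levels + 1)]


def pvt_pyramid(height, width, strides=(4, 8, 16, 32)):
    """Resolutions of the stages of a PVT-style hierarchical encoder."""
    return [(height // s, width // s) for s in strides]


H, W = 128, 128
tokens = vit_token_grid(H, W)         # single coarse grid from the ViT
vit_levels = decoder_pyramid(tokens)  # coarse-to-fine maps from the decoder
pvt_levels = pvt_pyramid(H, W)        # fine-to-coarse maps from the encoder

print("ViT tokens:", tokens)
print("ViT + conv decoder levels:", vit_levels)
print("PVT levels:", pvt_levels)
```

Under these assumptions, the plain ViT yields a single grid at 1/16 of the input resolution, so any finer-scale maps have to be produced by the decoder, whereas a PVT-style encoder emits several scales itself.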