zju3dv / manhattan_sdf

Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral
https://zju3dv.github.io/manhattan_sdf/

One issue regarding image rendering with pre-trained model #14

Closed jinmaodaomaye2021 closed 2 years ago

jinmaodaomaye2021 commented 2 years ago

Hi @ghy0324,

Thanks for sharing your pre-trained model.

I tested one model (0050_00.pth) by rendering images at the training camera poses.

Please see the comparison image below. The left figure is the GT image and the right figure is the rendered image.

Screenshot from 2022-06-13 19-35-58

The rendered image is still very blurry using your pre-trained model. I only changed N_rays = 512 due to memory limits on my machine. Is it by design to trade image loss (L_img in Eq. 5) for better geometry?
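For reference, this is how I render a full image after reducing N_rays: a minimal sketch that evaluates rays in chunks to stay within GPU memory. Here render_rays is a placeholder for the repo's actual renderer, not its real API:

```python
import torch

def render_image(render_rays, rays_o, rays_d, chunk=512):
    """Render a full image in chunks of rays to avoid out-of-memory errors.

    render_rays: hypothetical function mapping (chunk, 3) ray origins and
                 directions to (chunk, 3) RGB values; a stand-in for the
                 repository's renderer.
    rays_o, rays_d: (H*W, 3) ray origins and directions for one camera pose.
    """
    rgbs = []
    with torch.no_grad():
        for i in range(0, rays_o.shape[0], chunk):
            rgb = render_rays(rays_o[i:i + chunk], rays_d[i:i + chunk])
            rgbs.append(rgb.cpu())
    return torch.cat(rgbs, dim=0)  # reshape to (H, W, 3) afterwards
```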

Any idea why this happens?

Thanks.

ghy0324 commented 2 years ago

Hi! I think there are several reasons for this problem.

  1. Some of the color images in the ScanNet dataset suffer from serious motion blur, which is harmful to view synthesis.
  2. There may be a trade-off between geometry and rendering, but it is not by design. As shown in Table 1 of the VolSDF paper, the geometry of VolSDF is better than NeRF's, but its PSNR is slightly lower, which is a little counterintuitive. We noticed a similar phenomenon in our experiments: the view synthesis quality of NeRF is slightly better than ours on training views (but ours is much better on novel views, as shown in Figure 7 of our paper). See the PSNR sketch after this list.
  3. It should also be related to the capacity of implicit neural representations, which may not be able to handle the appearance details of large-scale scenes well.
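To make the PSNR comparison above concrete, here is a minimal sketch of the standard PSNR computation between a rendered image and the ground truth (the usual definition, not code from this repo):

```python
import numpy as np

def psnr(rendered, gt, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val].

    PSNR = 10 * log10(max_val^2 / MSE); higher means closer to ground truth.
    """
    mse = np.mean((rendered.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```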
jinmaodaomaye2021 commented 2 years ago

Thanks @ghy0324 for your reply.

  1. The PSNR of VolSDF is slightly lower than NeRF's, but its rendered images are visually close to the GT without blur.
  2. It is a little weird to me, since these images are rendered at training poses rather than novel poses; the network should at least fit the training data. Could some losses be blocking other losses from converging (I am not sure)?
  3. ScanNet images are blurry for some views. Would rendering training images work better on another, non-blurry dataset?

ghy0324 commented 2 years ago

  1. Yes. The trade-off exists, but it is not the main reason.
  2. When the scene is large, it may be hard for an implicit neural representation to remember the details of all training images. On datasets like DTU or LLFF, the field of view of a single image can cover almost the whole scene. Things are different on ScanNet: there are many more views per scene, and the overlap between views is much smaller. I don't think this is due to "some losses blocking other losses from converging", since NeRF's results on ScanNet training views are also blurry, and NeRF is trained with the image loss only.
  3. You may try NeRF on the Tanks and Temples dataset. It also consists of large-scale scenes, but without the serious motion blur problem.
jinmaodaomaye2021 commented 2 years ago

Hi @ghy0324 ,

Thanks for your answer. Would you mind sharing the code for generating the textured 3D reconstruction shown on your project page?

I am using Open3D for texture mapping, but unfortunately it was not successful on ScanNet.

email: jinyang717@gmail.com

Thanks.

ghy0324 commented 2 years ago


Hi! We recommend using mvs-texturing.
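A minimal sketch of driving it from Python, assuming mvs-texturing's texrecon binary has been built and is on PATH; all paths are placeholders, and the scene-folder layout (undistorted images plus MVE-style .cam camera files) follows the mvs-texturing README:

```python
import subprocess

# texrecon IN_SCENE IN_MESH OUT_PREFIX
# IN_SCENE: folder with undistorted images and matching MVE .cam files
# IN_MESH:  the mesh extracted from the trained SDF (e.g. via marching cubes)
subprocess.run(
    [
        "texrecon",
        "scene_folder",   # placeholder: images + .cam camera files
        "mesh.ply",       # placeholder: reconstructed mesh
        "textured/mesh",  # placeholder: output prefix
    ],
    check=True,
)
```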

alan355 commented 6 months ago

How do you generate rendered images? I couldn't find the code for rendering images.
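For reference, this is how I currently build rays for one training pose before feeding them to a chunked renderer like the sketch earlier in this thread. It is a minimal sketch assuming OpenCV-style intrinsics K and a camera-to-world pose c2w (placeholder names, not the repo's API); is this the right approach?

```python
import torch

def get_rays(H, W, K, c2w):
    """Generate world-space rays for an H x W image.

    K:   (3, 3) camera intrinsics (OpenCV convention: x right, y down, z forward).
    c2w: (3, 4) camera-to-world matrix [R | t].
    Returns (H*W, 3) ray origins and (H*W, 3) ray directions.
    """
    j, i = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Back-project pixel coordinates to camera-space view directions.
    dirs = torch.stack(
        [(i - K[0, 2]) / K[0, 0], (j - K[1, 2]) / K[1, 1], torch.ones_like(i)],
        dim=-1,
    )
    # Rotate directions into world space; all rays start at the camera center.
    rays_d = dirs.reshape(-1, 3) @ c2w[:3, :3].T
    rays_o = c2w[:3, 3].expand_as(rays_d)
    return rays_o, rays_d
```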