hwjiang1510 / LEAP

[ICLR 2024] Code for LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

Questions about training #1

Closed crepejung00 closed 1 year ago

crepejung00 commented 1 year ago

Hi, thank you for your incredible work! I have one question about the training process, specifically how the 2D convolutions that render images are learned. In Section 4.1 (Neural Rendering) the paper says: "In detail, we first render a feature map and predict the rendered image using 2D convolutions." In Section 4.2 (Training): "We only use L_i of the final loss term L = L_i + L_M if the masks are not available." As far as I understand, this reads as if the photometric loss is not used when masks are available, which would leave the 2D convolutions with nothing to learn. In that case, how are images rendered at inference time?
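For reference, a minimal sketch of what such a 2D-convolution rendering head could look like (layer count, channel sizes, and names are my assumptions for illustration, not the repository's actual code):

```python
import torch
import torch.nn as nn

class ConvRenderHead(nn.Module):
    """Hypothetical sketch: map a volume-rendered feature map to an RGB
    image with 2D convolutions, as described in Section 4.1 of the paper.
    Channel sizes and depth here are illustrative assumptions."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Sigmoid(),  # RGB values in [0, 1]
        )

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: [B, feat_dim, H, W] rendered feature map
        return self.decoder(feat_map)
```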

Thank you for your time!

hwjiang1510 commented 1 year ago

Hi Jaewoo,

Thanks for your interest in our work.

For the loss function, the photometric loss L_i is always used. If masks are available, we use both the photometric loss and the mask loss. For example, on the OmniObject3D, Kubric, and Objaverse datasets we use L = L_i + L_M, as masks are provided; on DTU we use L = L_i, as this scene-level dataset doesn't provide masks.
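A minimal sketch of that loss logic (the specific loss forms and function names are my assumptions for illustration, not necessarily what the repository uses):

```python
import torch
import torch.nn.functional as F

def training_loss(pred_rgb, gt_rgb, pred_mask=None, gt_mask=None):
    """Hypothetical sketch of L = L_i + L_M.
    The photometric loss L_i is always applied; the mask loss L_M is
    added only when ground-truth masks exist (e.g. OmniObject3D, Kubric,
    Objaverse) and is dropped on DTU, where masks are unavailable."""
    loss = F.mse_loss(pred_rgb, gt_rgb)  # L_i: photometric loss
    if gt_mask is not None:
        loss = loss + F.binary_cross_entropy(pred_mask, gt_mask)  # L_M: mask loss
    return loss
```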

crepejung00 commented 1 year ago

Oh okay, I had misread the sentence as meaning that L_i would not be used. Thank you for your explanation.

Thanks!