shubhtuls / drc

Code release for "Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency" (CVPR 2017)
https://shubhtuls.github.io/drc/

On a mini-batch using only a depth map, the loss cannot decrease to 0? #8

Closed shilei-ustcer closed 5 years ago

shilei-ustcer commented 5 years ago

Great work, Shubham! When I run the code on a single-view depth map, testing on a mini-batch with only one depth map, I find that the loss cannot decrease to 0. Why?

shubhtuls commented 5 years ago

Hi, thanks for the question! As the 3D representation is only a coarse voxel grid, even the best possible 3D representation can never lead to a loss of exactly 0. This is because the rendered depths are at a much finer resolution, whereas the possible 'stopping depths' modeled in our loss for a ray traveling through a voxel grid form only a discrete set.
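A toy sketch of the quantization point above (illustrative only, not the paper's exact loss formulation): along each ray, a coarse voxel grid admits only a discrete set of stopping depths, so a continuous ground-truth depth always leaves a nonzero residual, bounded by roughly half a voxel.

```python
import numpy as np

# Hypothetical 32-voxel grid over a unit-length ray: the possible
# 'stopping depths' are the discrete voxel crossings.
voxel_size = 1.0 / 32
stop_depths = np.arange(1, 33) * voxel_size

# Fine, continuous ground-truth depths for three example rays.
gt_depth = np.array([0.412, 0.517, 0.983])

# Best achievable per-ray error: snap each ground-truth depth to the
# nearest discrete stopping depth and measure the residual.
nearest_idx = np.abs(stop_depths[None, :] - gt_depth[:, None]).argmin(axis=1)
residual = np.abs(stop_depths[nearest_idx] - gt_depth)

print(residual)  # nonzero for every ray, at most half a voxel each
```

Even with a perfect occupancy grid, these residuals keep the total loss strictly above 0; they shrink only as the voxel resolution increases.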

shilei-ustcer commented 5 years ago

Thank you for your reply! So here comes a situation: the loss between the ground-truth shape and the test depth map may be larger than the loss for some predicted shape. Am I right?

shilei-ustcer commented 5 years ago

Supplement to the above. Another situation: when neighboring pixels of the depth map re-project onto the shape, neighboring rays may intersect the same voxel, so the voxel state (0 or 1) implied by the two rays may conflict due to their different depth signals. This situation may happen.
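The conflict described above can be sketched with a toy per-ray loss (illustrative only, not the paper's exact formulation): if ray A's depth implies the shared voxel is occupied while ray B's depth implies it is empty, no single occupancy value satisfies both, so the combined loss stays above 0.

```python
# Two neighboring rays pass through the same voxel with a single
# occupancy value p in [0, 1]; a simple squared-error loss per ray
# cannot be zero for both rays at once.
def loss(p):
    loss_a = (1.0 - p) ** 2   # ray A's depth says: stop here (occupied)
    loss_b = p ** 2           # ray B's depth says: pass through (empty)
    return loss_a + loss_b

# Search a coarse grid of occupancy values for the best compromise.
best_p = min((p / 100 for p in range(101)), key=loss)
print(best_p, loss(best_p))  # compromise at p = 0.5 with loss 0.5 > 0
```

The optimum splits the difference between the two rays, which is exactly why conflicting depth evidence contributes an irreducible floor to the loss.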

shubhtuls commented 5 years ago

Hi, thanks for raising these points! Both points you state are correct, and in fact we discuss them a bit in our paper's appendix (Sec. A2.2 in https://arxiv.org/pdf/1704.06254.pdf).

shilei-ustcer commented 5 years ago

Hi, another issue to bother you with! Since rays re-project onto the shape, there is a situation where some voxel is not intersected by any of the rays, so this voxel's state cannot be determined from the depth map. I think this situation may also happen.

shubhtuls commented 5 years ago

Yes, that is correct, but it's not an issue if we are using this loss to train a prediction CNN - if the images yielded no evidence for the voxel, the gradients from the loss would also be 0. It may be an issue if we are directly trying to optimize the volume given a set of views, in which case you'd need to use enough views and/or assume some prior.
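The zero-gradient behavior mentioned above can be sketched as follows (illustrative only; the indices and names are made up): a loss that only touches voxels along observed rays has an exactly-zero gradient with respect to every voxel no ray passes through, so those voxels receive no training signal from this view.

```python
import numpy as np

occ = np.full(8, 0.5)               # occupancies for a tiny 8-voxel grid
ray_voxels = [0, 1, 2]              # indices intersected by observed rays
target = np.array([1.0, 0.0, 0.0])  # depth evidence for those voxels only

# Gradient of a squared-error loss over only the observed voxels.
grad = np.zeros_like(occ)
grad[ray_voxels] = 2 * (occ[ray_voxels] - target)

print(grad)  # nonzero only at indices 0-2; exactly 0 elsewhere
```

When training a prediction CNN, the shared weights still get updated from other images, which is why the per-view zero gradient is harmless in that setting but problematic for direct per-scene optimization.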

shilei-ustcer commented 5 years ago

As you say, "if the images yielded no evidence for the voxel, the gradients from the loss would also be 0", so the voxel's value stays fixed during training and is determined only by the weight initialization. So its value can be arbitrary, but we want it to be 0 (empty). Is this an issue?

shubhtuls commented 5 years ago

Well, we train a common CNN across all images, so the hope is that some image(s) across the training data would have provided evidence, and the CNN would therefore have learned to predict reasonable values. If there exist voxels that did not get any evidence across all the training data, then it may indeed be an issue.

shilei-ustcer commented 5 years ago

Yes, you are right. Thank you for your kind reply. Since my problem has been solved, I will close this issue. I learned a lot from our discussion.