zju3dv / manhattan_sdf

Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral
https://zju3dv.github.io/manhattan_sdf/

Confusion about Eq. 13 #58

Closed ArlenCHEN closed 9 months ago

ArlenCHEN commented 9 months ago

Dear authors,

Thanks for the brilliant work!

I have one question about the Eq. 13. You mentioned in the paper that

To decrease $\hat{p}_f(r)L_f(r)$, the gradient will push $\hat{p}_f(r)$ to be small,

which I agree with. But reducing $\hat{p}_f(r)$ does not mean that 'the semantic label is optimized'.

The reason is that $\hat{p}_f(r)$ here represents a confidence score, which should approach 1 as optimization proceeds. This is inconsistent with your statement that 'pushing $\hat{p}_f(r)$ to be small optimizes the semantic label'.

Please correct me if I am wrong. Thanks!

ArlenCHEN commented 9 months ago

Following my confusion above, wouldn't a better joint loss be Eq. 13 with the weights replaced by $(1-\hat{p}_f(\mathbf{r}))$ and $(1-\hat{p}_w(\mathbf{r}))$?

ghy0324 commented 9 months ago

Hi! Thanks for your interest!

'Optimizing the semantic label' in this example means correcting the semantics of a region that is misclassified as floor (i.e., it is not a floor region, but has high p_f). So p_f should approach 0, not 1, as optimization proceeds.

For a region with high p_f: if it really is a floor region, L_f is reduced easily. Otherwise, L_f cannot be reduced easily, so training will instead reduce p_f to decrease the joint loss (p_f * L_f).
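To make this concrete, here is a toy sketch (my own illustration, not the authors' code) that assumes the joint term is simply `loss = p_f * L_f` with `p_f = sigmoid(logit)`. The gradient of the loss with respect to the logit is `sigmoid'(logit) * L_f`, so where the geometric loss L_f stays large (a misclassified region), gradient descent lowers p_f much faster:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dloss_dlogit(logit, L_f):
    # Gradient of the assumed joint term p_f * L_f w.r.t. the logit:
    # d(sigmoid(logit) * L_f)/d(logit) = sigmoid'(logit) * L_f
    p = sigmoid(logit)
    return p * (1.0 - p) * L_f

# Both regions start at p_f = 0.5 (logit = 0); L_f values are made up.
g_floor = dloss_dlogit(0.0, 0.1)  # truly-floor region: L_f reduced easily
g_wrong = dloss_dlogit(0.0, 5.0)  # misclassified region: L_f stays large

# The misclassified region gets a much larger positive gradient on its
# logit, so gradient descent pushes its p_f toward 0 much faster.
print(g_floor, g_wrong)  # → 0.025 1.25
```

This matches the reply above: the loss only "bothers" to suppress p_f where L_f cannot be reduced by improving the geometry.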

Hope this is helpful. Feel free to discuss if you have any further questions.