zjwzcx / GenNBV

[CVPR 2024] GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
https://gennbv.tech/

Questions about Supplementary Table 5 #2

Closed: ALEX95GOGO closed this issue 3 months ago

ALEX95GOGO commented 4 months ago

Thank you for your awesome work. I have a few questions about Supplementary Table 5. I have probably misunderstood something there, so I hope you can help us figure it out.

  1. Could you explain the occupancy threshold? I assume it is not a probability, since some of the listed thresholds exceed 1. If it is a log-odds value, however, the values seem quite small.

  2. Regarding the C' values: if these are log-odds, converting them back to probabilities gives values above 0.9. Isn't that too high, leaving practically no difference between them?
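To make both questions concrete: if the thresholds and C' values were log-odds, the standard logistic conversion $p = 1/(1 + e^{-l})$ maps them back to probabilities. A minimal sketch; the helper name `log_odds_to_prob` is hypothetical and not from the GenNBV code, and the numbers are the ones quoted later in this thread.

```python
import math

def log_odds_to_prob(l: float) -> float:
    """Convert a log-odds value l = log(p / (1 - p)) back to a probability p."""
    return 1.0 / (1.0 + math.exp(-l))

# Threshold and C' values as quoted in this thread (Supplementary Table 5).
thresholds = [0.5, 1.0, 1.5, 2.5]
c_prime_values = [5, 10, 20, 40]

for t in thresholds:
    print(f"threshold {t}: p = {log_odds_to_prob(t):.4f}")
for c in c_prime_values:
    print(f"C' = {c}: p = {log_odds_to_prob(c):.6f}")
```

Read as log-odds, all four C' values land above 0.99, which is the saturation the second question is worried about.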

I appreciate you taking the time to address these queries.

zjwzcx commented 4 months ago

Thanks for your attention! Here is my explanation :)

Q2: In Section 1.1 of our supplementary material, we mention that we update the log-odds occupancy probability of a voxel by adding the value $C$, where $C=\log\frac{p(z_j | v_i = 1)}{p(z_j | v_i = 0)}$.
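Why the update is a plain addition of $C$ follows from Bayes' rule under the usual assumptions of log-odds occupancy mapping: measurements are conditionally independent given the voxel state, and the prior is uniform (log-odds $l_0 = 0$, matching the zero initialization in the example further down). This is the standard textbook derivation, sketched here for reference rather than quoted from the GenNBV supplementary material; the normalizer $p(z_j \mid z_{1:j-1})$ cancels in the ratio.

```latex
\begin{aligned}
\frac{p(v_i = 1 \mid z_{1:j})}{p(v_i = 0 \mid z_{1:j})}
  &= \frac{p(z_j \mid v_i = 1)}{p(z_j \mid v_i = 0)}
     \cdot \frac{p(v_i = 1 \mid z_{1:j-1})}{p(v_i = 0 \mid z_{1:j-1})} \\
l_j &= l_{j-1} + \log\frac{p(z_j \mid v_i = 1)}{p(z_j \mid v_i = 0)}
     = l_{j-1} + C, \qquad l_0 = 0
\end{aligned}
```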

In fact, the measurement event has only two cases: $z_j = 0$ or $z_j = 1$. If the measurement event $z_j$ happens (i.e., the voxel is passed through by the $j^{th}$ camera ray), we update the occupancy by adding $C_1=\log\frac{p(z_j = 1 | v_i = 1)}{p(z_j = 1 | v_i = 0)}$. If it does not happen, we add $C_2 = \log\frac{p(z_j = 0 | v_i = 1)}{p(z_j = 0 | v_i = 0)}$. The values of $C_1$ and $C_2$ can be designed according to the accuracy of ray casting or other factors, and $C' = \left|\frac{C_1}{C_2}\right|$. We set a high value for $C'$ (i.e., high confidence) because our experiments are based on a realistic simulator.
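A minimal sketch of this per-measurement update, assuming $C_1$ is added when the $j^{th}$ ray passes through the voxel ($z_j = 1$), $C_2$ otherwise ($z_j = 0$), and $C' = |C_1 / C_2|$. The function names are hypothetical, $C_1 = 1.0$ and $C_2 = -0.05$ are borrowed from the author's example later in the thread, and the event sequence is made up.

```python
def update_log_odds(l: float, passed_through: bool, c1: float, c2: float) -> float:
    """Add C_1 if the ray passes through the voxel (z_j = 1), otherwise add C_2 (z_j = 0)."""
    return l + (c1 if passed_through else c2)

def confidence_ratio(c1: float, c2: float) -> float:
    """C' = |C_1 / C_2|: how much one pass-through outweighs one non-pass."""
    return abs(c1 / c2)

c1, c2 = 1.0, -0.05
l = 0.0  # voxel initialized to log-odds 0 (p = 0.5)
for event in [True, False, True, True, False]:  # hypothetical measurement sequence
    l = update_log_odds(l, event, c1, c2)

print(confidence_ratio(c1, c2), round(l, 2))
```

Three pass-throughs and two non-passes leave the voxel at about 2.9, and the pair $(C_1, C_2)$ corresponds to $C' = 20$.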

Q1: We then set a threshold on the cumulative $C$ values to determine the occupancy state of each voxel.
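Thresholding the accumulated values could look like the sketch below. It assumes each voxel's value is simply the sum of its increments (so it can be computed from hit and miss counts, since addition is order-independent) and that "occupied" means strictly above the threshold; the thread does not say whether the comparison is `>` or `>=`. The function names and the (hits, misses) counts are hypothetical.

```python
def accumulated_log_odds(hits: int, misses: int, c1: float, c2: float) -> float:
    """Cumulative C for a voxel with `hits` z_j = 1 events and `misses` z_j = 0 events."""
    return hits * c1 + misses * c2

def classify(voxels, c1: float, c2: float, threshold: float) -> list:
    """Mark each voxel occupied if its cumulative C strictly exceeds the threshold."""
    return [accumulated_log_odds(h, m, c1, c2) > threshold for h, m in voxels]

# Hypothetical (hits, misses) counts for four voxels.
voxels = [(0, 10), (1, 0), (3, 2), (5, 40)]
print(classify(voxels, c1=1.0, c2=-0.05, threshold=2.5))
```

With these made-up counts the first two voxels stay below 2.5 and the last two are marked occupied.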

Thanks for your issue. I'll add this detailed explanation to our newest arXiv version next week. If you have any other questions, please let me know.

ALEX95GOGO commented 4 months ago

Hi, thank you for the quick response. I still have a few questions regarding this part.

The values for C' can be 5, 10, 20, and 40. It seems that any one of these values would exceed the thresholds of 0.5, 1.0, 1.5, and 2.5, so I am confused about why the AUC, coverage ratio, and accuracy differ significantly between these settings.


zjwzcx commented 4 months ago

Thanks for your question! There may be a misunderstanding: $C'$ is just the ratio representing the confidence/weight of each measurement event, which is different from the value $C$ in Eq. 1.
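Because $C'$ only fixes the ratio between the two increments, one way to read the $C'$ column is to hold $C_1$ fixed and derive $C_2 = -C_1 / C'$. The sketch assumes $C_1 = 1.0$ (the value in the author's simplified example) and a negative $C_2$; whether the actual implementation fixes $C_1$ this way is an assumption, and `miss_increment` is a hypothetical name.

```python
def miss_increment(c1: float, c_prime: float) -> float:
    """Given C_1 > 0 and C' = |C_1 / C_2|, return the (negative) increment C_2."""
    return -c1 / c_prime

c1 = 1.0
for c_prime in [5, 10, 20, 40]:
    print(f"C' = {c_prime:>2}: C_1 = {c1}, C_2 = {miss_increment(c1, c_prime)}")
```

Under this reading, the occupancy threshold is compared against the running sum of $C_1$ and $C_2$ increments, never against $C'$ itself, so every $C'$ value can be paired with every threshold.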

Let me take a simplified example:

  1. Let occupancy_threshold = 2.5 and $C' = 20$, with $C_1 = 1.0$ and $C_2 = -0.05$. We initialize the occupancy state of a given voxel to zero.

  2. Each time we observe a camera ray passing through this voxel, we update its occupancy state by adding $C_1 = 1.0$.

  3. With occupancy_threshold = 2.5, we therefore need to observe this measurement event (i.e., a ray passing through the voxel) at least three times before the voxel is considered occupied.
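The example can be replayed in a few lines. This sketch assumes no non-pass events ($C_2$ increments) are interleaved and that the comparison is strictly greater-than; `hits_needed` is a hypothetical helper.

```python
def hits_needed(threshold: float, c1: float) -> int:
    """Count pass-through events (each adding C_1) until the voxel exceeds the threshold."""
    if c1 <= 0:
        raise ValueError("C_1 must be positive, otherwise the threshold is never reached")
    l, n = 0.0, 0  # voxel starts at zero, as in step 1
    while l <= threshold:
        l += c1
        n += 1
    return n

print(hits_needed(2.5, 1.0))  # 3
```

After two observations the voxel sits at 2.0, still below 2.5; the third brings it to 3.0.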

As for your question: a higher threshold requires denser scanning trajectories (scanning the same area multiple times). This yields more accurate reconstruction results but makes it harder to scan the scene comprehensively.
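To illustrate why a higher threshold makes comprehensive scanning harder, the sketch counts how many voxels one fixed scan would mark as occupied under each of the four thresholds. The per-voxel hit counts are invented, misses are ignored, $C_1 = 1.0$ follows the author's example, and the strict `>` comparison is an assumption.

```python
# Hypothetical number of times each of ten voxels was passed through by a ray
# during one sparse scan (most voxels are seen only once or twice).
hit_counts = [1, 1, 2, 1, 3, 2, 1, 4, 1, 2]

def coverage(hit_counts, threshold: float, c1: float = 1.0) -> float:
    """Fraction of voxels whose accumulated C_1 strictly exceeds the threshold."""
    occupied = sum(1 for n in hit_counts if n * c1 > threshold)
    return occupied / len(hit_counts)

for t in [0.5, 1.0, 1.5, 2.5]:
    print(f"threshold {t}: {coverage(hit_counts, t):.0%} of voxels marked occupied")
```

On the same made-up trajectory, coverage falls from 100% at threshold 0.5 to 20% at 2.5; recovering it would require revisiting voxels, i.e., a denser trajectory.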

However, the main technical bottleneck of next-best-view planning for reconstruction is scanning completeness rather than trajectory density. We therefore choose a lower occupancy threshold in our framework.

ALEX95GOGO commented 3 months ago

Thank you very much for your answer! Our questions are solved.