jingsenzhu / i2-sdf

[CVPR 2023] I^2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs

What's depth/normal supervision? #1

Open jinmaodaomaye2021 opened 1 year ago

jinmaodaomaye2021 commented 1 year ago

Hi,

Great work. I have questions regarding Eq. (9) in the paper. What is the depth/normal supervision? Specifically, what are D(r) and N(r) in Eqs. (11) and (12), respectively?

1) MonoSDF uses publicly available pretrained models to generate depth and normal maps to supervise the model. I'm just curious how the depth/normal supervision is generated for I^2-SDF. 2) Do I^2-SDF and MonoSDF share the same depth/normal supervision?

Thanks

jingsenzhu commented 1 year ago

Thanks for your interest!

D(r) and N(r) are the ground-truth depth and normal maps. As described in the "Dataset" paragraph of Sec. 7, we test our method on our synthetic dataset and on some real datasets. For the synthetic dataset, ground-truth depth and normal maps are readily available from the renderer, while for the real datasets the ground truths are predicted similarly to MonoSDF.
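For readers without the paper at hand, here is a MonoSDF-style sketch of what such supervision terms typically look like (not necessarily the exact form of Eqs. (11)-(12) in I^2-SDF):

$$
\mathcal{L}_{\mathrm{depth}} = \sum_{\mathbf{r} \in \mathcal{R}} \left| \hat{D}(\mathbf{r}) - D(\mathbf{r}) \right|, \qquad
\mathcal{L}_{\mathrm{normal}} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| \hat{N}(\mathbf{r}) - N(\mathbf{r}) \right\|_1 ,
$$

where $\hat{D}(\mathbf{r})$ and $\hat{N}(\mathbf{r})$ are the depth and normal rendered from the SDF along ray $\mathbf{r}$, and $D(\mathbf{r})$, $N(\mathbf{r})$ are the supervision maps described above.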

As for MonoSDF, it is supervised with the same source of depth and normal maps as ours to ensure a fair comparison.

We will make our synthetic dataset publicly available soon. See our supplementary material for dataset details.

jinmaodaomaye2021 commented 1 year ago

Thanks. Additional questions: 1) What about geometry reconstruction performance on ScanNet? The paper only provides comparisons on synthetic data. 2) Does I^2-SDF perform the same procedure as MonoSDF to solve for scale and shift when using depth from public models as supervision?

jingsenzhu commented 1 year ago
  1. As described in Sec. 7, ScanNet suffers from inaccurate camera calibration, erroneous depth capture, and low image quality (e.g. motion blur), which crucially affect reconstruction quality. Instead, we evaluate our method on other real-world scenes, which have higher quality than ScanNet. Our synthetic dataset also aims to provide a higher-quality indoor scene reconstruction benchmark.
  2. For ground-truth depth maps with correct scale, scaling and shifting are no longer required.
jinmaodaomaye2021 commented 1 year ago

Thanks. 1) Any quantitative evaluation on other real datasets? Just curious how false positives in the depth maps impact the bubble loss design.

jingsenzhu commented 1 year ago

Empirically, the bubble loss is robust against false positives to some extent, thanks to the smooth step in training (Sec. 6). In our ablation studies, we add noise to our depth maps to simulate false positives.

In contrast, the bubble loss will indeed be impacted by false negatives. For example, if the depth map provided by the dataset misses a chair leg, our method may struggle to reconstruct that chair leg. We leave this as future work.
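For illustration only (this is not the exact protocol from the paper), a depth map can be corrupted with Gaussian noise plus a few spurious near-depth pixels to mimic false positives; all parameter values below are placeholders:

```python
import numpy as np

def corrupt_depth(depth, sigma=0.01, fp_ratio=0.002, seed=0):
    """Add Gaussian noise and a small fraction of spurious near-surface pixels
    (false positives) to a metric depth map. Illustrative only."""
    rng = np.random.default_rng(seed)
    noisy = depth + rng.normal(0.0, sigma, size=depth.shape)
    fp_mask = rng.random(depth.shape) < fp_ratio              # random false-positive pixels
    noisy[fp_mask] = rng.uniform(0.1, 0.5, size=fp_mask.sum())  # fake nearby "structures"
    return noisy

# usage: noisy = corrupt_depth(gt_depth)   # gt_depth: H x W metric depth in meters
```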

jinmaodaomaye2021 commented 1 year ago

Yes, that's why we want to see the impact of the depth maps on real datasets. Although the ablation study adds synthetic noise to the depth maps, that is hard to compare with the noise introduced by off-the-shelf depth estimation models. Ground-truth depth is not available in practice.

jinmaodaomaye2021 commented 1 year ago

BTW, how do you generate the reconstruction result in Fig. 1? I just want to learn how to generate textured meshes.

jingsenzhu commented 1 year ago
  1. In real-world scenarios, with appropriate calibration techniques, the estimated depth maps are still sufficient for reconstruction. For example, Fig. 4 displays a real-world scene where our method succeeds in reconstructing the lamp pole. The key to precise reconstruction is eliminating false negatives rather than false positives.
  2. The reconstruction result in Fig. 1 is not a textured mesh but a neural rendering result; I'll revise the caption to avoid this misunderstanding. However, since our method also decomposes material parameters, I believe a textured mesh can be generated by attaching the predicted albedo to each vertex during the marching cubes process (a rough sketch of this idea is shown below).
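For illustration, a minimal sketch of that idea, assuming hypothetical handles sdf_fn and albedo_fn standing in for the trained SDF and albedo networks (these names are not from this repo; dummy lambdas are used so the snippet runs as-is):

```python
import numpy as np
import torch
import trimesh
from skimage import measure

# Placeholders for the trained networks: replace with the real models.
sdf_fn = lambda p: torch.linalg.norm(p, dim=-1) - 0.5   # unit-sphere SDF as a stand-in
albedo_fn = lambda p: p * 0.5 + 0.5                     # fake RGB albedo in [0, 1]

N = 128  # grid resolution; query in chunks for larger grids
coords = np.linspace(-1.0, 1.0, N)
grid = np.stack(np.meshgrid(coords, coords, coords, indexing="ij"), -1).reshape(-1, 3)

with torch.no_grad():
    sdf = sdf_fn(torch.from_numpy(grid).float()).reshape(N, N, N).cpu().numpy()

# Extract the zero level set; `spacing` maps voxel indices back to the [-1, 1] cube.
verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0, spacing=(2.0 / (N - 1),) * 3)
verts = verts - 1.0  # marching_cubes returns coordinates in [0, 2]

# Query the albedo network at each vertex and bake it in as vertex colors.
with torch.no_grad():
    albedo = albedo_fn(torch.from_numpy(verts.copy()).float()).clamp(0.0, 1.0).cpu().numpy()

mesh = trimesh.Trimesh(verts, faces,
                       vertex_colors=(albedo * 255).astype(np.uint8), process=False)
mesh.export("vertex_colored_mesh.ply")
```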
jingsenzhu commented 1 year ago

Close this issue due to inactivity. Re-open it if you have further questions.

Jerrypiglet commented 1 year ago

Hi there, a follow-up question: for real data (from [26] and [40]), are the depth maps acquired by rasterizing the provided mesh, or from off-the-shelf depth estimation models (e.g. DPT in MonoSDF)? And do those depth maps have correct absolute scale, or are they ambiguous in scale/shift so that you need a scale/shift-invariant depth loss?

Also, can you explain what the 1+3 real scenes from [26] and [40] are? At the bottom of the website of [40], they only list one scene of their own (Living Room2, for which I cannot find a download link) and two from Free-viewpoint (Living Room1 and Sofa).

jingsenzhu commented 1 year ago

The real data's depths are estimated with MVS tools, so the depth maps have correct absolute scale and do not need scale/shift alignment.

The real scenes from [26] and [40] were all calibrated by the authors of [40] in their experiments (for the living room scene from [26], the re-calibration provides more precise depth and cameras than the original version). They haven't made their dataset public yet, and I've been asking them for permission to release the data I used in this repository.

Jerrypiglet commented 1 year ago

That's great news! Looking forward to the release of the real scene with calibrated depth. Also wondering if it is possible to release the tools to get dense MVS depth for those scenes (and third-party scenes)?

jingsenzhu commented 1 year ago

They used CapturingReality to calibrate their scenes (as reported in their paper), but I am not quite familiar with this field :)

Jerrypiglet commented 1 year ago

Thanks! Also just to confirm, in order to get depth/normal maps on real scenes, did you rasterize with their provided mesh and poses? Scenes from Free-viewpoint do not include depth maps or normal maps; they only provide meshes and poses.

jingsenzhu commented 1 year ago

Depth maps can be acquired directly from the MVS tools, though rasterization is also OK, I think. I also think estimating normal maps with monocular learning-based methods (as MonoSDF and NeuRIS do) is more precise than taking normals from MVS; the latter contain a lot of noise.
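For reference, a minimal sketch of rasterizing depth/normal maps from a provided mesh and camera pose with Open3D's RaycastingScene (the mesh path, intrinsics K, world-to-camera extrinsics T_cw, and image size W x H below are placeholders):

```python
import numpy as np
import open3d as o3d

W, H = 640, 480
K = np.array([[500.0, 0.0, W / 2], [0.0, 500.0, H / 2], [0.0, 0.0, 1.0]])  # placeholder intrinsics
T_cw = np.eye(4)                                                            # placeholder world-to-camera pose

mesh = o3d.io.read_triangle_mesh("scene.ply")  # placeholder mesh path
scene = o3d.t.geometry.RaycastingScene()
scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))

rays = o3d.t.geometry.RaycastingScene.create_rays_pinhole(
    intrinsic_matrix=o3d.core.Tensor(K, dtype=o3d.core.Dtype.Float64),
    extrinsic_matrix=o3d.core.Tensor(T_cw, dtype=o3d.core.Dtype.Float64),
    width_px=W, height_px=H)
ans = scene.cast_rays(rays)

t_hit = ans["t_hit"].numpy()                    # distance along each ray, inf on miss
normal_w = ans["primitive_normals"].numpy()     # world-space face normals, [H, W, 3]

# Convert hit points to metric z-depth in the camera frame (0 where the ray misses).
hit = np.isfinite(t_hit)
t = np.where(hit, t_hit, 0.0)
r = rays.numpy()
pts_w = r[..., :3] + r[..., 3:] * t[..., None]
pts_c = pts_w.reshape(-1, 3) @ T_cw[:3, :3].T + T_cw[:3, 3]
depth = np.where(hit, pts_c[:, 2].reshape(H, W), 0.0)
normal_c = normal_w.reshape(-1, 3) @ T_cw[:3, :3].T  # rotate normals to camera frame if needed
```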

Jerrypiglet commented 1 year ago

Yeah, I agree. Just want to confirm which option was used in the real-scene experiments in I^2-SDF: did you use semi-dense MVS depth obtained by feeding images into an MVS pipeline, or depth/normals rasterized from the provided mesh?

W.r.t. monocular depth (e.g. DPT depth in MonoSDF), I don't see the current I^2-SDF code supporting a scale/shift-invariant depth loss. I will try out DPT depth/normals with the bubble loss, but I am not sure whether things will just work out of the box if I plug in DPT depth/normals, or whether changes need to be made to the losses.

jingsenzhu commented 1 year ago

I currently haven't tested I^2-SDF on monocular depths; all depth maps I used in my experiments are absolute depths. By the way, I will probably release the real data I used within the next couple of days. For monocular depths with the bubble loss, one thing I worry about is that thin structures (e.g. chandeliers) in the monocular depth may only be visually correct rather than scale-correct. Even with scale/shift alignment (e.g. via least squares), the point cloud projected from the monocular depth may end up in an erroneous region. But you can certainly try it first.
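For reference, a minimal numpy sketch of the least-squares scale/shift alignment mentioned above (function and variable names are illustrative, not from the repo):

```python
import numpy as np

def align_scale_shift(d_mono, d_ref, mask):
    """Solve min_{s,t} || s * d_mono + t - d_ref ||^2 over valid pixels.
    d_mono: monocular (relative) depth, d_ref: reference metric depth,
    mask: boolean array marking pixels where d_ref is valid."""
    x = d_mono[mask].ravel()
    y = d_ref[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)      # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Example: align DPT depth to (sparse) MVS depth before using it as supervision.
# s, t = align_scale_shift(dpt_depth, mvs_depth, mvs_depth > 0)
# dpt_metric = s * dpt_depth + t
```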

h8c2 commented 1 year ago

Hi, I was confused by the statement that the bubble loss breaks the stable state of the so-far converged SDF field. Why can't we merge the bubble step and the smooth step into one step?