kcheng1021 / GaussianPro

[ICML2024] Official code for GaussianPro: 3D Gaussian Splatting with Progressive Propagation
https://kcheng1021.github.io/gaussianpro.github.io/
MIT License

question about normal_loss and depth_loss #9

Closed cv-lab-x closed 3 months ago

cv-lab-x commented 3 months ago

hi, thanks for your great work, did you test normal loss and depth loss based on mono depth and mono normal? What are the results? I found that you load depth and normals in your code:

```python
if load_normal:
    # monocular normals stored as .npy next to the images, mapped from [0, 1] to [-1, 1]
    normal_path = image_path.replace("images", "normals")[:-4] + ".npy"
    normal = np.load(normal_path).astype(np.float32)
    normal = (normal - 0.5) * 2.0
else:
    normal = None

if load_depth:
    # depth_path = image_path.replace("images", "monodepth")[:-4]+".npy"
    depth_path = image_path.replace("images", "metricdepth")[:-4] + ".npy"
    depth = np.load(depth_path).astype(np.float32)
else:
    depth = None
```

Looking forward to your reply, thanks @kcheng1021

kcheng1021 commented 3 months ago

Yes, we have attempted to use monocular depth and normals directly, but this experiment was only tested on one Waymo scene. The monocular depth still requires scale alignment with the Gaussian-rendered depth for supervision, and in cases where the initially rendered depth has significant errors, alignment can only be achieved using sparse depth from SfM (Structure from Motion). Additionally, direct monocular depth supervision cannot help grow correct Gaussians in incomplete regions, and there is still a possibility of fitting depth at incorrect positions, so direct monocular depth supervision has limited impact.

Monocular normal supervision can provide good constraints for large planes. However, the monocular normals learned from the network may be oversmoothed, and it is important to consider how to select regions with good multi-view consistency.
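For concreteness, here is a minimal sketch of the scale/shift alignment step described above: the monocular depth is fit to the sparse SfM depth with least squares before it can supervise the Gaussian-rendered depth. The function names and the simple L1 loss below are my own assumptions for illustration, not the repo's actual implementation.

```python
import numpy as np

def align_mono_depth_to_sfm(mono_depth, sfm_depth, sfm_mask):
    """Fit a global scale s and shift b so that s * mono_depth + b matches sparse SfM depth.

    mono_depth: (H, W) relative depth from a monocular network (hypothetical input)
    sfm_depth:  (H, W) sparse metric depth from SfM projections
    sfm_mask:   (H, W) boolean mask marking pixels with a valid SfM depth
    """
    d = mono_depth[sfm_mask].reshape(-1, 1)
    t = sfm_depth[sfm_mask].reshape(-1, 1)
    A = np.concatenate([d, np.ones_like(d)], axis=1)   # (N, 2) design matrix [depth, 1]
    sol = np.linalg.lstsq(A, t, rcond=None)[0]         # least-squares [scale, shift]
    s, b = float(sol[0]), float(sol[1])
    return s * mono_depth + b, (s, b)

def mono_depth_loss(rendered_depth, aligned_mono_depth, valid_mask):
    """Simple L1 supervision of the Gaussian-rendered depth by the aligned monocular depth."""
    return float(np.abs(rendered_depth - aligned_mono_depth)[valid_mask].mean())
```

As the reply notes, even with this alignment the loss only constrains depth where the Gaussians already render something reasonable; it does not by itself grow new Gaussians in incomplete regions.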

cv-lab-x commented 3 months ago

thanks for your reply!

pablovela5620 commented 3 months ago

I'm also quite interested in this. Considering patch match and NCC are already implemented for this method, I'm wondering if your team attempted using what the NeuRIS paper described for surface normals and region selection, exactly as you described for ensuring good multi-view consistency.

[Screenshot from 2024-03-05 09-27-09]
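For reference, a rough sketch of the kind of NCC-based multi-view consistency check being discussed, used to keep monocular-normal supervision only in regions that match well across views. The function names, the assumption that the source view has already been warped into the reference view (e.g. via plane homographies in patch-match style propagation), and the 0.5 threshold are hypothetical, not taken from GaussianPro or NeuRIS.

```python
import numpy as np

def patch_ncc(ref_patch, src_patch, eps=1e-6):
    """Normalized cross-correlation between two grayscale patches."""
    r = ref_patch.flatten() - ref_patch.mean()
    s = src_patch.flatten() - src_patch.mean()
    return float((r * s).sum() / (np.linalg.norm(r) * np.linalg.norm(s) + eps))

def consistency_mask(ref_img, warped_src_img, patch=7, thresh=0.5):
    """Mark pixels whose local patch matches the warped source view well.

    ref_img, warped_src_img: (H, W) grayscale images; warped_src_img is assumed to be
    the source view already warped into the reference view. Pixels with NCC above
    `thresh` would be trusted for, e.g., monocular-normal supervision.
    """
    H, W = ref_img.shape
    half = patch // 2
    mask = np.zeros((H, W), dtype=bool)
    for y in range(half, H - half):
        for x in range(half, W - half):
            rp = ref_img[y - half:y + half + 1, x - half:x + half + 1]
            sp = warped_src_img[y - half:y + half + 1, x - half:x + half + 1]
            mask[y, x] = patch_ncc(rp, sp) > thresh
    return mask
```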