Closed: cv-lab-x closed this issue 3 months ago
Yes, we have attempted to use monocular depth and normals directly, but this experiment was only tested on one Waymo scene. The monocular depth still requires scale alignment with the Gaussian-rendered depth before it can be used for supervision. In cases where the initially rendered depth has significant errors, alignment can only be achieved using sparse depth from SfM (Structure from Motion). Additionally, direct monocular depth supervision cannot help grow correct Gaussians in incomplete regions, and there is still a possibility of fitting depth at incorrect positions, so its overall impact is limited. Monocular normal supervision can provide good constraints for large planes; however, the normals predicted by the network may be over-smoothed. It is also important to consider how to select regions with good multi-view consistency.
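For clarity, the scale alignment mentioned above is typically a least-squares fit of scale and shift between the monocular depth and the sparse SfM depth at the points where SfM depth is available. The sketch below is my own illustration (the function and parameter names are hypothetical, not from this repo):

```python
import numpy as np

def align_depth(mono_depth, sparse_depth, mask):
    """Fit scale and shift so that scale * mono_depth + shift best matches
    the sparse SfM depth (least squares over the valid pixels in mask)."""
    m = mono_depth[mask]
    s = sparse_depth[mask]
    # Solve [m, 1] @ [scale, shift]^T ~= s in the least-squares sense.
    A = np.stack([m, np.ones_like(m)], axis=1)
    scale, shift = np.linalg.lstsq(A, s, rcond=None)[0]
    return scale * mono_depth + shift
```

When the rendered depth is badly wrong, this fit against sparse SfM depth is the only reliable anchor, which is the limitation the reply points out.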
Thanks for your reply!
I'm also quite interested in this. Since patch match and NCC are already implemented for this method, I'm wondering whether your team tried what the NeuRIS paper describes for surface normals and region selection, which is exactly the approach you described for ensuring good multi-view consistency.
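For readers unfamiliar with the NCC score mentioned here: it measures photometric consistency between a patch and its reprojection in another view, and is commonly thresholded to select multi-view-consistent regions. A minimal sketch (my own function name and epsilon, not this repo's implementation):

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-6):
    """Normalized cross-correlation between two same-sized patches.
    Returns a score in [-1, 1]; values near 1 indicate strong agreement."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)
```

Regions whose best cross-view NCC stays below a threshold would then be excluded from normal supervision, in the spirit of the NeuRIS-style selection discussed above.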
Hi, thanks for your great work. Did you test a normal loss and a depth loss based on monocular depth and monocular normals? What were the results? I found that you load normals in your code:
```python
if load_normal:
    normal_path = image_path.replace("images", "normals")[:-4] + ".npy"
    normal = np.load(normal_path).astype(np.float32)
    # Stored in [0, 1]; remap to [-1, 1] unit-vector range.
    normal = (normal - 0.5) * 2.0
else:
    normal = None
```
Looking forward to your reply. Thanks @kcheng1021