maturk / dn-splatter

DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
https://maturk.github.io/dn-splatter/
Apache License 2.0

The training effect of custom data is poor #78

Closed · cici19850 closed this 1 week ago

cici19850 commented 1 week ago

Hello, thanks for the code. After setting up the environment, I ran the following commands:

python dn_splatter/scripts/convert_colmap.py --image-path data\meetingroom\images --use-gpu

python dn_splatter/scripts/normals_from_pretrain.py --data-dir data\meetingroom --img-dir-name data\meetingroom\images

python dn_splatter/scripts/align_depth.py --data data\meetingroom

ns-train dn-splatter --pipeline.model.use-depth-loss True --pipeline.model.depth-loss-type PearsonDepth --pipeline.model.depth-lambda 0.2 --pipeline.model.use-normal-loss True --pipeline.model.use-normal-tv-loss True --pipeline.model.normal-supervision mono coolermap --data data\meetingroom --normals-from pretrained --normal-format opencv --load-depths True --load_normals True

The training result is shown below. Is there any way to improve it?

[screenshot: dn-splatter training result]
maturk commented 1 week ago

Hey, it is difficult to say what the problem could be. The first thing I notice is the white walls, and these are quite difficult to regularize well with monocular depth regularization. How many images are in your dataset, and are they clear (no motion blur)?

Can you do a comparison with only splatfacto and compare it with dn_splatter? Does dn_splatter do worse than splatfacto? You can get eval metrics using the ns-eval command to see PSNR.
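
For reference, such a comparison could look like the following (output paths follow nerfstudio's default outputs/<dataset>/<method>/<timestamp> layout, and the JSON file names are just examples, not from this thread):

ns-train splatfacto --data data\meetingroom

ns-eval --load-config outputs/meetingroom/splatfacto/<timestamp>/config.yml --output-path eval_splatfacto.json

ns-eval --load-config outputs/meetingroom/dn-splatter/<timestamp>/config.yml --output-path eval_dn_splatter.json

Each JSON file reports PSNR (along with SSIM and LPIPS), so the two runs can be compared directly.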

cici19850 commented 1 week ago

Thank you very much for your prompt reply. The splatfacto training result is shown below; there are also many fog-like floaters. Also, if I want to capture data similar to the Replica dataset, what equipment do I need for shooting?

[screenshot: splatfacto training result showing fog-like floaters]
maturk commented 1 week ago

Hi, I think your "--normal-format" flag should be opengl, not opencv, if you are using my scripts.

For debugging, I suggest first enabling only the depth loss and checking whether PearsonDepth improves the white walls. Then enable the normal loss and see if the issue is resolved.
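
Concretely, the depth-only run could reuse the flags from your original command with the normal losses switched off (a sketch; other options left at their defaults):

ns-train dn-splatter --pipeline.model.use-depth-loss True --pipeline.model.depth-loss-type PearsonDepth --pipeline.model.depth-lambda 0.2 --pipeline.model.use-normal-loss False --pipeline.model.use-normal-tv-loss False coolermap --data data\meetingroom --load-depths True

If that looks good, re-enable the normal losses (with --normal-format opengl, as noted above) and compare the two runs.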

The Replica dataset is a "synthetic" dataset with ground-truth depth, normal, and mesh data. It is unlikely that you will be able to make a dataset of similar quality unless you have access to high-precision laser scanners and your poses are millimeter-accurate. If capturing with a smartphone, make sure your images have little motion blur.

cici19850 commented 1 week ago

Thank you very much, your reply has been very helpful to me. I will try the advice you provided. Thanks again.

cici19850 commented 1 week ago

One more question: if I shoot with a smartphone and make sure the images have almost no motion blur, is the following command sequence correct?

1. python dn_splatter/scripts/convert_colmap.py
2. python dn_splatter/scripts/normals_from_pretrain.py
3. python dn_splatter/scripts/align_depth.py
4. ns-train dn-splatter

cici19850 commented 1 week ago

Will the following warning have an impact?

[screenshot: console warning reading "average depth alignment error for batch depths is ..."]
maturk commented 1 week ago

@cici19850, the four commands you ran are fine. You can skip step 1) if you use e.g. ns-process-data or some other tool to process your camera poses. Step 2) just gives you normal estimates, and step 3) converts the COLMAP SfM points into scale-aligned mono-depth estimates. For more information about step 3), I suggest looking at this paper, which does the same thing (they use gradient descent to solve for scale and shift, whereas my script uses the closed-form solution). The "average depth alignment error for batch depths is..." warning relates to this step: it measures how much the SfM points disagree with the monocular depth estimates.
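
For intuition, here is a minimal numpy sketch of that closed-form scale-and-shift alignment (an illustration of the idea described above, not the repository's exact code; the function name and arguments are hypothetical):

import numpy as np

def align_depth_closed_form(mono_depth, sfm_depth, mask):
    # Solve min over (s, t) of ||s * mono + t - sfm||^2 at the pixels
    # where SfM points project (mask == True).
    d = mono_depth[mask]                            # mono-depth values at SfM pixels
    g = sfm_depth[mask]                             # metric depths from COLMAP points
    A = np.stack([d, np.ones_like(d)], axis=1)      # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # closed-form least squares
    aligned = s * mono_depth + t
    # The warning's "average depth alignment error" is the mean residual:
    error = np.abs(s * d + t - g).mean()
    return aligned, error

A large error simply means the monocular network's depths cannot be fit to the SfM geometry with a single scale and shift for that image.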

You can also skip step 3) and only run the python dn_splatter/scripts/depth_from_pretrain.py command, which only generates monocular depth estimates (using ZoeDepth) and skips the COLMAP alignment step. This is fine if you are using the PearsonDepth loss, which is a relative loss. For other loss functions, the scale alignment is necessary.
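
To see why scale alignment is unnecessary for PearsonDepth, here is a hedged PyTorch sketch of a Pearson-correlation depth loss (the general technique, not necessarily dn-splatter's exact implementation):

import torch

def pearson_depth_loss(pred_depth, mono_depth, eps=1e-8):
    # 1 - Pearson correlation between rendered and monocular depth.
    # Correlation is invariant to affine scale and shift, so an
    # unaligned mono-depth map can still supervise the rendered depth.
    p = pred_depth.flatten()
    m = mono_depth.flatten()
    p = p - p.mean()
    m = m - m.mean()
    corr = (p * m).sum() / (p.norm() * m.norm() + eps)
    return 1.0 - corr

An absolute loss such as L1 or MSE, by contrast, compares raw depth values, so the mono-depth estimates must first be brought to metric scale (step 3 above).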

From my experience and experiments, monocular depth supervision, even with the PearsonDepth loss, does not perform as well as using e.g. iPhone ToF LiDAR data; please see the table below. So if you are hoping to make a very accurate (good geometry) indoor dataset, I highly recommend using a real depth sensor when capturing your scene. You can look at e.g. the MuSHRoom dataset for examples; it was captured with an iPhone camera with LiDAR.

[table image: quantitative comparison of monocular depth supervision vs. iPhone ToF LiDAR depth supervision]