TurtleZhong opened 1 month ago
Put them both in the same training run and use all the pictures. At least that green house should come out better. The more you cover the scene from both cameras, and the more views you add, the better the result will be, because the algorithm will have more precise positioning in space (fewer floaters and better interpolation).
@jaco001 Aside from the green house, I found that the rendered vehicles in the two views are really different. In the right-view case, maybe because there are more views of the vehicles, the vehicle results are really good even when moving the render view around. But in the front-view case, the vehicles on the roadsides look blurry or turn into floaters even when the render view has the same pose as the training poses. I also tried training with both views together, but the vehicle results are still bad… :sob:
With the front view you have more kinds of transformation, like scaling, and more dynamic angles, so there is more potential for degradation. If you don't compensate for it, e.g. with a camera turned 45° to the right, nothing changes. If you don't also add pictures from later right-side positions, nothing changes either. Your cameras must cross each other more. The right side is less 'broken' because you only translate the camera along the motion vector; there is no scaling. If you moved the right camera forward more, its views would break too, just like the front view. A worked example of this is sketched below.
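To make the geometry concrete (my own illustration, not something from the thread): under a simple pinhole model, forward motion mostly rescales the image, while lateral motion produces parallax, and parallax is what lets the optimizer pin down depth.

```python
import numpy as np

# Pinhole model: a point at depth z projects at u = f * x / z.
# Focal length and point position below are made-up numbers for illustration.
f = 1000.0          # focal length in pixels (assumed)
x, z = 2.0, 20.0    # a point 2 m to the side, 20 m ahead

u0 = f * x / z

# Forward motion: depth shrinks, so the whole image scales up (little parallax).
d = 1.0             # move the camera 1 m forward
u_fwd = f * x / (z - d)

# Lateral motion: the point shifts in the image proportionally to 1/z,
# i.e. real parallax that constrains the point's depth.
u_lat = f * (x - d) / z

print(f"original:        u = {u0:.2f}px")
print(f"forward motion:  u = {u_fwd:.2f}px (scale change {u_fwd / u0:.3f}x)")
print(f"lateral motion:  u = {u_lat:.2f}px (parallax {u_lat - u0:.2f}px)")
```

Running this, the 1 m forward step changes the projection by only about 5% of scale (a few pixels), while the same-sized lateral step moves the point by 50 px. The lateral views therefore constrain the scene geometry much more strongly.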
When I used an autonomous-driving dataset for training, I discovered a very strange phenomenon. If I use the front-view camera, the training results are very poor, but if I use the right-view image data, the training results are really good. My guess is that forward-motion image sequences are inherently difficult to reconstruct, and that there are too many points packed into one FOV. In addition, when I checked the results, I found that points in the sky were learned at a very low height, which produces ghosting artifacts.

Below are some results using images from the pandaset 053 sequence. I only used 10 images with ground-truth poses for this test, because when I trained on the entire sequence (a total of 80 images from one camera) the front-view results were also very bad. The right-view results, however, are very good.
The dataset I used is here: 053_front_10_images.zip 053_right_10_images.zip
The structure is like:
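(A sketch of the COLMAP-style layout that gaussian-splatting expects; the exact image file names are assumptions, not taken from the zips.)

```
053_front_10_images/
├── images/
│   ├── 00.png        # image names here are assumed
│   └── ...
└── sparse/
    └── 0/
        ├── cameras.bin
        ├── images.bin
        └── points3D.bin
```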
I used these commands to train:
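(Presumably the standard gaussian-splatting invocation, something like the following; the output paths are my assumption:)

```bash
# -s: source dataset path, -m: where to write the trained model
python train.py -s ./053_front_10_images -m ./output/053_front
python train.py -s ./053_right_10_images -m ./output/053_right
```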
and I got these logs for the front and right datasets:
front-view
right-view
The following videos are visualizations of the training results of the front-view camera and the right-view camera in the same scene.
https://github.com/graphdeco-inria/gaussian-splatting/assets/19700579/27c746de-477d-4a35-af45-f3e3eb36e102
https://github.com/graphdeco-inria/gaussian-splatting/assets/19700579/857bd0aa-8564-4229-880a-82784d3a04e0
I also tried training the GS model using the LiDAR point clouds as a prior, but I got the same outcome: the front view still produces bad results. So what I want to know is: is there any way to make the training results for forward motion better?
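For anyone trying the same thing, here is a minimal sketch of how a LiDAR cloud can be written into the points3D.ply layout that gaussian-splatting loads as its initial sparse model (it mirrors the storePly helper in scene/dataset_readers.py; loading the LiDAR data itself is left as an assumption):

```python
import numpy as np
from plyfile import PlyData, PlyElement

def store_lidar_as_ply(xyz: np.ndarray, rgb: np.ndarray, path: str):
    """Write an Nx3 point cloud (with Nx3 uint8 colors in 0..255) in the
    vertex layout gaussian-splatting's fetchPly expects: xyz, normals, rgb."""
    normals = np.zeros_like(xyz)  # normals are required by the schema but unused
    dtype = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
             ('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'),
             ('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]
    elements = np.empty(xyz.shape[0], dtype=dtype)
    attributes = np.concatenate((xyz, normals, rgb), axis=1)
    elements[:] = list(map(tuple, attributes))
    PlyData([PlyElement.describe(elements, 'vertex')]).write(path)

# Hypothetical usage; load_pandaset_lidar is a stand-in for your own loader:
# points, colors = load_pandaset_lidar(...)
# store_lidar_as_ply(points, colors, '053_front_10_images/sparse/0/points3D.ply')
```

If I recall the loader correctly, train.py prefers an existing points3D.ply in sparse/0/ over converting points3D.bin, so dropping the LiDAR PLY there swaps in the LiDAR points as the initialization.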