pointrix-project / msplat

A modular differential gaussian rasterization library.

Overfitting occurred when I trained with a small number of images. #7

Closed Yuhuoo closed 6 months ago

Yuhuoo commented 6 months ago

When I used msplat to train on the Horse scene from the Tanks & Temples dataset, I selected 24 images, 12 for training and 12 for testing. I intended to use this project to optimize camera poses. I found that after more than 3000 iterations, the loss started to increase. However, this did not occur with the original Gaussian splatting project at https://github.com/graphdeco-inria/gaussian-splatting.

log of msplat:

(gaussian_splatting) aogao@test-X640-G40:~/code/dust_gs/gaussian-splatting$ python train.py -s /home/aogao/code/dust_gs/gaussian-splatting/data/Horse_24 -m /home/aogao/code/dust_gs/gaussian-splatting/output/InstantSplat/Horse_24_2 --iteration=7000 --eval
Optimizing /home/aogao/code/dust_gs/gaussian-splatting/output/InstantSplat/Horse_24_2
Output folder: /home/aogao/code/dust_gs/gaussian-splatting/output/InstantSplat/Horse_24_2 [19/05 20:52:09]
Reading camera 24/24 [19/05 20:52:09]
Generating ellipse path from 24 camera infos ... [19/05 20:52:09]
theta[0] 0.0 [19/05 20:52:09]
Train Cameras loaded 12 [19/05 20:52:33]
Test Cameras loaded 12 [19/05 20:52:33]
Render Cameras loaded 120 [19/05 20:52:37]
Number of points at initialisation :  1038289 [19/05 20:52:37]
Training progress:  14%|██████                                    | 1000/7000 [02:03<11:54,  8.39it/s, Loss=0.0707449]
[ITER 1000] Evaluating test: L1 0.06763647186259428 PSNR 17.882627328236897 [19/05 20:54:42]

[ITER 1000] Evaluating train: L1 0.04614747241139412 PSNR 19.72754669189453 [19/05 20:54:44]

[ITER 1000] Saving Gaussians [19/05 20:54:44]
Training progress: 100%|██████████████████████████████████████████| 7000/7000 [15:38<00:00,  7.46it/s, Loss=0.2689072]

[ITER 7000] Evaluating test: L1 0.25050079201658565 PSNR 8.50779887040456 [19/05 21:08:17]

[ITER 7000] Evaluating train: L1 0.2306472510099411 PSNR 8.335065460205078 [19/05 21:08:17]

[ITER 7000] Saving Gaussians [19/05 21:08:17]

Training complete. [19/05 21:08:32]

log of native Gaussian project:

(gaussian_splatting) aogao@test-X640-G40:~/code/dust_gs/gaussian-splatting$ python train.py -s /home/aogao/code/dust_gs/gaussian-splatting/data/Horse_24 -m /home/aogao/code/dust_gs/gaussian-splatting/output/dust_gs/Horse_24_2 --iteration=7000 --eval
Optimizing /home/aogao/code/dust_gs/gaussian-splatting/output/dust_gs/Horse_24_2
Output folder: /home/aogao/code/dust_gs/gaussian-splatting/output/dust_gs/Horse_24_2 [19/05 20:00:53]
Reading camera 24/24 [19/05 20:00:53]
Generating ellipse path from 24 camera infos ... [19/05 20:00:53]
theta[0] 0.0 [19/05 20:00:53]
Loading Training Cameras [19/05 20:01:13]
Loading Test Cameras [19/05 20:01:23]
Loading Render Cameras [19/05 20:01:23]
Number of points at initialisation :  1038289 [19/05 20:01:27]
Training progress: 100%|██████████████████████████████████████████| 7000/7000 [09:02<00:00, 12.91it/s, Loss=0.0366106]

[ITER 7000] Evaluating test: L1 0.04568970017135143 PSNR 20.60738754272461 [19/05 20:10:31]

[ITER 7000] Evaluating train: L1 0.02718411646783352 PSNR 23.515397262573245 [19/05 20:10:32]

[ITER 7000] Saving Gaussians [19/05 20:10:32]

Training complete. [19/05 20:10:42]
Yuhuoo commented 6 months ago

The Camera class I used is:

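import numpy as np
import roma  # quaternion utilities (unitquat_to_rotmat)
import torch
from torch import nn

class Camera(nn.Module):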
    def __init__(self, colmap_id, R, Q, T, FoVx, FoVy, image, gt_alpha_mask,
                 image_name, uid,
                 trans=np.array([0.0, 0.0, 0.0]), scale=1.0, data_device = "cuda"
                 ):
        super(Camera, self).__init__()

        self.uid = uid
        self.colmap_id = colmap_id
        self.init_Q = torch.tensor(Q, dtype=torch.float32, device="cuda")
        self.Q = nn.Parameter(self.init_Q.requires_grad_(True))
        self.T = nn.Parameter(torch.tensor(T, dtype=torch.float32, device="cuda").requires_grad_(True))
        # self.R = R
        # self.T = T
        self.FoVx = FoVx
        self.FoVy = FoVy
        self.image_name = image_name

        try:
            self.data_device = torch.device(data_device)
        except Exception as e:
            print(e)
            print(f"[Warning] Custom device {data_device} failed, fallback to default cuda device" )
            self.data_device = torch.device("cuda")

        self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
        self.image_width = self.original_image.shape[2]
        self.image_height = self.original_image.shape[1]

        if gt_alpha_mask is not None:
            self.original_image *= gt_alpha_mask.to(self.data_device)
        else:
            self.original_image *= torch.ones((1, self.image_height, self.image_width), device=self.data_device)

        self.zfar = 100.0
        self.znear = 0.01

        self.trans = trans
        self.scale = scale

        self.optimizer = torch.optim.Adam(self.parameters(), lr=0.0001)

    def get_extrinsic_camcenter(self):
        R = roma.unitquat_to_rotmat(self.Q)
        Rt = torch.zeros((4, 4), dtype=torch.float32).to(self.Q.device)
        Rt[:3, :3] = R
        Rt[:3, 3] = self.T
        Rt[3, 3] = 1.0

        extrinsic_matrix = Rt[:3, :]
        world_view_transform = Rt.transpose(0, 1)
        camera_center = world_view_transform.inverse()[3, :3]
        return extrinsic_matrix, camera_center

and Train.py just made a few modifications to the original code:

    ...
        render_pkg = ms_render(viewpoint_cam, gaussians, pipe, bg)
        image, viewspace_point_tensor, visibility_filter, radii = render_pkg["render"], render_pkg["viewspace_points"], render_pkg["visibility_filter"], render_pkg["radii"]

        ...

            # Optimizer step
            if iteration < opt.iterations:
                gaussians.optimizer.step()
                gaussians.optimizer.zero_grad(set_to_none = True)
                viewpoint_cam.optimizer.step()
                viewpoint_cam.optimizer.zero_grad(set_to_none = True)

            ...
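
A side note on the Camera class above: Q is optimized directly with Adam, so nothing keeps it at unit length, while a unit-quaternion-to-matrix conversion presumes a unit input. The sketch below is a pure-Python illustration of that effect, not the thread's code. It assumes the (x, y, z, w) ordering and assumes the conversion does not renormalize its input; the helper names and the drifted value are hypothetical. The later comments in this thread show the divergence also happens with camera optimization turned off, so this is only something worth ruling out, not a confirmed cause.

```python
import math

def normalize_quat(q):
    """Scale a quaternion (x, y, z, w) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(c / norm for c in q)

def unitquat_to_rotmat(q):
    """Standard rotation matrix formula; only valid when q has unit norm."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def orthogonality_error(R):
    """Largest absolute entry of R @ R^T - I (zero for a true rotation)."""
    err = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(R[i][k] * R[j][k] for k in range(3))
            err = max(err, abs(dot - (1.0 if i == j else 0.0)))
    return err

# Hypothetical quaternion whose norm has drifted away from 1 during optimization.
drifted = (0.0, 0.0, 0.8, 0.8)

err_raw = orthogonality_error(unitquat_to_rotmat(drifted))
err_fixed = orthogonality_error(unitquat_to_rotmat(normalize_quat(drifted)))
print(f"without normalization: {err_raw:.4f}, with normalization: {err_fixed:.2e}")
```

In the PyTorch class, the equivalent guard would be to divide self.Q by its norm before converting it to a rotation matrix.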
yGaoJiany commented 6 months ago

Hey, mind passing me the data? Or maybe just let me know where I can grab it? That way, I can figure out what's causing the issue.

Yuhuoo commented 6 months ago

> Hey, mind passing me the data? Or maybe just let me know where I can grab it? That way, I can figure out what's causing the issue.

Of course. Could you leave your email here?

yGaoJiany commented 6 months ago

ygaojiany (at) gmail (dot) com

yGaoJiany commented 6 months ago

Well, I tried to find out where the problem is and turned off the camera optimization in MSplat.

python train.py -s ./datasets/Horse_24 --iterations 7000 --eval

Here is the report: [image]

And PSNR:

Iteration   1k     3k     5k     7k
Original    18.27  19.59  11.36  8.46
MSplat      14.86  15.75  10.69  10.20

I have a feeling that this issue isn't unique to MSplat but might be a general problem caused by sparse views.
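
To see where a run peaks before it diverges, the "[ITER N] Evaluating test" lines can be pulled out of a training log. The sketch below is a minimal example that uses the test lines from the 3DGS log posted in this thread. The function names and the 1 dB tolerance are hypothetical choices for illustration, not part of either project.

```python
import re

# Test-split evaluation lines copied from the 3DGS log in this thread.
LOG = """\
[ITER 1000] Evaluating test: L1 0.061615398774544396 PSNR 18.273886998494465 [20/05 16:22:01]
[ITER 3000] Evaluating test: L1 0.04752563312649727 PSNR 19.586928685506184 [20/05 16:22:42]
[ITER 5000] Evaluating test: L1 0.14922837416330972 PSNR 11.36409060160319 [20/05 16:23:17]
[ITER 7000] Evaluating test: L1 0.22655571003754932 PSNR 8.464351336161295 [20/05 16:23:55]
"""

PATTERN = re.compile(r"\[ITER (\d+)\] Evaluating (\w+): L1 ([\d.]+) PSNR ([\d.]+)")

def parse_eval_lines(text, split="test"):
    """Return (iteration, psnr) pairs for the requested split, in log order."""
    return [
        (int(m.group(1)), float(m.group(4)))
        for m in PATTERN.finditer(text)
        if m.group(2) == split
    ]

def best_iteration(rows):
    """Return the (iteration, psnr) pair with the highest PSNR."""
    return max(rows, key=lambda row: row[1])

def first_drop(rows, tolerance_db=1.0):
    """Return the first iteration whose PSNR falls more than tolerance_db
    below the best PSNR seen so far, or None if that never happens."""
    best = float("-inf")
    for iteration, psnr in rows:
        if psnr < best - tolerance_db:
            return iteration
        best = max(best, psnr)
    return None

rows = parse_eval_lines(LOG)
peak = best_iteration(rows)
drop = first_drop(rows)
print(f"peak test PSNR {peak[1]:.2f} at iteration {peak[0]}, first drop at {drop}")
```

With evaluations only every 2000 iterations, this narrows the onset of divergence to somewhere between the peak and the first drop; evaluating more often would narrow it further.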

yGaoJiany commented 6 months ago

3DGS log:

Optimizing 
Output folder: ./output/43250a93-b [20/05 16:21:27]
Reading camera 24/24 [20/05 16:21:27]
Loading Training Cameras [20/05 16:21:41]
Loading Test Cameras [20/05 16:21:41]
Number of points at initialisation :  1038289 [20/05 16:21:41]
Training progress:  14%|█████████████████████████████▌                                                                                                                                                                                 | 1000/7000 [00:19<01:47, 56.02it/s, Loss=0.0793998]
[ITER 1000] Evaluating test: L1 0.061615398774544396 PSNR 18.273886998494465 [20/05 16:22:01]

[ITER 1000] Evaluating train: L1 0.05177646279335022 PSNR 19.14434700012207 [20/05 16:22:02]
Training progress:  43%|████████████████████████████████████████████████████████████████████████████████████████▋                                                                                                                      | 3000/7000 [01:00<01:23, 47.83it/s, Loss=0.0516149]
[ITER 3000] Evaluating test: L1 0.04752563312649727 PSNR 19.586928685506184 [20/05 16:22:42]

[ITER 3000] Evaluating train: L1 0.031445787847042085 PSNR 21.69693069458008 [20/05 16:22:43]
Training progress:  71%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                                           | 5000/7000 [01:35<00:35, 56.71it/s, Loss=0.1658260]
[ITER 5000] Evaluating test: L1 0.14922837416330972 PSNR 11.36409060160319 [20/05 16:23:17]

[ITER 5000] Evaluating train: L1 0.12421074509620667 PSNR 12.211932182312012 [20/05 16:23:18]
Training progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7000/7000 [02:13<00:00, 52.51it/s, Loss=0.2631228]

[ITER 7000] Evaluating test: L1 0.22655571003754932 PSNR 8.464351336161295 [20/05 16:23:55]

[ITER 7000] Evaluating train: L1 0.20925374627113344 PSNR 8.784189987182618 [20/05 16:23:56]

[ITER 7000] Saving Gaussians [20/05 16:23:56]

Training complete. [20/05 16:24:04]
yGaoJiany commented 6 months ago

MSplat log:

Optimizing 
Output folder: ./output/f6dc80e9-7 [20/05 16:22:41]
Reading camera 24/24 [20/05 16:22:41]
Loading Training Cameras [20/05 16:22:54]
Loading Test Cameras [20/05 16:22:54]
Number of points at initialisation :  1038289 [20/05 16:22:54]
Training progress:  14%|█████████████████████████████▌                                                                                                                                                                                 | 1000/7000 [00:42<04:04, 24.50it/s, Loss=0.1680606]
[ITER 1000] Evaluating test: L1 0.11551610132058461 PSNR 14.860183080037434 [20/05 16:23:38]

[ITER 1000] Evaluating train: L1 0.09769992232322694 PSNR 15.919201087951661 [20/05 16:23:39]
Training progress:  43%|████████████████████████████████████████████████████████████████████████████████████████▋                                                                                                                      | 3000/7000 [02:14<03:25, 19.46it/s, Loss=0.0880171]
[ITER 3000] Evaluating test: L1 0.09689778337876001 PSNR 15.75195566813151 [20/05 16:25:09]

[ITER 3000] Evaluating train: L1 0.04772187173366547 PSNR 20.029360580444337 [20/05 16:25:09]
Training progress:  71%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                                           | 5000/7000 [03:37<01:29, 22.44it/s, Loss=0.1615551]
[ITER 5000] Evaluating test: L1 0.1805816094080607 PSNR 10.693429629007975 [20/05 16:26:33]

[ITER 5000] Evaluating train: L1 0.12614742666482925 PSNR 12.147203826904297 [20/05 16:26:33]
Training progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7000/7000 [05:12<00:00, 22.37it/s, Loss=0.2180582]

[ITER 7000] Evaluating test: L1 0.19152536988258362 PSNR 10.196534792582193 [20/05 16:28:08]

[ITER 7000] Evaluating train: L1 0.1519288718700409 PSNR 10.88718204498291 [20/05 16:28:08]

[ITER 7000] Saving Gaussians [20/05 16:28:08]

Training complete. [20/05 16:28:20]
Yuhuoo commented 6 months ago

Okay, there may be some bugs with my workspace. Thank you for your reply. This issue can be closed now.