🤔 It's a little strange. I've also tried training NeRF Lego with msplat, but haven't found this problem.
Emmm, that's really weird. I carefully transferred the code from the original 3DGS repo, so the hyper-parameter settings are the same ...
May I ask whether you changed any hyper-parameters in your case?
From my observation, using the default hyper-parameters here results in far fewer Gaussians (original: ~600K Gaussians, here: ~5K Gaussians); it seems only a few Gaussians reached the densification threshold and got densified (the test I mean is sketched below).
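For context, here is a standalone paraphrase (not the repo's exact code) of the densification test, following the `densify_and_prune` logic in the original 3DGS repo: a Gaussian is only cloned/split when its accumulated mean screen-space gradient exceeds the threshold, so if the rasterizer's viewspace gradients come back too small, almost nothing densifies.

```python
import torch

# Standalone paraphrase of the 3DGS densification test (not the repo's exact
# code): xyz_gradient_accum sums per-view screen-space gradient norms, denom
# counts how often each Gaussian was visible. Only Gaussians whose mean
# gradient exceeds the threshold (default 0.0002) get cloned/split.
def densification_mask(xyz_gradient_accum: torch.Tensor,
                       denom: torch.Tensor,
                       densify_grad_threshold: float = 0.0002) -> torch.Tensor:
    grads = xyz_gradient_accum / denom
    grads[grads.isnan()] = 0.0
    return torch.norm(grads, dim=-1) >= densify_grad_threshold
```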
Well, I will check this 🏗️. Please keep this issue open!
Thank you! Glad to discuss this further and to help however I can.
Well, I did a quick replacement based on the original Gaussian Splatting repo (https://github.com/graphdeco-inria/gaussian-splatting), and it seems that msplat works just fine. Here is what I get at 7,000 iterations.
I replaced the function named `render` in `gaussian_renderer/__init__.py` with this one:
```python
import torch
import msplat as ms

from scene.gaussian_model import GaussianModel
from utils.graphics_utils import fov2focal


def render(viewpoint_camera, pc: GaussianModel, pipe, bg_color: torch.Tensor,
           scaling_modifier=1.0, override_color=None):
    """
    Render the scene with msplat.

    Background tensor (bg_color) must be on GPU!
    """
    # translate the 3DGS camera/model attributes into msplat's inputs
    position = pc.get_xyz

    fovx = viewpoint_camera.FoVx
    fovy = viewpoint_camera.FoVy
    width = int(viewpoint_camera.image_width)
    height = int(viewpoint_camera.image_height)
    fx = fov2focal(fovx, width)
    fy = fov2focal(fovy, height)
    cx = float(width) / 2
    cy = float(height) / 2
    intrinsic_params = torch.tensor([fx, fy, cx, cy]).cuda().float()

    extrinsic_matrix = viewpoint_camera.world_view_transform.transpose(0, 1)
    extrinsic_matrix = extrinsic_matrix[:3, :]
    camera_center = viewpoint_camera.camera_center

    opacity = pc.get_opacity
    shs = pc.get_features.permute(0, 2, 1)
    scaling = pc.get_scaling
    rotation = pc.get_rotation

    # project points and perform culling; depth == 0 marks culled points
    (uv, depth) = ms.project_point(
        position,
        intrinsic_params,
        extrinsic_matrix,
        width, height)
    visible = depth != 0

    # evaluate SH coefficients along the per-point view direction
    direction = (position - camera_center.repeat(position.shape[0], 1))
    direction = direction / direction.norm(dim=1, keepdim=True)
    sh2rgb = ms.compute_sh(shs, direction, visible)
    rgb = torch.clamp_min(sh2rgb + 0.5, 0.0)

    # compute 3D covariances from scaling and rotation
    cov3d = ms.compute_cov3d(scaling, rotation, visible)

    # EWA projection of 3D covariances to 2D conics
    (conic, radius, tiles_touched) = ms.ewa_project(
        position,
        cov3d,
        intrinsic_params,
        extrinsic_matrix,
        uv,
        width,
        height,
        visible
    )

    # sort Gaussians by tile and depth
    (gaussian_ids_sorted, tile_range) = ms.sort_gaussian(
        uv, depth, width, height, radius, tiles_touched
    )

    # dummy screen-space tensor whose .grad is read by the densification logic
    ndc = torch.zeros_like(uv, requires_grad=True)
    try:
        ndc.retain_grad()
    except Exception:
        raise ValueError("ndc does not have grad")

    # alpha blending (bg_color[0] is passed as a scalar background value)
    rendered_image = ms.alpha_blending(
        uv, conic, opacity, rgb,
        gaussian_ids_sorted, tile_range, bg_color[0].item(), width, height, ndc
    )

    return {"render": rendered_image,
            "viewspace_points": ndc,
            "visibility_filter": radius > 0,
            "radii": radius}
```
Here's what I got with msplat:
```
[ITER 30000] Evaluating test: L1 0.004826885849470273 PSNR 36.27218983650208 [17/05 16:02:35]
[ITER 30000] Evaluating train: L1 0.0029831654857844117 PSNR 40.72242660522461 [17/05 16:02:35]
```
The number of points is 345,685.
Here is the report for the original 3DGS:
```
[ITER 30000] Evaluating test: L1 0.004817731926450506 PSNR 36.26937942504883 [17/05 16:18:35]
[ITER 30000] Evaluating train: L1 0.0029933738056570295 PSNR 40.61189193725586 [17/05 16:18:35]
```
The number of points is 343,412.
It seems that msplat is a little slower: for Lego, msplat takes 10 min 12 s for 30,000 iterations while the original kernel takes 7 min 13 s.
Thank you @yGaoJiany for providing your results!
Let me check my code. I will get back to this issue after some time to report where my bugs are; hopefully that can help more users.
@Yuhuoo I don't think we have the same problem, since I tested the same data on both msplat and the Inria repo ... Using either 100 or 300 camera views produces needle-like Gaussians in my implementation with msplat, whereas in the Inria repo I didn't encounter the same problem (with either 100 or 300 images).
Given the information from @yGaoJiany, I assume there are probably bugs in my code rather than in their CUDA kernel. I need some time to check this, though.
@Yuhuoo Maybe you can open a new issue about overfitting in the few-image setting, and we can discuss that topic there.
I transferred the whole optimization and dataset/camera-reading pipeline into this repo, and I found that with the same hyper-parameter settings as the original 3DGS repo defaults, it easily produces needle-like Gaussians.
Could you check whether there are any errors in the kernel computation (most likely the backward computation of the scale)? I can share my port of the other components (optimization pipeline, 3D dataset readers) here if needed (likely some time after the NeurIPS deadline). A quick way to probe the scale backward is sketched below.
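For anyone who wants to check this independently, here is a hedged sketch using `torch.autograd.gradcheck` on `ms.compute_cov3d`. It assumes the kernel accepts float64 inputs, which it may not; if it is float32-only, replace `gradcheck` with a manual central-difference comparison under a loose tolerance.

```python
import torch
import msplat as ms

# Hedged sketch: numerically check the backward pass of compute_cov3d w.r.t.
# scaling and rotation. Assumes the kernel supports float64; if not, use a
# manual central-difference comparison in float32 with loose tolerances.
N = 4
scaling = torch.rand(N, 3, dtype=torch.float64, device="cuda", requires_grad=True)
rotation = torch.rand(N, 4, dtype=torch.float64, device="cuda")
rotation = (rotation / rotation.norm(dim=-1, keepdim=True)).requires_grad_(True)
visible = torch.ones(N, dtype=torch.bool, device="cuda")

ok = torch.autograd.gradcheck(
    lambda s, q: ms.compute_cov3d(s, q, visible),
    (scaling, rotation), eps=1e-6, atol=1e-4)
print("compute_cov3d gradcheck passed:", ok)
```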