nyu-systems / Grendel-GS

Ongoing research: training Gaussian splatting at scale with a distributed system
Apache License 2.0

Error when backend == 'gsplat' in distributed training mode #17

Closed whiteinblue closed 3 months ago

whiteinblue commented 3 months ago

[07/08 12:06:20] Training progress:   0%| | 78/50000 [00:02<15:45, 52.83it/s, Loss=0.2319733]
torch.Size([1, 720, 1280, 3])
torch.Size([1, 720, 1280, 3]) [07/08 12:06:20]
[07/08 12:06:20] torch.Size([1, 720, 1280, 3])
[07/08 12:06:20] torch.Size([1, 720, 1280, 3])
[07/08 12:06:20] torch.Size([])
[07/08 12:06:20] torch.Size([1, 720, 1280, 3])
[07/08 12:06:20] Traceback (most recent call last):
  File "train_dustinit.py", line 86, in <module>
    train_internal.training(lp.extract(args), op.extract(args), pp.extract(args), args, log_file, dustor=dustor)
  File "/mnt/3d_shuyao/dl/grendel_gs/train_internal.py", line 134, in training
    batched_image, batched_compute_locally = gsplat_render_final(batched_screenspace_pkg, batched_strategies)
  File "/mnt/3d_shuyao/dl/grendel_gs/gaussian_renderer/__init__.py", line 1184, in gsplat_render_final
    rendered_image = rendered_image.squeeze(0).permute(2, 0, 1).contiguous()
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 0 is not equal to len(dims) = 3
Training progress:   0%| | 82/50000 [00:02<23:04, 36.07it/s, Loss=0.2319733]
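For context, the failing line converts the rasterizer output from a batched [1, H, W, 3] layout to the [3, H, W] layout used downstream, and the `torch.Size([])` print above shows that a 0-dim tensor reaches that `permute`. Below is a minimal sketch of that shape handling with an explicit guard; the helper name `to_chw` is hypothetical and this is not the repository's actual code.

```python
import torch

def to_chw(rendered_image: torch.Tensor) -> torch.Tensor:
    """Convert a batched [1, H, W, 3] render to the [3, H, W] layout used for the loss.

    The traceback shows a 0-dim tensor reaching permute(), so this guard fails
    loudly with the observed shape instead of letting permute() raise.
    """
    if rendered_image.dim() != 4 or rendered_image.shape[0] != 1:
        raise RuntimeError(
            f"expected rendered_image of shape [1, H, W, 3], got {tuple(rendered_image.shape)}"
        )
    # [1, H, W, 3] -> [H, W, 3] -> [3, H, W], made contiguous for downstream ops
    return rendered_image.squeeze(0).permute(2, 0, 1).contiguous()

if __name__ == "__main__":
    ok = torch.rand(1, 720, 1280, 3)
    print(to_chw(ok).shape)      # torch.Size([3, 720, 1280])
    bad = torch.tensor(0.0)      # reproduces the torch.Size([]) case from the log
    try:
        to_chw(bad)
    except RuntimeError as e:
        print(e)
```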

TarzanZhao commented 3 months ago

Hi, I just updated the code. It may fix this issue. Please pull it and give it a try. Thanks for pointing this out!