An error occurred while training the custom dataset

Hi @mikeqzy, I want to run this project on a custom dataset, but I get the following error during training:

  File "/home/WLY/3dgs-avatar-release/train.py", line 330, in main
    training(config)
  File "/home/WLY/3dgs-avatar-release/train.py", line 110, in training
    render_pkg = render(data, iteration, scene, pipe, background, compute_loss=True, return_opacity=use_mask)
  File "/home/WLY/3dgs-avatar-release/gaussian_renderer/__init__.py", line 30, in render
    pc, loss_reg, colors_precomp = scene.convert_gaussians(data, iteration, compute_loss)
  File "/home/WLY/3dgs-avatar-release/scene/__init__.py", line 63, in convert_gaussians
    return self.converter(self.gaussians, viewpoint_camera, iteration, compute_loss)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/WLY/3dgs-avatar-release/models/gaussian_converter.py", line 51, in forward
    deformed_gaussians, loss_reg_deformer = self.deformer(gaussians, camera, iteration, compute_loss)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/WLY/3dgs-avatar-release/models/deformer/deformer.py", line 15, in forward
    deformed_gaussians, loss_non_rigid = self.non_rigid(gaussians, iteration, camera, compute_loss)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/3dgsa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/WLY/3dgs-avatar-release/models/deformer/non_rigid.py", line 268, in forward
    q1[0] = 1.  # [1,0,0,0] represents identity rotation
IndexError: index 0 is out of bounds for dimension 0 with size 0

I tried to trace the source of the error and found that it comes from the xyz variable taking abnormal values:

non_rigid.py, line 239:
xyz = gaussians.get_xyz

While training runs normally, printing xyz gives:

xyz Parameter containing:

tensor([[-0.1932, -0.4575,  0.0899],
        [-0.4046,  0.2553,  0.0019],
        [ 0.2209, -0.5344, -0.0030],
        ...,
        [-0.0292,  0.0988, -0.1477],
        [ 0.1344,  0.1966, -0.0502],
        [ 0.1247,  0.2166, -0.0616]], device='cuda:0', requires_grad=True)

But later its shape changes and it shrinks to a single point:


xyz Parameter containing:

tensor([[ 0.4639,  0.1237, -0.0043]], device='cuda:0', requires_grad=True)

At the same time, the computed loss becomes NaN. Finally, xyz becomes empty, which triggers the IndexError in the traceback:


xyz Parameter containing:

tensor([], device='cuda:0', size=(0, 3), requires_grad=True)
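
To narrow down the first iteration where this happens, a check along the following lines could be called once per training step. This is only a minimal sketch: check_training_state and TrainingDivergedError are hypothetical names that are not part of the repository, and the call site shown in the final comment assumes the loss tensor in train.py is named loss.

```python
import math


class TrainingDivergedError(RuntimeError):
    """Raised when the optimisation state is no longer usable."""


def check_training_state(iteration: int, num_points: int, loss_value: float) -> None:
    """Fail early with a readable message instead of a late IndexError.

    Hypothetical helper, not part of 3dgs-avatar-release.
    """
    # A NaN or infinite loss means the gradients are already unusable.
    if not math.isfinite(loss_value):
        raise TrainingDivergedError(
            f"iteration {iteration}: loss is {loss_value} (not finite)"
        )
    # An empty point set is what later makes q1[0] = 1. fail.
    if num_points == 0:
        raise TrainingDivergedError(
            f"iteration {iteration}: no Gaussians left (xyz has shape (0, 3))"
        )


# Assumed call site inside the training loop, after the loss is computed:
#   check_training_state(iteration, gaussians.get_xyz.shape[0], loss.item())
```

Raising at the first non-finite loss should show whether the NaN appears before the number of points starts to drop or only afterwards.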

This has been bothering me for a long time. What could cause this behavior, and do you have any suggestions for how to fix it?
