Open wuzirui opened 3 days ago
Hi @wuzirui, one thing that came to mind: are you certain that all Gaussians are visible from your current viewmat? If Gaussians are not visible (meaning they do not land within the camera height/width after projection), they are not retained in memory and no gradients flow to them.
Hi @maturk! Thanks for the reply, but if this were the case, wouldn't the grad values for those Gaussians be zero rather than NaN? With my data, training proceeds for several hundred steps and then crashes (all Gaussians turn into NaNs within 1~2 steps), so I guess visibility issues are unlikely to be the cause?
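To illustrate the expectation stated above, here is a minimal sketch in plain PyTorch (not gsplat's kernel). The `params` and `visible` tensors are made up for the example. Elements that never contribute to the loss receive a gradient of exactly zero, not NaN.

```python
import torch

# Hypothetical stand-ins: four "Gaussians", two of which are outside the view.
params = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
visible = torch.tensor([True, False, True, False])

# Only the visible elements take part in the loss.
loss = (params[visible] ** 2).sum()
loss.backward()

# Elements that were masked out get a zero gradient, not NaN.
zero_grad = params.grad == 0
nan_grad = torch.isnan(params.grad)
print("zero grad:", zero_grad.tolist())
print("any NaN grad:", bool(nan_grad.any()))
```

If a real run shows NaN rather than zero for the affected Gaussians, that points toward a numerical problem in the backward pass rather than toward visibility alone.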
Hmm, yes, that is strange. What kind of dataset are you training on? I am wondering if there are any problems with projection. @wuzirui, you can check the following:
info["radii"] <= 0
and see if the Gaussians with zero radii are the same as the ones with NaN grad.
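The comparison suggested above could be sketched roughly as follows. The helper name `compare_radii_with_nan_grads` is hypothetical, and the shapes are assumptions: `radii` is taken to be one value per Gaussian and `grad` to have one row per Gaussian. The actual layout of `info["radii"]` may differ (for example, it may carry an extra camera dimension), so adapt the indexing accordingly.

```python
import torch

def compare_radii_with_nan_grads(radii, grad):
    """Count how the zero-radius mask overlaps with the NaN-gradient mask.

    radii: (N,) tensor, assumed to hold one projected radius per Gaussian.
    grad:  (N, ...) tensor, gradient of some per-Gaussian parameter.
    """
    invisible = radii <= 0
    nan_rows = torch.isnan(grad).reshape(grad.shape[0], -1).any(dim=1)
    return {
        "invisible": int(invisible.sum()),
        "nan_grad": int(nan_rows.sum()),
        "both": int((invisible & nan_rows).sum()),
        "nan_but_visible": int((~invisible & nan_rows).sum()),
    }

# Made-up data purely for illustration.
nan = float("nan")
radii = torch.tensor([0, 3, 5, 0])
grad = torch.tensor([
    [0.0, 0.0, 0.0],
    [nan, 0.1, 0.2],
    [0.3, 0.1, 0.0],
    [nan, nan, nan],
])
report = compare_radii_with_nan_grads(radii, grad)
print(report)
```

A non-zero `nan_but_visible` count would suggest that the NaNs are not explained by zero radii alone.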
If the scales are very small, the computation could be numerically unstable. Maybe try adding a small eps to the scales?
scales = torch.exp(scales_crop) + 1e-4
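As a small illustration of why the eps can help, the sketch below uses made-up log-scale values (the `scales_crop` contents here are hypothetical). In float32 a very negative log-scale underflows to exactly zero after `exp`, and adding the eps keeps every scale strictly positive.

```python
import torch

EPS = 1e-4  # the eps value suggested above

# Hypothetical log-scales; -200 is far below the float32 underflow threshold.
scales_crop = torch.tensor([-200.0, -6.0, 0.0])

raw_scales = torch.exp(scales_crop)  # first entry underflows to 0.0
scales = raw_scales + EPS            # every entry is now at least about EPS

print("raw min:", raw_scales.min().item())
print("floored min:", scales.min().item())
```

Note that this shifts all scales slightly, so it is a floor on the value fed to the kernel rather than a change to the stored parameter.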
Thanks! But as can be seen here, all the Gaussians have positive or zero radii and none of them are negative, so I guess this is not the case?
Hi @liruilong940607, thanks for the comment! The minimum scale input to the kernel is 0.0024 (shown below) so I guess this is also not the case? Also I pasted the range of other input tensors as well FYI, thanks a lot!
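For anyone wanting to reproduce this kind of range check, a minimal sketch is below. The helper `tensor_summary` is hypothetical and the example values are made up, apart from the 0.0024 minimum quoted above.

```python
import torch

def tensor_summary(name, t):
    """Report the finite min/max of a tensor and how many entries are NaN or inf."""
    finite = torch.isfinite(t)
    vals = t[finite]
    return {
        "name": name,
        "min": vals.min().item() if vals.numel() else None,
        "max": vals.max().item() if vals.numel() else None,
        "num_nonfinite": int((~finite).sum()),
    }

# Illustrative input only; substitute the tensors passed to the kernel.
scales = torch.tensor([0.0024, 0.5, 1.2])
summary = tensor_summary("scales", scales)
print(summary)
```

Running this on every input right before the rasterization call, on the step where the NaNs first show up, helps confirm that the inputs themselves are still finite at that point.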
Hi @wuzirui, did you figure out your problem?
Not yet. I may need to look into the kernel later.
Hi! I've recently run into a numerical stability problem with the rasterization kernel, as shown below. I call the rasterization kernel with the following command:
I've checked the inputs and all the tensors are guaranteed to be free of NaNs, but when I tried to calculate the gradient of render_outputs or alpha over Gaussian params (e.g. scales), NaN values appear:
![image](https://github.com/nerfstudio-project/gsplat/assets/7344146/fd78372a-c1bb-4102-a772-d752aa127932)
Could you please give me some hints on which part of the code may be numerically unstable so I can come up with a workaround, or where I should start to fix the problem? Thanks a lot!
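One generic starting point, independent of gsplat, is PyTorch's anomaly detection, which raises an error naming the backward function that first produced NaN. The sketch below is a toy example, not the rasterization kernel. The helper `first_backward_nan` is hypothetical, and `sqrt` at zero is used only because it is a well-known case where finite inputs and a finite forward value still yield a NaN gradient.

```python
import torch

def first_backward_nan(loss_fn, *inputs):
    """Run forward and backward under anomaly detection.

    Returns the error message if a backward function produced NaN, else None.
    """
    with torch.autograd.detect_anomaly():
        loss = loss_fn(*inputs)
        try:
            loss.backward()
        except RuntimeError as err:
            return str(err)
    return None

# Finite inputs and a finite loss, but d/dx sqrt(x) at x = 0 is 0/0 in backward.
x = torch.tensor([0.0, 1.0], requires_grad=True)
w = torch.tensor([0.0, 1.0])
forward_value = (torch.sqrt(x.detach()) * w).sum()
message = first_backward_nan(lambda t: (torch.sqrt(t) * w).sum(), x)

print("forward is finite:", bool(torch.isfinite(forward_value)))
print("anomaly message:", message)
```

Anomaly detection slows training considerably, so it is probably best enabled only for the few steps around the crash. For a custom CUDA op the message would name the wrapping autograd function rather than a line inside the kernel.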