nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

cutlass_matmul.h:332 status failed with error Error Internal #159

Closed: sunrainyg closed this issue 1 year ago

sunrainyg commented 1 year ago

When I run python examples/train_ngp_nerf.py --train_split train --scene lego, everything is fine at first. After about 321s:

Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
elapsed_time=1.58s | step=0 | loss=0.07349 | alive_ray_mask=256 | n_rendering_samples=67422 | num_rays=256 |
elapsed_time=159.01s | step=10000 | loss=0.00057 | alive_ray_mask=16442 | n_rendering_samples=262921 | num_rays=49370 |
elapsed_time=321.78s | step=20000 | loss=0.00039 | alive_ray_mask=16351 | n_rendering_samples=263510 | num_rays=49783 |
  0%|                                                                   | 0/200 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "xxx/nerfacc/examples/train_ngp_nerf.py", line 276, in <module>
    rgb, acc, depth, _ = render_image(
  File "xxx/nerfacc/examples/utils.py", line 86, in render_image
    ray_indices, t_starts, t_ends = ray_marching(
  File "/root/anaconda3/envs/nerfacc/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/nerfacc/lib/python3.9/site-packages/nerfacc/ray_marching.py", line 196, in ray_marching
    sigmas = sigma_fn(t_starts, t_ends, ray_indices)
  File "xxx/nerfacc/examples/utils.py", line 62, in sigma_fn
    return radiance_field.query_density(positions)
  File "xxx/nerfacc/examples/radiance_fields/ngp.py", line 155, in query_density
    self.mlp_base(x.view(-1, self.num_dim))
  File "/root/anaconda3/envs/nerfacc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/nerfacc/lib/python3.9/site-packages/tinycudann-1.7-py3.9-linux-x86_64.egg/tinycudann/modules.py", line 177, in forward
    output = _module_function.apply(
  File "/root/anaconda3/envs/nerfacc/lib/python3.9/site-packages/tinycudann-1.7-py3.9-linux-x86_64.egg/tinycudann/modules.py", line 89, in forward
    native_ctx, output = native_tcnn_module.fwd(input, params)
RuntimeError: xxx/tiny-cuda-nn/include/tiny-cuda-nn/cutlass_matmul.h:332 status failed with error Error Internal

PS: It works fine when I run python examples/train_mlp_nerf.py --train_split train --scene lego.

Could you please help me figure out what the problem is? Thank you!

liruilong940607 commented 1 year ago

Hi, maybe this is the cause? https://github.com/NVlabs/tiny-cuda-nn/issues/236

sunrainyg commented 1 year ago

Thanks for your suggestion. I changed the batch_size from 262144 to 1024, but I still get the same error.

InduCherukuri commented 1 year ago

I'm also facing the same issue.

Abc11c commented 1 year ago

Hi, I tried dropping the batch_size as well and am facing the same issue. Has anyone found a solution? Why does this only come up during the evaluation stage? Training seems to be fine.

Thanks!

pwais commented 1 year ago

I think Tom's suggestion is a good thing to try here ( https://github.com/NVlabs/tiny-cuda-nn/issues/236#issuecomment-1376996853 ):

slice your batch into chunks of, say, 1m elements, and compute parameter gradients for each of these chunks separately. Then, simply average those gradients. The resulting values will be the same as if you had computed them from a single large batch. (Ignoring fp32 order-of-addition quirks, which shouldn't be significant here.)

Maybe there's a way to effectively use one GPU in DDP mode?
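
For what it's worth, here is a minimal sketch of that chunk-and-average idea in plain PyTorch. The model, loss_fn, and chunk size are illustrative placeholders, not the actual nerfacc training loop, and loss_fn is assumed to return a mean over its batch:

import torch

def backward_in_chunks(model, loss_fn, inputs, targets, chunk_size=2**20):
    # Accumulate gradients chunk by chunk so no single forward/backward pass
    # sees the full (very large) batch at once.
    model.zero_grad()
    n = inputs.shape[0]
    for start in range(0, n, chunk_size):
        x = inputs[start:start + chunk_size]
        y = targets[start:start + chunk_size]
        # Weight each chunk's mean loss by its share of the full batch so the
        # accumulated gradients match a single full-batch backward pass
        # (up to fp32 order-of-addition differences).
        loss = loss_fn(model(x), y) * (x.shape[0] / n)
        loss.backward()
    # The parameters' .grad now hold the full-batch gradients; call optimizer.step() as usual.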

imkanghan commented 1 year ago

Hi,

I found that this error is raised when the input passed to tiny-cuda-nn has size 0. Adding a check at the beginning of radiance_field.query_density(positions) solves the problem in my case:

def query_density(self, x, return_feat: bool = False):
    # tiny-cuda-nn fails on an empty batch, so return empty outputs directly
    # instead of calling the network.
    if x.shape[0] == 0:
        if return_feat:
            return x.new_zeros(0, 1), x.new_zeros(0, self.geo_feat_dim)
        else:
            return x.new_zeros(0, 1)

    # other code
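
If you would rather not modify the radiance field, a similar guard can be placed at the call site. This is only a sketch of the sigma_fn shown in the traceback; rays_o and rays_d are assumed to be the ray tensors captured from the enclosing scope in examples/utils.py:

def sigma_fn(t_starts, t_ends, ray_indices):
    # Skip the tiny-cuda-nn query entirely when ray marching produced no samples.
    if t_starts.shape[0] == 0:
        return t_starts.new_zeros(0, 1)
    t_origins = rays_o[ray_indices]
    t_dirs = rays_d[ray_indices]
    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
    return radiance_field.query_density(positions)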

Cheers