nv-tlabs / NKSR

[CVPR 2023 Highlight] Neural Kernel Surface Reconstruction
https://research.nvidia.com/labs/toronto-ai/NKSR

RuntimeError: CUDA error: an illegal memory access was encountered #40

Open HerculesJL opened 1 year ago

HerculesJL commented 1 year ago

Hello! Thank you for your excellent work. I get the following error when I change "cuda:0" to "cuda:1" (or any other CUDA device). The only change I made to the sample program was the 'cuda:x' device string. Can you give me some advice?

Traceback (most recent call last):
  File "/data/songzhenbo/NKSR-public/examples/recons_simple.py", line 27, in <module>
    field = reconstructor.reconstruct(input_xyz, input_normal, detail_level=1.0)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/__init__.py", line 256, in reconstruct
    feat, svh, udf_svh = self.network.unet(
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/nn/unet.py", line 238, in forward
    feat, encoder_svh, feat_depth = module(feat, encoder_svh, feat_depth)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 316, in forward
    feat, svh, depth = module(feat, svh, depth, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 316, in forward
    feat, svh, depth = module(feat, svh, depth, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 112, in forward
    nbmap, nbsizes, _ = self._compute_conv_args(in_grid, out_grid)
  File "/data/songzhenbo/.conda/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 75, in _compute_conv_args
    nbmap = torch.nonzero(kmap != -1).contiguous()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
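
The error message above suggests rerunning with `CUDA_LAUNCH_BLOCKING=1`, because asynchronous kernel launches can make the Python stack trace point at the wrong call. A minimal sketch of one way to do that from Python follows. The helper names `debug_env` and `run_with_blocking_launches` are hypothetical and not part of NKSR or PyTorch; only the environment variable itself comes from the message.

```python
import os
import subprocess
import sys


def debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper: it only sets CUDA_LAUNCH_BLOCKING=1, the variable
    named in the PyTorch error message, and leaves the input mapping untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(script_path):
    """Run a Python script in a child process with blocking kernel launches.

    The variable has to be set before CUDA is initialized, so a fresh
    process is the simplest way to guarantee it takes effect.
    """
    result = subprocess.run([sys.executable, script_path], env=debug_env(), check=False)
    return result.returncode


# Example (assumes the script path from the traceback, relative to the repo root):
# run_with_blocking_launches("examples/recons_simple.py")
```

With blocking launches the run is slower, but the reported frame is more likely to be the kernel that actually faulted rather than a later call such as `torch.nonzero`.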

heiwang1997 commented 1 year ago

Hey @HerculesJL, thank you for reporting this! This is definitely a bug, and I will look into it ASAP.

smittyjaggerman commented 2 months ago

Hey @heiwang1997, has this problem been resolved, or is there a working workaround for it? I am currently facing the same issue.

When using "cuda:1" in any NKSR-related script, I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)

Help would be really appreciated! :)
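
No fix is confirmed in this thread. A general workaround sometimes used with libraries that misbehave on non-default GPUs is to expose only the desired physical GPU through `CUDA_VISIBLE_DEVICES`, so that it appears to the process as "cuda:0". The sketch below assumes this also applies to NKSR, which has not been verified here. The helper name `remap_gpu_env` is hypothetical.

```python
import os


def remap_gpu_env(physical_index, base_env=None):
    """Return a copy of the environment that exposes a single physical GPU.

    Hypothetical helper: inside a process started with this environment,
    the selected GPU is the only visible device and is addressed as "cuda:0".
    """
    if physical_index < 0:
        raise ValueError("physical_index must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


# Example (keep device="cuda:0" inside the script; path taken from the traceback):
# import subprocess, sys
# subprocess.run([sys.executable, "examples/recons_simple.py"], env=remap_gpu_env(1))
```

The variable must be set before CUDA is initialized, which is why the sketch launches a fresh process instead of changing `os.environ` after `torch` is already in use. Another option that may be worth trying is calling `torch.cuda.set_device(1)` before constructing the reconstructor; whether that avoids the error in NKSR is untested.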