wbhu / Tri-MipRF

Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields, ICCV'23 (Oral, Best Paper Finalist)
https://wbhu.github.io/projects/Tri-MipRF

CUDA Error #22

Closed Wallong closed 9 months ago

Wallong commented 11 months ago

Hi, great work! When I run on the nerf_synthetic data I get a CUDA error. Is there some configuration I overlooked that is causing it?

2023-11-02 09:44:24.958 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:23000 iter_time:0.01472163200378418 ETA:0:00:29 num_alive_ray:13716 rendering_samples_actual:269133 num_rays:39829 PSNR:37.34233474731445 total_loss:0.0007085531251505017 
2023-11-02 09:44:42.329 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:24000 iter_time:0.012811899185180664 ETA:0:00:12 num_alive_ray:13679 rendering_samples_actual:261635 num_rays:40487 PSNR:37.36977005004883 total_loss:0.0007570512825623155 
2023-11-02 09:44:59.344 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:25000 iter_time:0.01546168327331543 ETA:0:00:00 num_alive_ray:13266 rendering_samples_actual:263475 num_rays:38886 PSNR:37.54179382324219 total_loss:0.0006864252500236034 
Traceback (most recent call last):
  File "main.py", line 96, in <module>
    main()
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "main.py", line 56, in main
    trainer.fit()
  File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 140, in fit
    metrics, final_rb, target = self.eval_img(
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 168, in eval_img
    rb = self.model(
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 118, in forward
    return self.rendering(
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 140, in rendering
    rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices.long())
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 115, in rgb_sigma_fn
    rgb = self.field.query_rgb(dir=t_dirs, embedding=feature)['rgb']
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/field/trimipRF.py", line 97, in query_rgb
    self.mlp_head(h)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/tinycudann-1.7-py3.8-linux-x86_64.egg/tinycudann/modules.py", line 189, in forward
    self.params.to(_torch_precision(self.native_tcnn_module.param_precision())).contiguous(),
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  In call to configurable 'main' (<function main at 0x7fa137334700>)

hrz2000 commented 10 months ago

Same bug here, waiting for a solution.

zhuhu00 commented 10 months ago

same error...

VictorStarkSnow commented 10 months ago

First of all, thanks for your valuable work. I hit the same error with PyTorch 2.0.1, Python 3.9.0, CUDA 11.7, tinycudann 1.7, GPU: 3090. Is this an incompatibility problem? I hope you can help us fix it, thanks a lot. Also, how can I assign a specific GPU to run the task?
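
On the GPU question: one common approach, independent of this repository, is to set the `CUDA_VISIBLE_DEVICES` environment variable before CUDA is initialised. The snippet below is a minimal sketch; the helper name `select_gpu` is hypothetical and the index `1` is only an example.

```python
import os


def select_gpu(index):
    """Restrict this process to one physical GPU (hypothetical helper)."""
    # Must run before the first CUDA initialisation; the safest place is
    # before `import torch` at the top of the entry script.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Example: expose only physical GPU 1; inside the process it then
# appears as device 0 ("cuda:0").
select_gpu(1)
```

Setting the same variable in the shell in front of the usual training command should have the same effect without touching the code.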

VictorStarkSnow commented 10 months ago

It seems to be caused by an incompatibility between tinycudann and the CUDA runtime version. It works for me with Python 3.9.0, PyTorch 1.13.1, CUDA 11.7, tinycudann 1.7. Hope this helps you fix the error.
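
To compare an environment against the combination above, it can help to print what each package reports. This is a minimal sketch and not part of the repository: the helper name `report_versions` is hypothetical, and it assumes the packages are importable as `torch` and `tinycudann`.

```python
import importlib
import platform


def report_versions():
    """Collect Python / PyTorch / CUDA / tinycudann status (hypothetical helper)."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except Exception as exc:
        info["torch"] = f"import failed: {exc!r}"
    try:
        importlib.import_module("tinycudann")
        info["tinycudann"] = "importable"
    except Exception as exc:  # may fail for reasons other than ImportError
        info["tinycudann"] = f"import failed: {exc!r}"
    return info


if __name__ == "__main__":
    for name, value in report_versions().items():
        print(f"{name}: {value}")
```

If `torch_cuda` differs from the toolkit version that tinycudann was compiled with, that mismatch is a plausible suspect, though this check alone does not prove it.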

wbhu commented 9 months ago

closed as solved

Terry10086 commented 8 months ago

I've encountered the following problem. Is it because of the tinycudann version? My versions are the same as yours. Did your code run successfully?

# Parameters for TriMipRF:
# ==============================================================================
TriMipRF.feature_dim = 16
TriMipRF.geo_feat_dim = 15
TriMipRF.n_levels = 8
TriMipRF.net_depth_base = 2
TriMipRF.net_depth_color = 4
TriMipRF.net_width = 128
TriMipRF.plane_size = 512

# Parameters for TriMipRFModel:
# ==============================================================================
TriMipRFModel.occ_grid_resolution = 128
TriMipRFModel.samples_per_ray = 1024

2024-01-12 14:33:35.438 | INFO     | trainer.trainer:fit:106 - ==> Start training ...

NerfAcc: No CUDA toolkit found. NerfAcc will be disabled.
Traceback (most recent call last):
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 99, in <module>
    main()
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 56, in main
    trainer.fit()
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/trainer/trainer.py", line 113, in fit
    self.model.before_iter(step)
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/neural_field/model/trimipRF.py", line 41, in before_iter
    self.ray_sampler.every_n_step(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 271, in every_n_step
    self._update(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 224, in _update
    x = contract_inv(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 101, in contract_inv
    ctype = type.to_cpp_version()
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 62, in to_cpp_version
    return _C.ContractionTypeGetter(self.value)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/cuda/__init__.py", line 13, in call_cuda
    return getattr(_C, name)(*args, **kwargs)
AttributeError: 'NoneType' object has no attribute 'ContractionType'
  In call to configurable 'main' (<function main at 0x7fd229f57b80>)

VictorStarkSnow commented 8 months ago

Yes, the versions above work on my 3090. Your problem seems to be caused by NerfAcc: perhaps you haven't installed the CUDA toolkit, or haven't added its path to your system PATH. You can run "nvcc --version" to test whether it has been added.
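
The same check can be scripted from Python. This is a minimal sketch under assumptions: the helper name `find_nvcc` is hypothetical, and looking under `CUDA_HOME` / `CUDA_PATH` is only a common convention that may not apply to every install (on Windows the binary would be `nvcc.exe`).

```python
import os
import shutil
import subprocess


def find_nvcc():
    """Return the path to nvcc if it is on PATH or under CUDA_HOME, else None."""
    path = shutil.which("nvcc")
    if path:
        return path
    # Fallback: conventional environment variables pointing at the toolkit root.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; add the CUDA toolkit's bin directory to PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        print(result.stdout)
```

If nothing is found, adding the toolkit's `bin` directory to PATH (its location depends on how CUDA was installed) and restarting the shell is the usual remedy.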

Terry10086 commented 8 months ago

Thank you so much for your kind reply! The cause was that the path to nvcc could not be found. My problem is solved!