97littleleaf11 closed this issue 1 year ago
If an illegal memory access error appears, it means there might be a bug somewhere. May I know the exact command that triggers it?
The OOM issue is probably simply that your GPU memory is limited. You can try reducing the batch size.
@liruilong940607 Thanks for your reply!
Here is the log:
python3 examples/train_ngp_nerf.py --data_root ~/nerf/nerf_data/ --scene garden --unbounded --train_split train
Warning: image_path not found for reconstruction
loading images
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:04<00:00, 40.57it/s]
Warning: image_path not found for reconstruction
loading images
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:04<00:00, 40.56it/s]
Using unbounded rendering
Traceback (most recent call last):
File "/home/yjc/nerf/nerfacc/examples/train_ngp_nerf.py", line 227, in <module>
rgb, acc, depth, n_rendering_samples = render_image(
File "/home/yjc/nerf/nerfacc/examples/utils.py", line 88, in render_image
ray_indices, t_starts, t_ends = ray_marching(
File "/opt/miniconda3/envs/nerfstudio/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/miniconda3/envs/nerfstudio/lib/python3.9/site-packages/nerfacc/ray_marching.py", line 196, in ray_marching
sigmas = sigma_fn(t_starts, t_ends, ray_indices)
File "/home/yjc/nerf/nerfacc/examples/utils.py", line 63, in sigma_fn
return radiance_field.query_density(positions)
File "/home/yjc/nerf/nerfacc/examples/radiance_fields/ngp.py", line 155, in query_density
self.mlp_base(x.view(-1, self.num_dim))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
With `CUDA_LAUNCH_BLOCKING=1`, I simply got `abort (core dumped)`.
Thanks for reporting. I will find some time next week to look into this. May I know the version of nerfacc you are using?
By the way, what's the reason for abandoning the `--auto_aabb` argument?
I just checked out the master branch, and I can reproduce it. The default aabb works well in the garden scene.
With the latest nerfacc (>=0.5.0), which uses a multi-resolution grid (or a proposal network) to accelerate unbounded scenes, this issue should now be gone.
It seems that `cone_angle` affects the memory usage. For example, I got an illegal memory access error when training `train_ngp_nerf` with the default `cone_angle` of 0. I also got OOM when training `mlp_nerf` without setting `cone_angle`. It would be better if the docs clarified this.
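To illustrate why this happens, here is a small sketch (not nerfacc's actual code, and `num_samples` is a hypothetical helper): with a positive `cone_angle`, the marching step size grows roughly in proportion to the distance `t` along the ray, so distant regions of an unbounded scene are sampled coarsely. With `cone_angle = 0`, the step size stays constant, and a ray that travels far through an unbounded scene can generate enough samples to exhaust GPU memory.

```python
def num_samples(near: float, far: float, render_step_size: float, cone_angle: float) -> int:
    """Count marching steps from `near` to `far` along one ray.

    Sketch of the step-size rule: each step is at least `render_step_size`,
    but grows with distance t when cone_angle > 0.
    """
    t, n = near, 0
    while t < far:
        t += max(render_step_size, t * cone_angle)
        n += 1
    return n

# Constant steps (cone_angle = 0): ~200k samples for a single far-reaching ray.
uniform = num_samples(0.1, 1000.0, 5e-3, 0.0)

# Distance-proportional steps: the count drops by roughly two orders of magnitude.
adaptive = num_samples(0.1, 1000.0, 5e-3, 4e-3)

print(uniform, adaptive)
```

This is why a nonzero `cone_angle` is commonly recommended for unbounded scenes: it caps the per-ray sample count, trading some sampling density at distance for a bounded memory footprint.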