nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

What's the suggested cone_angle when training with examples? #172

Closed 97littleleaf11 closed 1 year ago

97littleleaf11 commented 1 year ago

It seems that cone_angle affects memory usage. For example, I got an illegal memory access error when training train_ngp_nerf with the default cone_angle of 0. I also got an OOM error when training mlp_nerf without setting cone_angle. It would be helpful if the docs clarified this.
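To illustrate why cone_angle could matter for memory, here is a minimal sketch that counts marching steps along a single ray. It assumes the step size is `t * cone_angle` clamped to a minimum step, so that `cone_angle = 0` means a constant step; `count_samples` and all numeric values are hypothetical and chosen only for illustration, not taken from the nerfacc examples.

```python
def count_samples(near, far, step_size, cone_angle, dt_max=1e10):
    """Count marching steps along one ray between near and far.

    Assumption: the step at distance t is clamp(t * cone_angle, step_size, dt_max),
    so cone_angle == 0 gives a constant step of step_size.
    """
    t, n = near, 0
    while t < far:
        dt = min(max(t * cone_angle, step_size), dt_max)
        t += dt
        n += 1
    return n


# Illustrative values for a far-reaching (unbounded-style) ray.
constant = count_samples(near=0.2, far=1000.0, step_size=0.01, cone_angle=0.0)
growing = count_samples(near=0.2, far=1000.0, step_size=0.01, cone_angle=0.004)
print(f"constant step: {constant} samples, growing step: {growing} samples")
```

Under these assumptions a constant step produces far more samples per ray than a step that grows with distance, which would be consistent with higher memory use at cone_angle 0 on unbounded scenes.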

liruilong940607 commented 1 year ago

If an illegal memory access error appears, there is probably a bug somewhere. Could you share the exact command that triggers it?

The OOM issue is most likely just a matter of limited GPU memory. You can try reducing the batch size.

97littleleaf11 commented 1 year ago

@liruilong940607 Thanks for your reply!

Here is the log:

python3 examples/train_ngp_nerf.py --data_root ~/nerf/nerf_data/ --scene garden --unbounded --train_split train
Warning: image_path not found for reconstruction
loading images
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:04<00:00, 40.57it/s]
Warning: image_path not found for reconstruction
loading images
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:04<00:00, 40.56it/s]
Using unbounded rendering
Traceback (most recent call last):
  File "/home/yjc/nerf/nerfacc/examples/train_ngp_nerf.py", line 227, in <module>
    rgb, acc, depth, n_rendering_samples = render_image(
  File "/home/yjc/nerf/nerfacc/examples/utils.py", line 88, in render_image
    ray_indices, t_starts, t_ends = ray_marching(
  File "/opt/miniconda3/envs/nerfstudio/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/nerfstudio/lib/python3.9/site-packages/nerfacc/ray_marching.py", line 196, in ray_marching
    sigmas = sigma_fn(t_starts, t_ends, ray_indices)
  File "/home/yjc/nerf/nerfacc/examples/utils.py", line 63, in sigma_fn
    return radiance_field.query_density(positions)
  File "/home/yjc/nerf/nerfacc/examples/radiance_fields/ngp.py", line 155, in query_density
    self.mlp_base(x.view(-1, self.num_dim))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

97littleleaf11 commented 1 year ago

With CUDA_LAUNCH_BLOCKING=1, the process simply aborted (core dumped).

liruilong940607 commented 1 year ago

Thanks for reporting. I will find some time next week to look into this. Which version of nerfacc are you using?

Btw, what's the reason for abandoning the --auto_aabb argument?

97littleleaf11 commented 1 year ago

I just checked out the master branch and can reproduce it there. The default aabb works well for the garden scene.

liruilong940607 commented 1 year ago

With nerfacc>=0.5.0, which uses a multi-resolution grid (or a proposal network) to accelerate unbounded scenes, this issue should be gone.