nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

instant-ngp-bounded method fail before Setting up evaluation dataset... and freeze #2508

Open ou524u opened 11 months ago

ou524u commented 11 months ago

My environment

wsl2 + RTX3050 laptop
nerfacc 0.5.2
nerfstudio 0.3.4
torch 2.0.1+cu118
torchvision 0.15.2+cu118

Describe the bug

When running the command

ns-train instant-ngp-bounded --data data/poster

nerfstudio gets stuck at

Setting up training dataset... Caching all 204 images.

Nerfstudio hangs at this point and memory usage climbs sharply until Linux reports an OOM error and the process gets killed. (WSL is assigned more than 8GB of RAM, which is enough for the default nerfacto method.) In addition, the viewer on the default ns-viewer port shows "Renderer disconnected".

It's worth mentioning that the first instant-ngp training attempt succeeded, as described in To Reproduce.

To Reproduce

Steps to reproduce the behavior:

  1. The first training run with ns-train instant-ngp-bounded succeeded.

  2. Then I ran ns-viewer --load-config, and the viewer failed to load. The viewer got stuck at

    Setting up training dataset... Caching all 204 images.

This may have something to do with https://github.com/nerfstudio-project/nerfstudio/issues/1835#issuecomment-1527983032 , but I'm not sure. After waiting for about an hour, I stopped it with Ctrl+C.

  3. Since then, every time I run ns-train instant-ngp-bounded I hit this error.

  4. Also, when I run ns-viewer with nerfacto configs, the default port shows "Renderer disconnected". After changing the port, ns-viewer works well.

  5. The error occurs on multiple datasets.
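One possible explanation for the port symptom in step 4 is that a stale process from the interrupted run is still holding the default viewer port. The sketch below checks this with the standard library only. The port number 7007 and both helper names are assumptions for illustration, not taken from this report; adjust them to your setup.

```python
import socket

# Assumption: 7007 is the viewer's default port; change it if your setup differs.
DEFAULT_VIEWER_PORT = 7007


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")
```

If `port_in_use(DEFAULT_VIEWER_PORT)` returns True before any viewer has been started, an earlier process is probably still running; either stop it or start the viewer on the port returned by `first_free_port(DEFAULT_VIEWER_PORT)`.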

Expected behavior

After the output

Setting up training dataset... Caching all 204 images.

the next line should be

Setting up evaluation dataset... Caching all 22 images.

Screenshots

WeChat image_20231011173649

tancik commented 11 months ago

nerfacc is likely failing silently. Can you try reinstalling nerfacc? https://github.com/KAIR-BAIR/nerfacc
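A silent failure of this kind is easier to report if the hang is turned into an explicit result. The following is a minimal sketch, assuming you have a short probe command that exercises the suspect step; `run_with_timeout` is a hypothetical helper, not part of nerfstudio or nerfacc.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process and classify the outcome.

    Returns "ok" (exit code 0), "failed" (non-zero exit code),
    or "timeout" (the child was killed after timeout_s seconds).
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

For example, `run_with_timeout([sys.executable, "-c", "import nerfacc"], timeout_s=120)` checks whether the import alone completes. This probe command is only a placeholder: it may not reach the step that actually hangs, so substitute whichever small script reproduces the freeze.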

IamMohitM commented 11 months ago

@tancik I updated to the new version of nerfstudio. I'm implementing a new Model class that uses the VolumetricSampler and passes it the nerfacc occupancy grid estimator.

When I run the following snippet, my code freezes.

self.occupancy_grid = nerfacc.OccGridEstimator(
    roi_aabb=self.scene_aabb,
    resolution=self.config.grid_resolution,
    levels=self.config.grid_levels,
)

self.sampler = VolumetricSampler(
    occupancy_grid=self.occupancy_grid,
    density_fn=self.field.density_fn,
)
....
with torch.no_grad():
    ray_samples, ray_indices = self.sampler(
        ray_bundle=ray_bundle,
        near_plane=self.config.min_near,
        far_plane=self.config.min_far,
        render_step_size=self.config.render_step_size,
        alpha_thre=self.config.alpha_thre,
        cone_angle=self.config.cone_angle,
    )

I have tried uninstalling and reinstalling nerfacc, but that hasn't helped. Any advice or guidance would be great!

IamMohitM commented 11 months ago

After some digging, I found that the code gets stuck in nerfacc's ray_aabb_intersect function (nerfacc/grid.py):

t_mins, t_maxs, hits = _C.ray_aabb_intersect(
    rays_o.contiguous(),
    rays_d.contiguous(),
    aabbs.contiguous(),
    near_plane,
    far_plane,
    miss_value,
)
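For context on what this call is expected to compute, here is a pure-Python sketch of the standard slab method for a single ray and a single axis-aligned box. It is not nerfacc's CUDA implementation; the function name and the default values of `near`, `far`, and `miss_value` are assumptions for illustration. It can serve as a CPU reference to sanity-check inputs such as the box bounds before they reach the compiled kernel.

```python
import math


def ray_aabb_intersect_reference(
    origin, direction, aabb, near=0.0, far=math.inf, miss_value=math.inf
):
    """Slab-method intersection of one ray with one axis-aligned box.

    aabb is (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns (t_min, t_max, hit); both t values are miss_value on a miss.
    """
    t_min, t_max = near, far
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        lo, hi = aabb[axis], aabb[axis + 3]
        if d == 0.0:
            # Ray is parallel to this slab: it misses unless it starts inside.
            if o < lo or o > hi:
                return miss_value, miss_value, False
            continue
        # Distances along the ray to the two planes bounding this slab.
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the running interval to its overlap with this slab.
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return miss_value, miss_value, False
    return t_min, t_max, True
```

A ray from (-2, 0.5, 0.5) along +x against the unit box enters at t = 2 and leaves at t = 3; the same ray pointed along -x misses because the overlap interval becomes empty.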