Core Dumped after a few iterations #130

alvaro-budria closed 1 year ago

alvaro-budria commented 1 year ago

Thanks for the good work. I was trying out the CUDA-accelerated rendering provided in this repo, specifically the alpha-based rendering.

So far it seems that training proceeds for a few epochs (~1-300, usually not more), and then an error pops up.

I tried executing the script both with CUDA_LAUNCH_BLOCKING=1 and without it. When this flag is set, the error is

Aborted (core dumped)

without any further clarifications, and it always happens when running the following line:


Without this flag, the error message is

Traceback (most recent call last):
  File "/home/abudria/VQAD_I-ngp-enhanced-NeuS/", line 527, in <module>
  File "/home/abudria/VQAD_I-ngp-enhanced-NeuS/", line 193, in train
  File "/home/abudria/miniconda3/lib/python3.9/site-packages/torch/", line 487, in backward
  File "/home/abudria/miniconda3/lib/python3.9/site-packages/torch/autograd/", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

and as you can see from the error trace, it is the same line that leads to the crash


Reducing the number of samples per ray and the batch size to extremely small values does make the error go away. However, I strongly doubt that this is an OOM problem because:

  1. Before using nerfacc's CUDA rendering, I was using my own pure PyTorch pipeline with a large batch size and a large number of samples per ray.
  2. I am monitoring the GPU memory consumption with watch -d -n 0.01 nvidia-smi. At no point does the memory shoot up to the 23028MiB limit of the server's NVIDIA A10.
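As a complement to polling nvidia-smi, PyTorch's own peak-memory counters can be read around a single training step, which also catches short spikes that a polling interval may miss. The following is a minimal sketch, not code from this issue: `peak_memory_report` and `toy_step` are hypothetical names, and the toy step only stands in for a real forward/backward pass.

```python
import torch


def peak_memory_report(step_fn, device=0):
    """Run step_fn once and return peak CUDA memory stats (None without CUDA)."""
    if not torch.cuda.is_available():
        step_fn()
        return None
    torch.cuda.reset_peak_memory_stats(device)  # forget earlier peaks
    step_fn()
    torch.cuda.synchronize(device)  # wait for queued kernels to finish
    peak = torch.cuda.max_memory_allocated(device)
    total = torch.cuda.get_device_properties(device).total_memory
    return {"peak_MiB": peak / 2**20, "total_MiB": total / 2**20}


def toy_step():
    # Hypothetical stand-in for one training iteration.
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(256, 256, device=dev, requires_grad=True)
    (a @ a).sum().backward()


report = peak_memory_report(toy_step)
print(report)
```

If the reported peak stays well below the total, an out-of-memory condition becomes a less likely explanation, although this only counts memory allocated through PyTorch's allocator.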

The rendering code I have is

        sdf_grad_samples = []

        def alpha_fn(t_starts, t_ends, ray_indices):
            ray_indices = ray_indices.long()
            t_origins = rays_o[ray_indices]
            t_dirs = rays_d[ray_indices]
            midpoints = (t_starts + t_ends) / 2.
            positions = t_origins + t_dirs * midpoints
            sdf = self.geometry.sdf(positions,)
            sdf_grad = self.geometry.gradient(positions,).squeeze()
            normal = F.normalize(sdf_grad, p=2, dim=-1)
            dists = t_ends - t_starts
            alpha = self.get_alpha(sdf, normal, t_dirs, dists)
            return alpha[...,None]

        def rgb_alpha_fn(t_starts, t_ends, ray_indices):
            ray_indices = ray_indices.long()
            t_origins = rays_o[ray_indices]
            t_dirs = rays_d[ray_indices]
            midpoints = (t_starts + t_ends) / 2.
            positions = t_origins + t_dirs * midpoints
            geometry = self.geometry(positions,)
            sdf, feature = geometry[:, :1], geometry[:, 1:]
            sdf_grad = self.geometry.gradient(positions,)
            dists = t_ends - t_starts
            normal = F.normalize(sdf_grad, p=2, dim=-1)
            alpha = self.get_alpha(sdf, normal, t_dirs, dists)
            rgb = self.texture(positions, normal, t_dirs, feature)
            return rgb, alpha[..., None]

        with torch.no_grad():
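            packed_info, t_starts, t_ends = ray_marching(
                rays_o, rays_d,
                scene_aabb=self.scene_aabb,
                grid=None,  # self.occupancy_grid if self.grid_prune else None,
                alpha_fn=alpha_fn,
                near_plane=near.squeeze(), far_plane=far.squeeze(),
                render_step_size=self.render_step_size,
                stratified=self.randomized,
                cone_angle=0.0,
                alpha_thre=0.0,
            )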

        rgb, opacity, depth = rendering(
            # render_bkgd=self.background_color,

        sdf_grad_samples =, dim=0)
        opacity, depth = opacity.squeeze(-1), depth.squeeze(-1)
        num_samples = torch.as_tensor([len(t_starts)], dtype=torch.int32, device=rays_o.device)

Any ideas?

alvaro-budria commented 1 year ago

I found the problem and a solution for it.

Turns out that at some point I was calling the following snippet in my geometry network:

def gradient(self, x,):
        with torch.set_grad_enabled(True):
            y = self.sdf(x)
            gradients = torch.autograd.grad(
        return gradients.reshape(-1, 3)

The line


was causing the graph on x to be empty, and thus an illegal memory access happened, or at least that's the explanation I could come up with. At any rate, replacing the line above with

x = x + 0 * self.__hidden__(x)

solved the problem! I also needed to add this

self.__hidden__ = torch.nn.Linear(3, 1, bias=False)

to the __init__ of my class.
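For readers who want to try the workaround in isolation, here is a minimal sketch under stated assumptions. `ToySDF` and its analytic sphere SDF are hypothetical stand-ins for the real geometry network, and the arguments passed to `torch.autograd.grad` are an assumption, since the original call is truncated above. Only the `x + 0 * self.__hidden__(x)` line and the `Linear(3, 1, bias=False)` layer come from the post.

```python
import torch
import torch.nn as nn


class ToySDF(nn.Module):
    """Hypothetical geometry module: SDF of a sphere centred at the origin."""

    def __init__(self, radius=0.5):
        super().__init__()
        self.radius = radius
        # Dummy layer from the post: its weight requires grad, so adding
        # 0 * self.__hidden__(x) attaches x to the autograd graph.
        self.__hidden__ = nn.Linear(3, 1, bias=False)

    def sdf(self, x):
        return x.norm(dim=-1, keepdim=True) - self.radius

    def gradient(self, x):
        with torch.set_grad_enabled(True):
            x = x + 0 * self.__hidden__(x)  # numerically x, but now tracked
            y = self.sdf(x)
            gradients = torch.autograd.grad(
                outputs=y,
                inputs=x,
                grad_outputs=torch.ones_like(y),  # assumed arguments
                create_graph=True,
            )[0]
        return gradients.reshape(-1, 3)


pts = torch.tensor([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
with torch.no_grad():  # mimics calling gradient() from a no_grad context
    g = ToySDF().gradient(pts)
print(g)
```

For a sphere the SDF gradient is the unit vector x / |x|, which makes the result easy to check by hand. The input points must stay away from the origin, where the norm is not differentiable.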

CCamouflage-Hvv commented 1 year ago

Thanks for sharing! May I ask how to use "with torch.no_grad():" around "packed_info, t_starts, t_ends = ray_marching( rays_o, rays_d, scene_aabb=self.scene_aabb, grid=None, # self.occupancy_grid if self.grid_prune else None, alpha_fn=alpha_fn, near_plane=near.squeeze(), far_plane=far.squeeze(), render_step_size=self.render_step_size, stratified=self.randomized, cone_angle=0.0, alpha_thre=0.0, )"? To be specific, the gradient calculation in alpha_fn relies on PyTorch's grad_fn. Is it possible to use "with torch.no_grad():" here? Thanks again!

alvaro-budria commented 1 year ago

Yes, you can wrap the ray_marching call in a with torch.no_grad() block. However, inside the forward of the geometry module, you need to add a line that enables grad for the forward pass there: with torch.set_grad_enabled( or (with_grad and self.grad_type == 'analytic')):.
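The pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code from the referenced repo: `ToyGeometry`, its single linear layer, and the exact condition passed to `torch.set_grad_enabled` are hypothetical, since the original condition is truncated in this thread.

```python
import torch
import torch.nn as nn


class ToyGeometry(nn.Module):
    """Hypothetical geometry module with an optional analytic gradient."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3, 1)

    def forward(self, x, with_grad=False):
        # Assumed condition: re-enable autograd locally when a gradient is
        # requested, even if the caller is inside torch.no_grad().
        with torch.set_grad_enabled(with_grad or torch.is_grad_enabled()):
            if with_grad:
                # Fresh leaf tensor; this cuts any gradient flow to upstream x.
                x = x.detach().clone().requires_grad_(True)
            sdf = self.net(x)
            if not with_grad:
                return sdf
            grad = torch.autograd.grad(
                sdf, x, grad_outputs=torch.ones_like(sdf), create_graph=True
            )[0]
        return sdf, grad


geo = ToyGeometry()
pts = torch.rand(4, 3)
with torch.no_grad():  # stands in for the no_grad block around ray_marching
    sdf, sdf_grad = geo(pts, with_grad=True)
    grad_mode_after = torch.is_grad_enabled()  # still False out here
print(sdf_grad.shape, grad_mode_after)
```

The context manager restores the caller's grad mode on exit, so everything else inside the outer no_grad block still runs without building a graph.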

You can check the repo I based my code on, to see the details: