How to debug .cu files - Githubissues

seanzhuh commented 4 months ago

Hi, thanks for your wonderful work and the neat CUDA implementation with its clear file structure.

A good starting point for learning 3D Gaussian splatting would be to step through every piece of code and see what it actually does. Unfortunately, Python's pdb cannot debug CUDA/C++ files. I searched the internet and found a seemingly viable solution.

The solution is to open two terminals, shell 1 and shell 2. In shell 1, start a Python shell. In shell 2, run ps -aux | grep python to find its PID, then run cuda-gdb -p PID to attach to the Python process in shell 1. Breakpoints can then be set via break forward.cu:429. After setting breakpoints, run continue in shell 2. Then, in shell 1, import os and run os.system('python train.py').
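
The attach step above can be made less error-prone by letting the Python side report its own PID and block until the debugger is ready. A minimal sketch, assuming a hypothetical helper named wait_for_debugger that is not part of the repository:

```python
import os
import sys


def wait_for_debugger(stream_in=sys.stdin, stream_out=sys.stdout):
    """Print this process's PID and block until the user presses Enter.

    Hypothetical helper for illustration: call it in the Python shell, attach
    with `cuda-gdb -p <pid>` from the other terminal, set breakpoints, type
    `continue` in cuda-gdb, then press Enter here.
    """
    pid = os.getpid()
    stream_out.write(f"PID {pid}: attach with cuda-gdb -p {pid}, then press Enter\n")
    stream_out.flush()
    stream_in.readline()  # blocks until a line is available
    return pid
```

This avoids guessing the right entry in the ps output when several Python processes are running.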

The above pipeline works for projects written in pure CUDA, where the whole project can be compiled with, for instance, nvcc -Xcompiler -fPIC -std=c++11 -shared -arch=sm_60 -G -g -o t383.so t383.cu -DFIX.

However, 3DGS is a mix of Python, C++, and CUDA, and I don't know how to specify these arguments in setup.py or CMakeLists.txt. As a result, when I set a breakpoint with break forward.cu:497, cuda-gdb complains that it cannot find the source files. I guess the problem is in the linking stage?

Can someone help with this?

seanzhuh commented 4 months ago

I just managed to set breakpoints using the following setup.py file, built with DEBUG=1 python setup.py build develop, in case it is helpful to anyone.

from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
import os

# Directory containing this setup.py, used to locate the bundled glm headers.
ROOT = os.path.dirname(os.path.abspath(__file__))

setup(
    name="diff_gaussian_rasterization",
    packages=['diff_gaussian_rasterization'],
    ext_modules=[
        CUDAExtension(
            name="diff_gaussian_rasterization._C",
            sources=[
                "cuda_rasterizer/rasterizer_impl.cu",
                "cuda_rasterizer/forward.cu",
                "cuda_rasterizer/backward.cu",
                "rasterize_points.cu",
                "ext.cpp"],
            # -G: device-side debug info; -g: host-side debug info for cuda-gdb.
            extra_compile_args={"nvcc": ["-O0", "-Xcompiler", "-fPIC", "-G", "-g",
                                         "-I" + os.path.join(ROOT, "third_party/glm/")],
                                'cxx': ["-g"]},
            extra_link_args=["-shared"]
            )
        ],
    cmdclass={
        'build_ext': BuildExtension
    }
)
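
The build command above sets DEBUG=1, but the setup.py as shown never reads that variable, so the debug flags are always on. One way to make the switch real is sketched below. The helper name nvcc_flags and the release flag -O3 are assumptions for illustration, not part of the repository.

```python
import os


def nvcc_flags(environ=os.environ):
    """Return nvcc flags: debug flags when DEBUG=1 is set, optimized otherwise.

    Hypothetical helper; both flag sets are assumptions for illustration.
    """
    if environ.get("DEBUG", "0") == "1":
        # -G: device-side debug info, -g: host-side debug info.
        return ["-O0", "-G", "-g"]
    return ["-O3"]
```

The result could be concatenated with the -Xcompiler, -fPIC, and -I entries when building the "nvcc" list passed to extra_compile_args.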

Although setting breakpoints succeeds, when I run os.system('python train.py') it creates a subprocess that cuda-gdb is not attached to. cuda-gdb shows 'detaching after vfork from child process 295097', whereas my Python process is 294177.

seanzhuh commented 4 months ago

In addition, in shell 1 you need to explicitly import diff_gaussian_rasterization so that cuda-gdb loads the generated .so files. Otherwise, cuda-gdb still cannot find them!

seanzhuh commented 4 months ago

Do not use os.system() to execute the train.py script in shell 1. Instead you should use exec(open("./train.py").read()).
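
The reason is that os.system() starts a new child process while cuda-gdb stays attached to the parent, whereas exec() runs the script's code inside the process that is already being debugged. A small sketch that makes the difference visible by comparing PIDs (the one-line script body is a stand-in for train.py, and the function names are hypothetical):

```python
import os
import subprocess
import sys

# Stand-in for train.py: record the PID of whichever process runs it.
SCRIPT = "import os; seen_pid = os.getpid()"


def pid_via_child():
    """Run the script in a child interpreter, similar to os.system('python ...')."""
    out = subprocess.run(
        [sys.executable, "-c", SCRIPT + "; print(seen_pid)"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout)


def pid_via_exec():
    """Run the same script with exec(), i.e. inside the current process."""
    namespace = {}
    exec(SCRIPT, namespace)
    return namespace["seen_pid"]
```

Only the exec() variant reports the PID that cuda-gdb was attached to, which is why breakpoints are hit in that case.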

redherring2141 commented 4 months ago

Hello, thanks for the useful information. I'm struggling to debug CUDA submodules too and finally ended up here. My questions are:

1) How can I pass command-line arguments if I use exec(open("./render.py").read()) in shell 1, where the Python shell is open?
2) How can I set a breakpoint in the Python process running in shell 1 so that it pauses while I set breakpoints in cuda-gdb in shell 2?

seanzhuh commented 4 months ago

1) I directly modify the default argument values in render.py instead of passing arguments through exec(open("./render.py").read()). 2) You don't need to set breakpoints in the shell that runs the Python script; it pauses automatically as long as you have set breakpoints in cuda-gdb beforehand. If you intend to debug Python code, I suggest using pdb. Debugging both Python and CUDA together may take some time to explore, I guess.
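
As an alternative to editing the defaults in render.py, the arguments can be supplied by overwriting sys.argv before the exec() call, since argparse reads sys.argv when no argument list is given. A minimal sketch, assuming a hypothetical helper named exec_with_args:

```python
import sys


def exec_with_args(path, args, namespace=None):
    """exec() a script in this process as if launched with `python path *args`.

    Hypothetical helper for illustration. sys.argv is restored afterwards.
    """
    if namespace is None:
        # __name__ == "__main__" so that `if __name__ == "__main__":` blocks run.
        namespace = {"__name__": "__main__", "__file__": path}
    saved_argv = sys.argv
    sys.argv = [path] + list(args)
    try:
        with open(path) as f:
            exec(compile(f.read(), path, "exec"), namespace)
    finally:
        sys.argv = saved_argv
    return namespace
```

In shell 1 this would be called as exec_with_args("./render.py", [...]) with the list holding whatever flags you would normally type on the command line. Compiling with the file name also keeps tracebacks pointing at the script.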

weihan1 commented 2 months ago

Hey @seanzhuh, thanks for this detailed explanation. I tried your instructions and noticed I couldn't attach cuda-gdb to Python: with cuda-gdb -p, I get "Operation not permitted". Since I'm not an admin on the server, I'm not sure how to solve this. Did you encounter this issue initially?
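
On many Linux systems, "Operation not permitted" on attach comes from the Yama ptrace restriction, which in its common default setting only lets a debugger trace its own descendants. The sketch below reads that setting. The function name is hypothetical, the path assumes a typical Linux host with Yama enabled, and other causes (for example, the target process belonging to another user) remain possible.

```python
def ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama ptrace_scope value, or None if it cannot be read.

    0: any process of the same user may be attached to
    1: only descendants of the debugger may be attached to
    2: only administrators may attach
    3: attaching is disabled
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

If this returns 1, a workaround that needs no admin rights is to start Python from the debugger instead of attaching to it, for example by launching cuda-gdb with --args python followed by the script, so that the Python process is a child of cuda-gdb.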

Xiaoxi-Liang commented 18 hours ago

Hello! I followed your commands, but my cuda-gdb cannot find the generated .so.

Specifically, I wrote a simple debug.py script containing only the render process, with all the needed arguments saved beforehand for debugging.

Here is my debug.py:

from gaussian_renderer import render_debug
import diff_gaussian_rasterization
path = './debug.pth'
while True:
    results=render_debug(path)
    print('x')

render_debug is a rewritten function in gaussian_renderer:

import torch
import math
from typing import Union
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer

def render_debug(path):
    debug_dict = torch.load(path)

    # Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
    screenspace_points = torch.zeros_like(debug_dict['means3D'], dtype=debug_dict['means3D'].dtype, requires_grad=True, device="cuda") + 0
    try:
        screenspace_points.retain_grad()
    except:
        pass

    raster_settings = GaussianRasterizationSettings(
        image_height=debug_dict['image_height'],
        image_width=debug_dict['image_width'],
        tanfovx=debug_dict['tanfovx'],
        tanfovy=debug_dict['tanfovy'],
        bg=debug_dict['bg'],
        scale_modifier=debug_dict['scale_modifier'],
        viewmatrix=debug_dict['viewmatrix'],
        projmatrix=debug_dict['projmatrix'],
        sh_degree=debug_dict['sh_degree'],
        campos=debug_dict['campos'],
        prefiltered=False,
        debug=False
    )

    rasterizer = GaussianRasterizer(raster_settings=raster_settings)
    means2D = screenspace_points

    # Rasterize visible Gaussians to image, obtain their radii (on screen). 
    rendered_image, radii, rendered_depth, rendered_alpha  = rasterizer(
        means3D = debug_dict['means3D'],
        means2D = means2D,
        shs = debug_dict['shs'],
        colors_precomp = debug_dict['colors_precomp'],
        opacities = debug_dict['opacities'],
        scales = debug_dict['scales'],
        rotations = debug_dict['rotations'],
        cov3D_precomp =debug_dict['cov3D_precomp'])

    # Those Gaussians that were frustum culled or had a radius of 0 were not visible.
    # They will be excluded from value updates used in the splitting criteria.
    return {"render": rendered_image,
            "viewspace_points": screenspace_points,
            "visibility_filter" : radii > 0,
            "radii": radii}

Here is what I do in the two shells:

1. Run python in shell 1, then run import diff_gaussian_rasterization and exec(open("./debug.py").read()).
2. In shell 2, find the PID with ps -aux | grep python and run cuda-gdb -p <pid>.
3. Run break forward.cu:497 in shell 2, and still get:

(cuda-gdb) break forward.cu:497
No symbol table is loaded.  Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (forward.cu:497) pending.

It seems that my cuda-gdb shell is not aligned with the Python process. What is wrong? What should I do to check it?
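
One thing that can be checked from the Python side is whether the extension's .so is actually mapped into the process whose PID was given to cuda-gdb. The sketch below is Linux-only, uses a hypothetical helper name, and only tests this one possible cause; a missing symbol table can also result from an extension that was installed without the debug flags.

```python
def mapped_libraries(keyword, maps_text=None):
    """Return paths of files mapped into a process whose path contains keyword.

    Hypothetical helper. By default it reads /proc/self/maps (Linux only);
    pass maps_text to inspect the contents of another /proc/<pid>/maps file.
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # the sixth field, if present, is the path
        if len(fields) == 6 and keyword in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)
```

Calling mapped_libraries("diff_gaussian_rasterization") in shell 1 after the import should list the compiled _C library; an empty list means it is not loaded in that process. Comparing os.getpid() in shell 1 with the PID passed to cuda-gdb confirms that the debugger is attached to the same process.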

graphdeco-inria / gaussian-splatting

How to debug .cu files #889