graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

How to debug .cu files #889

Open seanzhuh opened 1 month ago

seanzhuh commented 1 month ago

Hi, thanks for your wonderful work and the neat CUDA implementation with its clear file structure.

A good starting point for learning 3D Gaussian splatting is to step through every piece of code and see what it actually does. Unfortunately, Python's pdb cannot debug CUDA/C++ files. I've searched the internet and found a seemingly viable solution.

The solution is to open two terminals, shell 1 and shell 2. In shell 1, start an interactive Python shell. In shell 2, run ps -aux | grep python to find its PID, then run cuda-gdb -p PID to attach to the Python process running in shell 1. Set breakpoints in cuda-gdb with, for example, break forward.cu:429, then run continue in shell 2 so that the Python process resumes. Finally, in shell 1, import os and call os.system('python train.py').
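As a small convenience for the attach step, the Python process can print its own PID instead of being looked up with ps. This is only a sketch: the helper names attach_command and wait_for_debugger are hypothetical and not part of the repository.

```python
import os


def attach_command(pid: int, debugger: str = "cuda-gdb") -> str:
    """Build the shell command that attaches the debugger to a running process."""
    return f"{debugger} -p {pid}"


def wait_for_debugger() -> None:
    """Print the attach command for this interpreter and block until Enter is pressed."""
    print("Run this in the cuda-gdb terminal:", attach_command(os.getpid()))
    input("Press Enter once breakpoints are set and `continue` has been issued... ")
```

Calling wait_for_debugger() in the Python shell before starting the training code gives you time to attach, set breakpoints, and resume the process.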

The above pipeline works for projects written in pure CUDA, where the whole project can be compiled with debug information using, for instance, nvcc -Xcompiler -fPIC -std=c++11 -shared -arch=sm_60 -G -g -o t383.so t383.cu -DFIX.

However, 3DGS is a mix of Python, C++, and CUDA, and I don't know how to specify these flags in the setup.py or CMakeLists.txt files. As a result, when I set a breakpoint with break forward.cu:497, cuda-gdb complains that it cannot find the source file. I guess the problem is in the linking stage?

Can someone help with this?

seanzhuh commented 1 month ago

I just managed to set breakpoints by using the following setup.py file and running DEBUG=1 python setup.py build develop, in case it is helpful to anyone.

from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
import os

# Directory containing this setup.py, used to locate the bundled GLM headers.
here = os.path.dirname(os.path.abspath(__file__))

setup(
    name="diff_gaussian_rasterization",
    packages=["diff_gaussian_rasterization"],
    ext_modules=[
        CUDAExtension(
            name="diff_gaussian_rasterization._C",
            sources=[
                "cuda_rasterizer/rasterizer_impl.cu",
                "cuda_rasterizer/forward.cu",
                "cuda_rasterizer/backward.cu",
                "rasterize_points.cu",
                "ext.cpp",
            ],
            extra_compile_args={
                # -O0 disables optimization; -G and -g emit device and host debug info.
                "nvcc": ["-O0", "-Xcompiler", "-fPIC", "-G", "-g",
                         "-I" + os.path.join(here, "third_party/glm/")],
                # -g emits debug info for the host C++ sources.
                "cxx": ["-g"],
            },
            extra_link_args=["-shared"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
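The file above applies the debug flags unconditionally. If you would rather switch between debug and regular builds, one option is to choose the flags from an environment variable. The following is a minimal sketch under that assumption: the helper name build_compile_args and the convention that DEBUG=1 selects the debug flags are hypothetical, and the non-debug branch simply keeps the include path.

```python
import os


def build_compile_args(debug: bool, glm_include: str) -> dict:
    """Return a dict suitable for CUDAExtension(extra_compile_args=...)."""
    if debug:
        # Same debug flags as in the setup.py posted in this thread.
        nvcc = ["-O0", "-Xcompiler", "-fPIC", "-G", "-g"]
        cxx = ["-g"]
    else:
        # Assumption of this sketch: a regular build adds no extra flags.
        nvcc = []
        cxx = []
    return {"nvcc": nvcc + ["-I" + glm_include], "cxx": cxx}


# DEBUG=1 in the environment selects the debug flags (a convention of this sketch).
debug_build = os.environ.get("DEBUG", "0") == "1"
compile_args = build_compile_args(
    debug_build, os.path.join(os.getcwd(), "third_party/glm/")
)
```

The resulting compile_args dict would then be passed as extra_compile_args in the CUDAExtension call.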

Although setting breakpoints now succeeds, running os.system('python train.py') creates a subprocess that cuda-gdb is not attached to. cuda-gdb shows 'detaching after vfork from child process 295097', whereas my Python process is 294177.
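The PID mismatch can be reproduced without any CUDA code. The sketch below starts a second interpreter and compares PIDs; it uses subprocess.run rather than os.system only so that the child's output can be captured, and the function name is hypothetical.

```python
import os
import subprocess
import sys


def child_pid_of_new_interpreter() -> int:
    """Start a separate Python interpreter and return the PID it reports for itself."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


parent_pid = os.getpid()  # the process a debugger would be attached to
child_pid = child_pid_of_new_interpreter()  # the process that would run the script
print(f"attached process: {parent_pid}, new interpreter: {child_pid}")
```

The two PIDs differ, so breakpoints set in a debugger attached to the parent are never reached by code running in the child.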

seanzhuh commented 1 month ago

In addition to this, in shell 1 you need to explicitly import diff_gaussian_rasterization so that cuda-gdb loads the generated .so file. Otherwise, cuda-gdb still cannot find it!
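To confirm which shared object actually gets loaded, you can ask Python for the file behind the compiled submodule. This is a rough sketch: extension_path is a hypothetical helper, and it returns None when the module is not installed.

```python
import importlib
from typing import Optional


def extension_path(module_name: str) -> Optional[str]:
    """Import a module and return the file it was loaded from, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


# Importing the compiled submodule is what maps the .so into the Python process.
print(extension_path("diff_gaussian_rasterization._C"))
```

If the printed path points at the debug build, breakpoints in the .cu files should resolve once the import has run.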

seanzhuh commented 1 month ago

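Do not use os.system() to execute the train.py script in shell 1, since it starts a child process that cuda-gdb is not attached to. Instead, use exec(open("./train.py").read()), which runs the script inside the attached Python process.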

redherring2141 commented 1 month ago

Hello, thanks for the useful information. I'm struggling to debug the CUDA submodules too and finally ended up here. My questions are:

  1. How can I pass command-line arguments if I use exec(open("./render.py").read()) in shell 1, where the Python shell is open?
  2. How can I set a breakpoint in the Python process running in shell 1 so that it pauses while I set breakpoints in cuda-gdb in shell 2?

seanzhuh commented 1 month ago

1) I directly modify the default argument values in render.py instead of passing arguments through exec(open("./render.py").read()). 2) You don't need to set breakpoints in the shell that runs the Python script; it pauses automatically as long as you have set breakpoints in cuda-gdb beforehand. If you intend to debug the Python code, I suggest using pdb. Debugging both Python and CUDA at once may take some time to explore, I guess.
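An alternative to editing the defaults is to set sys.argv before running the script in the same process. The sketch below uses runpy.run_path from the standard library instead of exec(open(...).read()); it assumes the script reads its options from sys.argv (for example through argparse), and run_script_with_args is a hypothetical helper name.

```python
import runpy
import sys
from typing import Any, Dict, List


def run_script_with_args(path: str, args: List[str]) -> Dict[str, Any]:
    """Run a script in the current process as __main__ with a temporary sys.argv."""
    saved_argv = sys.argv
    sys.argv = [path] + list(args)
    try:
        # run_path executes the file in this process and returns its globals.
        return runpy.run_path(path, run_name="__main__")
    finally:
        # Restore the interpreter's original arguments even if the script fails.
        sys.argv = saved_argv
```

In the Python shell this would be called as run_script_with_args("./render.py", [...]) with the same arguments you would normally type on the command line. Because the script still runs inside the attached process, cuda-gdb breakpoints keep working.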