state-spaces / s4

Structured state space sequence models
Apache License 2.0

CUDA error: no kernel image is available for execution on the device #147

Open leoauri opened 3 weeks ago

leoauri commented 3 weeks ago

Hi there, I have copied s4.py and the kernel extension into another repository I am working on. I had the S4 components running (with CUDA), and then I installed the kernel extension. The build output was full of deprecation warnings that filled my terminal history, but it ends with:

...
creating build/lib.linux-x86_64-cpython-310
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 build/temp.linux-x86_64-cpython-310/cauchy.o build/temp.linux-x86_64-cpython-310/cauchy_cuda.o -L/usr/local/lib/python3.10/dist-packages/torch/lib -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/structured_kernels.cpython-310-x86_64-linux-gnu.so
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/structured_kernels.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for structured_kernels.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/structured_kernels.py to structured_kernels.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.structured_kernels.cpython-310: module references __file__
creating 'dist/structured_kernels-0.1.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing structured_kernels-0.1.0-py3.10-linux-x86_64.egg
creating /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Extracting structured_kernels-0.1.0-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/dist-packages
Adding structured-kernels 0.1.0 to easy-install.pth file

Installed /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Processing dependencies for structured-kernels==0.1.0
Finished processing dependencies for structured-kernels==0.1.0

Also, I remember some warnings at the beginning because the CUDA version is 12.3 but PyTorch was built for 12.1...

In any case, when I now try to train with the S4 components, I get an error like:

...
  File "/workspace/cornbirdrave/RAVE/extensions/kernels/cauchy.py", line 96, in forward
    return cauchy_mult_sym_fwd(v, z, w)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Any idea how to work around this? Thanks...

leoauri commented 3 weeks ago

I ran the installer again, and this time the deprecation warnings were not the same as before. The output was:

$ python3.10 setup.py install 2>&1 | tee install.log
running install
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing structured_kernels.egg-info/PKG-INFO
writing dependency_links to structured_kernels.egg-info/dependency_links.txt
writing top-level names to structured_kernels.egg-info/top_level.txt
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:499: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'structured_kernels.egg-info/SOURCES.txt'
writing manifest file 'structured_kernels.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:418: UserWarning: The detected CUDA version (12.3) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.3
  warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/structured_kernels.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for structured_kernels.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/structured_kernels.py to structured_kernels.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.structured_kernels.cpython-310: module references __file__
creating 'dist/structured_kernels-0.1.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing structured_kernels-0.1.0-py3.10-linux-x86_64.egg
removing '/usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Extracting structured_kernels-0.1.0-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/dist-packages
Adding structured-kernels 0.1.0 to easy-install.pth file

Installed /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Processing dependencies for structured-kernels==0.1.0
Finished processing dependencies for structured-kernels==0.1.0

In particular, the line "There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.3" jumps out at me. Could that have something to do with it?

leoauri commented 2 weeks ago

Ah wait. The compilation job and the training job landed on different machines in the cluster with different GPU models. Probably the kernel has to be compiled for the actual GPU it will be used with...
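A minimal sketch of why the build machine and the training machine can disagree. It assumes the usual CUDA compatibility rule: a binary kernel image built for one compute capability runs only on GPUs with the same major version and an equal or higher minor version, while embedded PTX can be JIT-compiled for any equal or newer GPU. The function name `can_run` and the example capabilities are hypothetical and not taken from this thread; on a real node, torch.cuda.get_device_capability() reports the device's (major, minor) pair.

```python
def can_run(device_cap, sass_archs, ptx_archs=()):
    """Simplified check: would a kernel built for these targets load on this GPU?

    device_cap: (major, minor) compute capability of the GPU at run time.
    sass_archs: capabilities the kernel was compiled to binary code for.
    ptx_archs:  capabilities for which PTX was embedded (JIT-compilable later).
    """
    major, minor = device_cap
    # Binary images are compatible only within the same major version,
    # and only from a lower (or equal) minor version to a higher one.
    sass_ok = any(m == major and n <= minor for m, n in sass_archs)
    # PTX can be JIT-compiled for any GPU at least as new as its target.
    ptx_ok = any((m, n) <= (major, minor) for m, n in ptx_archs)
    return sass_ok or ptx_ok


# Hypothetical example: the kernel was built on a node with an 8.0 GPU.
built_for = [(8, 0)]
print(can_run((8, 6), built_for))              # same major, newer minor
print(can_run((7, 0), built_for))              # older GPU: no usable image
print(can_run((9, 0), built_for))              # newer major, no PTX embedded
print(can_run((9, 0), [], ptx_archs=[(8, 0)])) # newer major, PTX available
```

The cases that return False correspond to the "no kernel image is available" situation: the extension loads, but it contains no code the GPU can execute.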

albertfgu commented 2 weeks ago

Yes, it has to be compiled for the specific GPU. This can cause problems on a cluster where a single managed environment is shared by machines with different GPUs. I recommend trying to create a separate environment for each machine type.
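As a possible alternative to one build per machine type, here is a sketch that assumes the extension's setup.py relies on PyTorch's default architecture flags rather than hard-coding its own (not verified for this repository). PyTorch's extension builder reads the TORCH_CUDA_ARCH_LIST environment variable, so one build can target several GPU models. The helper names `arch_list` and `visible_capabilities` are hypothetical.

```python
def arch_list(capabilities, ptx_for_newest=True):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    if not unique:
        raise ValueError("need at least one compute capability")
    parts = [f"{major}.{minor}" for major, minor in unique]
    if ptx_for_newest:
        # Embedding PTX for the newest target lets later GPUs JIT-compile it.
        parts[-1] += "+PTX"
    return ";".join(parts)


def visible_capabilities():
    """Compute capabilities of the GPUs visible on the current node."""
    import torch  # imported lazily so arch_list works without PyTorch
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


# Hypothetical cluster with two GPU generations, 7.0 and 8.0.
print(arch_list([(8, 0), (7, 0), (8, 0)]))
```

The resulting string would be exported before rebuilding, for example TORCH_CUDA_ARCH_LIST="7.0;8.0+PTX" python3.10 setup.py install. Running visible_capabilities() on each node type is one way to collect the list of pairs to pass in.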