Open leoauri opened 3 weeks ago
I ran the installer again and the deprecation warnings were not the same this time. The output was:
$ python3.10 setup.py install 2>&1 | tee install.log
running install
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing structured_kernels.egg-info/PKG-INFO
writing dependency_links to structured_kernels.egg-info/dependency_links.txt
writing top-level names to structured_kernels.egg-info/top_level.txt
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:499: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'structured_kernels.egg-info/SOURCES.txt'
writing manifest file 'structured_kernels.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:418: UserWarning: The detected CUDA version (12.3) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.3
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/structured_kernels.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for structured_kernels.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/structured_kernels.py to structured_kernels.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying structured_kernels.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.structured_kernels.cpython-310: module references __file__
creating 'dist/structured_kernels-0.1.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing structured_kernels-0.1.0-py3.10-linux-x86_64.egg
removing '/usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Extracting structured_kernels-0.1.0-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/dist-packages
Adding structured-kernels 0.1.0 to easy-install.pth file
Installed /usr/local/lib/python3.10/dist-packages/structured_kernels-0.1.0-py3.10-linux-x86_64.egg
Processing dependencies for structured-kernels==0.1.0
Finished processing dependencies for structured-kernels==0.1.0
In particular, the line "There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.3" jumps out at me. Could that have something to do with it?
Ah wait. The compilation job and the training job landed on different machines in the cluster with different GPU models. Probably the kernel has to be compiled for the actual GPU it will be used with...
Yes, it has to be compiled for the specific GPU. This can cause version issues on a cluster where jobs land on different hardware, so I recommend creating a separate environment for each machine type.
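One way to check this, as a rough sketch rather than anything specific to this repository: query the compute capability of the GPUs on the training node and turn it into a value for TORCH_CUDA_ARCH_LIST, the environment variable that torch.utils.cpp_extension reads when building CUDA extensions. The helper names arch_list_from_capabilities and local_gpu_capabilities are made up for this example, and the script assumes PyTorch is installed on the node where it runs.

```python
def arch_list_from_capabilities(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value.

    Hypothetical helper: duplicates are dropped and entries are sorted,
    then joined with ';' (spaces are also accepted by PyTorch).
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def local_gpu_capabilities():
    """Return the compute capability of each visible GPU, or [] if none is usable."""
    try:
        import torch  # assumption: PyTorch is installed on this node
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    caps = local_gpu_capabilities()
    if caps:
        # Print a line that can be pasted into the shell before rebuilding.
        print(f"export TORCH_CUDA_ARCH_LIST='{arch_list_from_capabilities(caps)}'")
    else:
        print("No CUDA GPU visible; run this on the training node.")
```

Running this on each machine type and rebuilding with the printed value set should produce a kernel for that GPU. Listing the architectures of both machine types in one value is another option if a single build has to serve the whole cluster.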
Hi there, I have copied s4.py and the kernel extension into another repository I am working on. I had the S4 components running (with CUDA), and then I installed the kernel extensions. The build output was so full of deprecation warnings that it filled my terminal history, but it ends with
Also, I remember some warnings at the beginning because the CUDA version is 12.3 but PyTorch is built for 12.1...
In any case, when I now try to train with the S4 components, I get an error like:
Any idea how to work around this? Thanks...