GugaGugaGuga opened this issue 2 years ago
It should work with cudatoolkit v10.1. Try this inside your Python environment:
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
I don't have Anaconda; I only have Python 3.8 under Ubuntu. All the previous steps ran through and only this command is a problem. Are there any other options?
wjy@wjy:~/Documents/ge-spmm-master/pytorch-custom$ python3.8
Python 3.8.12 (default, Sep 10 2021, 00:16:05)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.4.0
>>> torch.cuda.is_available()
True
The installation test succeeds, so why does this error happen when running the gcn_custom_2layer.py file? Please help.
Hi @GugaGugaGuga, the error occurs because cusparse in CUDA 11 and CUDA 10 have different APIs, so what matters is the CUDA Toolkit version. Can you try print(torch.version.cuda) in Python and see if the output is 10.x?
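For reference, a quick check from the interactive prompt looks like this (if you installed with pip instead of conda, the default torch==1.4.0 wheel should also be a CUDA 10.1 build):
>>> import torch
>>> torch.__version__        # expect '1.4.0'
>>> torch.version.cuda       # expect '10.1' for this repo
>>> torch.cuda.is_available()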
Yes, the output is 10.1. I know CUDA 11 and CUDA 10 have different APIs, but this setup is on CUDA 10.1. Is there any problem with CUDA 10.1?
Sorry, torch.version.cuda does not matter here. The shared library spmm.so is compiled with your system's default CUDA, so if your default nvcc is >= 11 there would be a problem. First check whether your system CUDA is the correct version with nvcc --version.
To further rule out problems, can you share the output when you execute the script? In particular, we need the lines printed when the shared library is JIT-compiled, for example:
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/TH -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/guyue/anaconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/guyue/ge-spmm/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o
In my case the compilation goes through /usr/local/cuda-10.1/bin/nvcc, which works fine.
Note that you may need to clean the compilation cache and run again to see this logging; in your case, delete the folder /tmp/torch_extensions/spmm if it's still there.
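Concretely, something like this should force a clean rebuild and show which nvcc gets picked up (adjust the script name to the one you are running):
nvcc --version                                  # should report release 10.x
which nvcc                                      # should point into /usr/local/cuda-10.1/bin
rm -rf /tmp/torch_extensions/spmm               # clear the cached extension build
python3.8 gcn_custom_2layer.py --n-hidden=32    # rebuilds spmm.so and prints the compile commands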
wjy@wjy:~/Documents/ge-spmm-master/pytorch-custom$ python3.8 gcn_custom.py --n-hidden=32
Using /tmp/torch_extensions as PyTorch extensions root...
Creating extension directory /tmp/torch_extensions/spmm...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/spmm/build.ninja...
Building extension module spmm...
[1/3] c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm.cpp -o spmm.o
[2/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o
[3/3] c++ spmm.o spmm_kernel.cuda.o -shared -L/usr/local/cuda-10.1/lib64 -lcudart -o spmm.so
Loading extension module spmm...
Traceback (most recent call last):
File "gcn_custom.py", line 9, in <module>
from op import GCNConv
File "/home/wjy/Documents/ge-spmm-master/pytorch-custom/op.py", line 6, in <module>
spmm = load(name='spmm', sources=['spmm.cpp', 'spmm_kernel.cu'], verbose=True)
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in load
return _jit_compile(
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 877, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1088, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/usr/lib/python3.8/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /tmp/torch_extensions/spmm/spmm.so: undefined symbol: cusparseCsr2cscEx2
wjy@wjy:~/Downloads$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
Following your answer, I deleted the folder /tmp/torch_extensions/spmm, but it is still the same.
I cannot reproduce the error... Does your LD_LIBRARY_PATH include /usr/local/cuda-10.1/lib64?
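You could also check that the loader actually resolves the CUDA 10.1 cusparse and that the missing symbol is present there, for example (assuming the default /usr/local/cuda-10.1 install path):
echo $LD_LIBRARY_PATH                       # should contain /usr/local/cuda-10.1/lib64
ldconfig -p | grep libcusparse              # which cusparse libraries the loader knows about
nm -D /usr/local/cuda-10.1/lib64/libcusparse.so | grep cusparseCsr2cscEx2   # the symbol should show up here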
export PATH="/usr/local/cuda-10.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME="/usr/local/cuda-10.1"
Yes
Since I cannot reproduce the environment problem, I suggest you use a Docker image that I have tested and that works fine. This is the easiest way. The image pytorch/pytorch:1.4-cuda10.1-cudnn7-devel will work for this repo.
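For example, assuming Docker and the NVIDIA container toolkit are installed (so --gpus works), you could mount the repo into the container like this, adjusting the host path to where you checked it out:
docker pull pytorch/pytorch:1.4-cuda10.1-cudnn7-devel
docker run --gpus all -it -v /home/wjy/Documents/ge-spmm-master:/workspace/ge-spmm-master pytorch/pytorch:1.4-cuda10.1-cudnn7-devel
# then, inside the container:
# cd /workspace/ge-spmm-master/pytorch-custom && python gcn_custom_2layer.py --n-hidden=32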
When I run "python3.8 gcn_custom_2layer.py --n-hidden=32", the following happens:
Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/spmm/build.ninja...
Building extension module spmm...
[1/3] c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm.cpp -o spmm.o
[2/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o
[3/3] c++ spmm.o spmm_kernel.cuda.o -shared -L/usr/local/cuda-10.1/lib64 -lcudart -o spmm.so
Loading extension module spmm...
Traceback (most recent call last):
File "gcn_custom_2layer.py", line 9, in <module>
from op import GCNConv
File "/home/wjy/Documents/ge-spmm-master/pytorch-custom/op.py", line 6, in <module>
spmm = load(name='spmm', sources=['spmm.cpp', 'spmm_kernel.cu'], verbose=True)
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in load
return _jit_compile(
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 877, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1088, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/usr/lib/python3.8/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /tmp/torch_extensions/spmm/spmm.so: undefined symbol: cusparseCsr2cscEx2
Please help me figure out how to get past this.
Do I need to install cuDNN, and if so, what version?
The code does not depend on cuDNN, only cuSPARSE, which comes with the CUDA Toolkit (we need version <= 10.1). Again, I suggest using Docker to solve the environment problem; the pytorch/pytorch:1.4-cuda10.1-cudnn7-devel image should work.
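Once inside that container, a quick sanity check of the toolchain before rerunning the example might look like:
nvcc --version                              # should report release 10.1
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"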