nicknytko / numml

MIT License

Build issues #11

Open GregorySchwing opened 9 months ago

GregorySchwing commented 9 months ago

Environment: runDocker.txt

Docker Image nvcr.io/nvidia/pytorch:23.09-py3
Driver Version: 525.125.06   CUDA Version: 12.0 
Torch Version: 2.1.0a0+32f93b1

Output:

root@dd25c5516393:/TorchVision/numml# pip3 install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /TorchVision/numml
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: numml
  Building wheel for numml (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [375 lines of output]
      Detected CUDA, compiling with CUDA acceleration...
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.10
      creating build/lib.linux-x86_64-3.10/numml
      copying numml/krylov.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/__init__.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/profiler.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/iterative.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/utils.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/nn.py -> build/lib.linux-x86_64-3.10/numml
      copying numml/autograd.py -> build/lib.linux-x86_64-3.10/numml
      creating build/lib.linux-x86_64-3.10/numml/sparse
      copying numml/sparse/_linear_operator.py -> build/lib.linux-x86_64-3.10/numml/sparse
      copying numml/sparse/__init__.py -> build/lib.linux-x86_64-3.10/numml/sparse
      copying numml/sparse/linalg.py -> build/lib.linux-x86_64-3.10/numml/sparse
      copying numml/sparse/_csr.py -> build/lib.linux-x86_64-3.10/numml/sparse
      running build_ext
      building 'numml_torch_cpp' extension
      creating /TorchVision/numml/build/temp.linux-x86_64-3.10
      creating /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp
      Emitting ninja build file /TorchVision/numml/build/temp.linux-x86_64-3.10/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/5] c++ -MMD -MF /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/sparse_csr.cpp -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr.o -DCUDA_ENABLED=1 -O2 -std=c++17 -w -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1
      [2/5] c++ -MMD -MF /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_cpu.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/sparse_csr_cpu.cpp -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_cpu.o -DCUDA_ENABLED=1 -O2 -std=c++17 -w -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1
      [3/5] /usr/local/cuda/bin/nvcc  -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/sparse_csr_cuda.cu -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90
      FAILED: /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_cuda.o
      /usr/local/cuda/bin/nvcc  -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/sparse_csr_cuda.cu -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90
      /TorchVision/numml/cpp/sparse_csr_cuda.cu(123): error: no instance of overloaded function "atomicAdd" matches the argument list
                  argument types are: (double *, double)
                atomicAdd(grad_x + i, A_data[k_i] * grad_z[k]);
                ^
                detected during instantiation of "void spgemv_backward_cuda_kernel_grad_x_atomic(int, int, at::PackedTensorAccessor64<scalar_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<scalar_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, scalar_t *) [with scalar_t=double]" at line 147

      1 error detected in the compilation of "/TorchVision/numml/cpp/sparse_csr_cuda.cu".
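The one hard failure above is the `atomicAdd(double *, double)` overload: CUDA only provides a native double-precision `atomicAdd` on devices of compute capability 6.0 and higher, and the build also targets `sm_52` (see the `-gencode=arch=compute_52,code=sm_52` flag in the nvcc command). For pre-Pascal targets, the CUDA C++ Programming Guide documents a CAS-based fallback, roughly:

```cuda
// Double-precision atomicAdd fallback from the CUDA C++ Programming Guide.
// Only needed (and only legal to define) when compiling for arch < 6.0;
// on sm_60+ the native overload exists and redefining it is an error.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Reinterpret the double as a 64-bit integer so atomicCAS can be used.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Loop until the CAS succeeds; the comparison is on bit patterns,
        // so it terminates even for NaN values.
    } while (assumed != old);
    return __longlong_as_double(old);
}
#endif
```

Adding a guard like this to `sparse_csr_cuda.cu` (or simply dropping the pre-6.0 targets from the build) should make the kernel compile for all the listed architectures.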
      [4/5] /usr/local/cuda/bin/nvcc  -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/cuda_common.cu -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/cuda_common.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90
      /TorchVision/numml/cpp/cuda_common.cu: In lambda function:
      /TorchVision/numml/cpp/cuda_common.cu:22:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
         22 |     AT_DISPATCH_FLOATING_TYPES(Bhat_V.type(), "lexsort_coo_ijv", [&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/cuda_common.cu:22:152: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
         22 |     AT_DISPATCH_FLOATING_TYPES(Bhat_V.type(), "lexsort_coo_ijv", [&] {
            |                                                                                                                                                        ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      /TorchVision/numml/cpp/cuda_common.cu: In lambda function:
      /TorchVision/numml/cpp/cuda_common.cu:42:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
         42 |     AT_DISPATCH_FLOATING_TYPES(Bhat_V.type(), "lexsort_coo_ijv", [&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/cuda_common.cu:42:152: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
         42 |     AT_DISPATCH_FLOATING_TYPES(Bhat_V.type(), "lexsort_coo_ijv", [&] {
            |                                                                                                                                                        ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      [5/5] /usr/local/cuda/bin/nvcc  -I/TorchVision/numml/ext/cuCollections/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu -o /TorchVision/numml/build/temp.linux-x86_64-3.10/cpp/sparse_csr_gemm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=numml_torch_cpp -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu(228): warning #177-D: variable "C_cols" was declared but never referenced
            const int C_cols = B_cols;
                      ^

      Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu(392): warning #177-D: variable "j_i" was declared but never referenced
            int64_t k_i, j_i;
                         ^

      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu(666): warning #177-D: variable "C_rows" was declared but never referenced
            const int C_rows = A_rows;
                      ^

      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu(667): warning #177-D: variable "C_cols" was declared but never referenced
            const int C_cols = B_cols;
                      ^

      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu: In lambda function:
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:264:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
        264 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_forward_cuda", ([&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:264:156: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
        264 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_forward_cuda", ([&] {
            |                                                                                                                                                            ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu: In lambda function:
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:293:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
        293 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_forward_cuda", [&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:293:156: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
        293 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_forward_cuda", [&] {
            |                                                                                                                                                            ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu: In lambda function:
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:701:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
        701 |     AT_DISPATCH_FLOATING_TYPES(grad_C.type(), "spgemm_backward_cuda", ([&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:701:157: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
        701 |     AT_DISPATCH_FLOATING_TYPES(grad_C.type(), "spgemm_backward_cuda", ([&] {
            |                                                                                                                                                             ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu: In lambda function:
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:725:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
        725 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_backward_cuda", [&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:725:157: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
        725 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_backward_cuda", [&] {
            |                                                                                                                                                             ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu: In lambda function:
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:757:43: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
        757 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_backward_cuda", ([&] {
            |                              ~~~~~~~~~~~~~^~
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/TensorBody.h:225:1: note: declared here
        225 |   DeprecatedTypeProperties & type() const {
            | ^ ~~
      /TorchVision/numml/cpp/sparse_csr_gemm_cuda.cu:757:157: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
        757 |     AT_DISPATCH_FLOATING_TYPES(A_data.type(), "spgemm_backward_cuda", ([&] {
            |                                                                                                                                                             ^
      /usr/local/lib/python3.10/dist-packages/torch/include/ATen/Dispatch.h:109:1: note: declared here
        109 | inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties& t) {
            | ^~~~~~~~~~~
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/TorchVision/numml/setup.py", line 40, in <module>
          setup(name='numml',
        File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 364, in run
          self.run_command("build")
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 88, in run
          _build_ext.run(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 865, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 249, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
          objects = self.compiler.compile(sources,
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 678, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1590, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1933, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for numml
  Running setup.py clean for numml
Failed to build numml
ERROR: Could not build wheels for numml, which is required to install pyproject.toml-based projects
GregorySchwing commented 9 months ago

Related:

https://stackoverflow.com/questions/37566987/cuda-atomicadd-for-doubles-definition-error
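Per that thread, `atomicAdd` on `double` requires compute capability >= 6.0, so the error comes from the `compute_52` target in the default arch list. A likely workaround (assuming no source change is wanted) is to restrict the architectures PyTorch's `cpp_extension` builds for via the `TORCH_CUDA_ARCH_LIST` environment variable, e.g.:

```
# Build only for the GPU actually in use; "8.0" here is an example
# (an A100) -- substitute your card's compute capability.
# This drops the -gencode=arch=compute_52 flag that triggers the error.
TORCH_CUDA_ARCH_LIST="8.0" pip3 install .
```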

GregorySchwing commented 9 months ago

Switched to a new branch 'development'
root@51e1485d94b7:/TorchVision/numml# pip3 install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /TorchVision/numml
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: numml
  Building wheel for numml (setup.py) ... done
  Created wheel for numml: filename=numml-0.0.1-cp310-cp310-linux_x86_64.whl size=5749528 sha256=ed6878487bd4d6bd9e639f2dba8189da1372e9d9e41c1fb5176fa74272136d7a
  Stored in directory: /tmp/pip-ephem-wheel-cache-lc6pf5xf/wheels/93/71/29/b47e7c308564fbb48bbe2f80c9864b281c5d97be8436b044b1
Successfully built numml
Installing collected packages: numml
Successfully installed numml-0.0.1

Fixed in #12