k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

k2 source build with PyTorch 2.1 + CUDA 12.1 fails #1252

Closed. videodanchik closed this issue 8 months ago.

videodanchik commented 9 months ago

I'm building k2 from source with CUDA installed but without a GPU attached to the work instance (pure CPU). This forces k2 to build binaries for all possible GPU architectures, from compute_35 through compute_90. I have installed CUDA 12.1 along with PyTorch 2.1.0+cu121 on Ubuntu 22.04 and run:

K2_MAKE_ARGS="-j" K2_CMAKE_ARGS="-DCAFFE2_USE_CUDNN=1 -DCMAKE_BUILD_TYPE=Release" python3 setup.py install
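(As an aside, the architecture list can be pinned explicitly instead of auto-detected, which would also be a way to drop compute_35 from the build. This is only a sketch under an assumption: PyTorch's CMake honors the TORCH_CUDA_ARCH_LIST environment variable, and k2's arch-selection script appears to be derived from PyTorch's, so it may honor it too, but I have not verified that for k2 itself.)

# Assumption: k2's bundled arch-selection CMake honors TORCH_CUDA_ARCH_LIST
# the way PyTorch's does; list only the architectures actually needed.
export TORCH_CUDA_ARCH_LIST="5.0;6.0;6.1;7.0;7.5;8.0;8.6;9.0"
K2_MAKE_ARGS="-j" K2_CMAKE_ARGS="-DCAFFE2_USE_CUDNN=1 -DCMAKE_BUILD_TYPE=Release" python3 setup.py install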

Here is my environment:

-- CMAKE_VERSION: 3.25.1
-- Enabled languages: CXX;CUDA
-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.1.105
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- K2_OS: Ubuntu 22.04.3 LTS
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for C++ include execinfo.h
-- Looking for C++ include execinfo.h - found
-- Performing Test K2_COMPILER_SUPPORTS_CXX14
-- Performing Test K2_COMPILER_SUPPORTS_CXX14 - Success
-- C++ Standard version: 14
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
-- K2_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
-- K2_COMPUTE_ARCH_CANDIDATES 50;60;61;70;75;80;86;90
-- Adding arch 50
-- Adding arch 60
-- Adding arch 61
-- Adding arch 70
-- Adding arch 75
-- Adding arch 80
-- Adding arch 86
-- Skipping arch 90
-- K2_COMPUTE_ARCHS: 50;60;61;70;75;80;86
-- Found Valgrind: /usr/bin  
-- Found Valgrind: /usr/bin/valgrind
-- To check memory, run ctest -R <NAME> -D ExperimentalMemCheck
-- Downloading pybind11 from https://github.com/pybind/pybind11/archive/5bc0943ed96836f46489f53961f6c438d2935357.zip
-- pybind11 is downloaded to /home/videodanchik/k2/build/temp.linux-x86_64-cpython-311/_deps/pybind11-src
-- pybind11 v2.11.0 dev1
-- Found PythonInterp: /home/videodanchik/envs/py311/.venv/bin/python3 (found suitable version "3.11.6", minimum required is "3.6") 
-- Found PythonLibs: /usr/local/lib/libpython3.11.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Python executable: /home/videodanchik/envs/py311/.venv/bin/python3
-- Found CUDA: /usr/local/cuda (found version "12.1") 
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.105") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 12.1
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is b51b459d
-- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so  
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;8.0;8.6;8.9;9.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90
-- Found Torch: /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/lib/libtorch.so  
-- K2_TORCH_VERSION: 2.1
-- PyTorch version: 2.1.0+cu121
-- PyTorch cuda version: 12.1
-- Generated /home/videodanchik/k2/build/temp.linux-x86_64-cpython-311/torch_version.py
-- Downloading moderngpu from https://github.com/moderngpu/moderngpu/archive/8ec9ac0de8672de7217d014917eedec5317f75f3.zip
-- moderngpu is downloaded to /home/videodanchik/k2/build/temp.linux-x86_64-cpython-311/_deps/moderngpu-src

And here is my error:

[  1%] Building CUDA object k2/csrc/CMakeFiles/k2_log.dir/log.cu.o
[  2%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-fbank.cc.o
[  3%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-functions.cc.o
[  4%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-mfcc.cc.o
[  4%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-plp.cc.o
[  5%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-spectrogram.cc.o
[  6%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/feature-window.cc.o
[  7%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/matrix-functions.cc.o
[  7%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/mel-computations.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[  8%] Building CXX object _deps/kaldifeat-build/kaldifeat/csrc/CMakeFiles/kaldifeat_core.dir/online-feature.cc.o
make[2]: *** [k2/csrc/CMakeFiles/k2_log.dir/build.make:77: k2/csrc/CMakeFiles/k2_log.dir/log.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1120: k2/csrc/CMakeFiles/k2_log.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
In file included from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/script.h:3,
                 from /home/videodanchik/k2/build/temp.linux-x86_64-cpython-311/_deps/kaldifeat-src/kaldifeat/csrc/feature-functions.h:10,
                 from /home/videodanchik/k2/build/temp.linux-x86_64-cpython-311/_deps/kaldifeat-src/kaldifeat/csrc/feature-functions.cc:7:
/home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
    4 | #error C++17 or later compatible compiler is required to use ATen.
      |  ^~~

Basically, there are two issues here:

nvcc fatal : Unsupported gpu architecture 'compute_35'

I think this is related to the fact that CUDA 12.1 no longer supports compute_35 (see here).
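A quick way to confirm which architectures a given toolkit's nvcc still accepts is its built-in listing flag; on a CUDA 12.1 install the list starts at compute_50, so compute_35 is indeed gone:

nvcc --list-gpu-arch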

And the main error itself is most probably related to https://github.com/pytorch/pytorch/pull/100557: PyTorch 2.1's headers now require C++17, while the k2 build above compiles with -std=c++14 (note the "C++ Standard version: 14" line in the CMake output).

csukuangfj commented 9 months ago

Please try #1249

videodanchik commented 9 months ago

Hi @csukuangfj, thanks for the quick reply, but I'm still experiencing issues:

1) I see this warning:

/home/videodanchik/k2_builds/k2_gpu/k2/csrc/intersect.cu(889): warning #177-D: variable "num_threads" was declared but never referenced
                num_threads = g.size();
                ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

The variable can probably be removed.

2) The error now happens later in the build:

[ 48%] Building CUDA object k2/csrc/CMakeFiles/context.dir/timer.cu.o
[ 49%] Building CUDA object k2/csrc/CMakeFiles/context.dir/top_sort.cu.o
[ 50%] Building CUDA object k2/csrc/CMakeFiles/context.dir/torch_util.cu.o
In file included from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/pytorch_context.h:26,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.h:30,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.cu:23:
/home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
    4 | #error C++17 or later compatible compiler is required to use PyTorch.
      |  ^~~
In file included from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/c10/util/string_view.h:4,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/c10/util/StringUtil.h:6,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/c10/util/Exception.h:5,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/c10/core/Device.h:5,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:11,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/pytorch_context.h:26,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.h:30,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.cu:23:
/home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/c10/util/C++17.h:27:2: error: #error You need C++17 to compile PyTorch
   27 | #error You need C++17 to compile PyTorch
      |  ^
In file included from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/pytorch_context.h:26,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.h:30,
                 from /home/videodanchik/k2_builds/k2_gpu/k2/csrc/torch_util.cu:23:
/home/videodanchik/envs/py311/.venv/lib/python3.11/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
    4 | #error C++17 or later compatible compiler is required to use ATen.
      |  ^~~

csukuangfj commented 9 months ago

Please change https://github.com/k2-fsa/k2/blob/44a9d5682af9fd3ef77074777e15278ec6d390eb/CMakeLists.txt#L201 to

 set(CMAKE_CXX_STANDARD 17 CACHE STRING "The C++ version to be used.") 

and delete your build directory and retry.
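That is, with the original invocation from above, something like:

cd k2            # the k2 source checkout
rm -rf build
K2_MAKE_ARGS="-j" K2_CMAKE_ARGS="-DCAFFE2_USE_CUDNN=1 -DCMAKE_BUILD_TYPE=Release" python3 setup.py install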


Note: in the PR I was using set(CMAKE_CXX_STANDARD 14).

videodanchik commented 9 months ago

Thanks @csukuangfj, the build now completes without any problems, except for a few warnings like these:

/home/videodanchik/k2_builds/k2_gpu/k2/csrc/intersect.cu(889): warning #177-D: variable "num_threads" was declared but never referenced
                num_threads = g.size();
                ^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/wave_reader.cu(61): warning #177-D: variable "p" was declared but never referenced
        const char *p = reinterpret_cast<const char *>(&subchunk2_id);
                    ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/beam_search.cu: In function ‘std::vector<std::vector<int>, std::allocator<std::vector<int> > > k2::ModifiedBeamSearch(const torch::jit::Module&, const at::Tensor&, const at::Tensor&, int32_t)’:
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/beam_search.cu:272:20: warning: comparison of integer expressions of different signedness: ‘int32_t’ {aka ‘int’} and ‘std::vector<k2::Hypotheses>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  272 |     if (cur_batch_size < cur.size()) {
      |     ~~~~~~~~~~~~~~~^~~~~~~~~~~~
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/beam_search.cu:298:23: warning: comparison of integer expressions of different signedness: ‘int32_t’ {aka ‘int’} and ‘std::vector<k2::Hypothesis>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  298 |     for (int32_t k = 0; k != prev.size(); ++k) {
      |                     ^~~~~~~~~~~~
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu: In function ‘void k2::_GLOBAL__N__f882773b_18_deserialization_cu_4aae5ec1::restoreAccurateTypeTags(const c10::IValue&, const TypePtr&)’:
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘StorageType’ not handled in switch [-Wswitch]
  101 |     switch (w.static_type->kind()) {
      |        ^
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘ComplexType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘AwaitType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘StreamObjType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘MemoryFormatType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘SymIntType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘SymFloatType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘SymBoolType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘UnionType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/csrc/deserialization.cu:101:8: warning: enumeration value ‘DynamicType’ not handled in switch [-Wswitch]
/home/videodanchik/k2_builds/k2_gpu/k2/torch/bin/attention_rescore.cu: In function ‘int main(int, char**)’:
/home/videodanchik/k2_builds/k2_gpu/k2/torch/bin/attention_rescore.cu:288:17: warning: loop variable ‘tids’ creates a copy from type ‘const std::vector<int>’ [-Wrange-loop-construct]
  288 |   for (const auto tids : token_ids) {
      |                 ^~~~
/home/videodanchik/k2_builds/k2_gpu/k2/torch/bin/attention_rescore.cu:288:17: note: use reference type to prevent copying
  288 |   for (const auto tids : token_ids) {
      |                 ^~~~
      |                 &
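For completeness, the usual one-line fixes for these warning classes look like the following. This is a hypothetical, self-contained illustration, not code from k2 or from #1249:

#include <cstdint>
#include <vector>

enum class Kind { A, B, C };

int main() {
  std::vector<std::vector<int>> token_ids{{1, 2}, {3}};

  // -Wsign-compare: cast the unsigned size() to the signed type being compared
  int32_t cur_batch_size = 1;
  if (cur_batch_size < static_cast<int32_t>(token_ids.size())) {
    // ...
  }

  // -Wrange-loop-construct: bind by const reference to avoid copying each element
  for (const auto &tids : token_ids) {
    (void)tids;  // referenced here only to keep the illustration warning-free
  }

  // -Wswitch: a default case covers enumerators not handled explicitly
  Kind k = Kind::A;
  switch (k) {
    case Kind::A:
      break;
    default:
      break;
  }

  // nvcc warning #177-D: delete the unused variable, or annotate it (C++17)
  [[maybe_unused]] int32_t num_threads = static_cast<int32_t>(token_ids.size());
  return 0;
}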

Thanks for the help!

csukuangfj commented 8 months ago

Closing via #1249

You can find a list of pre-built wheels at