microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Cifar-10 example - RuntimeError: Error building extension 'fused_adam' #694

Closed Axe-- closed 1 year ago

Axe-- commented 3 years ago

Hey, I was trying out the CIFAR-10 tutorial (link).
Could you assist with the runtime error below?

On executing run_ds.sh:


(dspeed) axe@axe-H270-Gaming-3:~/Downloads/DeepSpeedExamples/cifar$ sh run_ds.sh
[2021-01-26 05:43:56,524] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-01-26 05:43:56,554] [INFO] [runner.py:355:main] cmd = /home/axe/VirtualEnvs/dspeed/bin/python3.6 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json
[2021-01-26 05:43:56,972] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-01-26 05:43:56,972] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-01-26 05:43:56,972] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2021-01-26 05:43:56,972] [INFO] [launch.py:100:main] dist_world_size=2
[2021-01-26 05:43:56,973] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
0it [00:00, ?it/s]Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
 99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉  | 168140800/170498071 [00:07<00:00, 28603271.23it/s]Extracting ./data/cifar-10-python.tar.gz to ./data
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Files already downloaded and verified
170500096it [00:10, 16970356.67it/s]                                                                                                                                                                          
170500096it [00:10, 16911123.86it/s]                                                                                                                                                                          
horse plane   cat  bird
[2021-01-26 05:44:13,334] [INFO] [logging.py:60:log_dist] [Rank -1] DeepSpeed info: version=0.3.10, git-hash=unknown, git-branch=unknown
[2021-01-26 05:44:13,335] [INFO] [distributed.py:40:init_distributed] Initializing torch distributed with backend: nccl
truck horse  ship  ship
[2021-01-26 05:44:14,857] [INFO] [logging.py:60:log_dist] [Rank -1] DeepSpeed info: version=0.3.10, git-hash=unknown, git-branch=unknown
[2021-01-26 05:44:14,857] [INFO] [distributed.py:40:init_distributed] Initializing torch distributed with backend: nccl
[2021-01-26 05:44:18,027] [INFO] [engine.py:72:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
[2021-01-26 05:44:18,028] [INFO] [engine.py:72:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
Using /home/axe/.cache/torch_extensions as PyTorch extensions root...
Using /home/axe/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/axe/.cache/torch_extensions/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda_10_1_7_6/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/TH -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda_10_1_7_6/include -isystem /home/axe/VirtualEnvs/dspeed/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++14 -c /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
FAILED: multi_tensor_adam.cuda.o 
/usr/local/cuda_10_1_7_6/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/TH -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda_10_1_7_6/include -isystem /home/axe/VirtualEnvs/dspeed/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++14 -c /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
       __p->_M_set_sharable();
       ~~~~~~~~~^~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/includes -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/TH -isystem /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda_10_1_7_6/include -isystem /home/axe/VirtualEnvs/dspeed/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1539, in _run_ninja_build
    env=env)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cifar10_deepspeed.py", line 144, in <module>
    args=args, model=net, model_parameters=parameters, training_data=trainset)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/__init__.py", line 119, in initialize
    config_params=config_params)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 171, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 583, in _configure_basic_optimizer
    optimizer = FusedAdam(model_parameters, **optimizer_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
    return self.jit_load(verbose)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 216, in jit_load
    verbose=verbose)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1202, in _jit_compile
    with_cuda=with_cuda)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1300, in _write_ninja_file_and_build_library
    error_prefix="Error building extension '{}'".format(name))
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'         # *******************************************************
Loading extension module fused_adam...
Traceback (most recent call last):
  File "cifar10_deepspeed.py", line 144, in <module>
    args=args, model=net, model_parameters=parameters, training_data=trainset)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/__init__.py", line 119, in initialize
    config_params=config_params)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 171, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 583, in _configure_basic_optimizer
    optimizer = FusedAdam(model_parameters, **optimizer_parameters)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
    return self.jit_load(verbose)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 216, in jit_load
    verbose=verbose)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1213, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/axe/VirtualEnvs/dspeed/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'fused_adam'    # *******************************************************

Here's the ds_report output:

(dspeed) axe@axe-H270-Gaming-3:~/Downloads/DeepSpeedExamples/cifar$ ds_report 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
 [WARNING]  sparse_attn requires the 'cmake' command, but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1+cu101
torch cuda version ............... 10.1
nvcc version ..................... 10.1
deepspeed install path ........... ['/home/axe/VirtualEnvs/dspeed/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.10, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.1

Running with CUDA 10.1 on Ubuntu 18.04. Here's the virtual environment:

(dspeed) axe@axe-H270-Gaming-3:~/Downloads/DeepSpeedExamples/cifar$ pip freeze
cycler==0.10.0
dataclasses==0.8
deepspeed==0.3.10
kiwisolver==1.3.1
matplotlib==3.3.3
ninja==1.10.0.post2
numpy==1.19.5
Pillow==8.1.0
protobuf==3.14.0
pyparsing==2.4.7
python-dateutil==2.8.1
six==1.15.0
tensorboardX==1.8
torch==1.7.1+cu101
torchaudio==0.7.2
torchvision==0.8.2+cu101
tqdm==4.56.0
typing-extensions==3.7.4.3
TevenLeScao commented 3 years ago

Hello,

For the record, I am currently having the same issue with CUDA 10.1 / Ubuntu 18.04 / torch 1.7.1!

TevenLeScao commented 3 years ago

I used the trick of changing -v to --version in cpp_extension.py, as suggested here, but fused_adam still can't be found:

Traceback (most recent call last):
  File "run_clm_scaling.py", line 400, in <module>
    main()
  File "run_clm_scaling.py", line 359, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/teven/dev_transformers/transformers/transformers_perso/transformers/src/transformers/trainer.py", line 763, in train
    model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
  File "/home/teven/dev_transformers/transformers/transformers_perso/transformers/src/transformers/integrations.py", line 405, in init_deepspeed
    config_params=config,
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/__init__.py", line 119, in initialize
    config_params=config_params)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 171, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 583, in _configure_basic_optimizer
    optimizer = FusedAdam(model_parameters, **optimizer_parameters)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
    return self.jit_load(verbose)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 216, in jit_load
    verbose=verbose)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1213, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/usr/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'fused_adam'
tjruwase commented 3 years ago

@Axe-- and @TevenLeScao, sorry that you are having this issue. Unfortunately, I was unable to repro the problem on my side. I have tried to recreate your environment as closely as possible; please review the details further below in case I missed a configuration. My suggestion for further debugging is to build fused_adam during installation instead of JIT-compiling it at runtime. To do this you will need to clone and build DeepSpeed. Specifically, uninstall and rebuild DeepSpeed with the following two commands:

1. pip uninstall deepspeed -y

2. DS_BUILD_FUSED_ADAM=1 bash install.sh -s
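
For completeness, a minimal sketch of the full clone-and-build sequence those two commands assume (the URL below is the public DeepSpeed repo; DS_BUILD_FUSED_ADAM and the -s flag are as given above, so adjust to your own setup):

# remove the pip-installed wheel first
pip uninstall deepspeed -y
# clone the source and build with the fused_adam op pre-compiled (no JIT at runtime)
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_FUSED_ADAM=1 bash install.sh -s
# fused_adam should now report as installed
ds_report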

Please let me know how it goes. Thanks!

Below is my environment when I installed in JIT-mode in an attempt to repro the issue.

cat /etc/issue
Ubuntu 18.04.3 LTS \n \l
ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.6/dist-packages/torch']
torch version .................... 1.7.1+cu101
torch cuda version ............... 10.1
nvcc version ..................... 10.1
deepspeed install path ........... ['/usr/local/lib/python3.6/dist-packages/deepspeed']
deepspeed info ................... 0.3.10+5e522ef, 5e522ef, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.1
TevenLeScao commented 3 years ago

I had issues with installation and was following the idea in https://github.com/microsoft/DeepSpeed/issues/629#issuecomment-753993124 to change CUDA from 10.1.105 to 10.1.243; I ended up installing 10.2 instead, which fixed this issue.

Sorry, I won't have time to revert to 10.1 to look for the underlying cause, but in any case, that should be an easy fix in the meantime.

tjruwase commented 3 years ago

@TevenLeScao, no worries about reverting to 10.1. I am glad you are unblocked, which is the most important thing. From your description, it seems the underlying issue is a mismatch between the CUDA versions of torch and another component, probably deepspeed.
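
(As a quick way to spot this kind of mismatch, here is a minimal sketch that compares the CUDA version torch was built against with the local nvcc toolkit; it is the same information ds_report summarizes.)

# CUDA version the installed torch wheel was compiled against
python -c "import torch; print(torch.version.cuda)"
# CUDA version of the local toolkit that JIT-compiles the DeepSpeed ops
nvcc --version
# DeepSpeed's own summary of both, including the wheel's compile-time versions
ds_report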

Can you please share the result of ds_report on your working setup? Thanks.

TevenLeScao commented 3 years ago

There it is:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.2
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.10, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.2
Axe-- commented 3 years ago

Hey, switching to CUDA 10.2 indeed solves this! Thanks! You may close this issue. :-)

windspirit95 commented 3 years ago

In my case, the same issue happened even after I updated CUDA to version 10.1.243, and I could not move to CUDA 10.2 because my Ubuntu is 14.04. I found that my issue was caused by an old version of GCC (4.8). I followed this solution to update to GCC 6 and the problem was solved: https://gist.github.com/application2000/73fd6f4bf1be6600a2cf9f56315a2d91 Hope this helps someone ^^
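
(A rough sketch of what that GCC upgrade looks like on Ubuntu, assuming the ubuntu-toolchain-r/test PPA provides the newer compiler for your release; the linked gist has the complete steps.)

# add the toolchain PPA and install a newer GCC/G++ (availability depends on your Ubuntu release)
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-6 g++-6
# make the new compiler the default
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-6
gcc --version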

tjruwase commented 3 years ago

Closing this issue, since it is resolved. Please reopen if needed.

sayakpaul commented 3 years ago

@tjruwase I am also running into something similar:

ImportError: No module named 'fused_adam'

Here are additional details:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing.
async_io ............... [NO] ....... [NO]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/jupyter/.local/lib/python3.7/site-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/opt/conda/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.3.16, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.6, cuda 10.2

Could you share any pointers to resolve this?

tjruwase commented 3 years ago

@sayakpaul, your ds_report shows a mismatch in CUDA versions. Your DeepSpeed wheel is built with 10.2, while your CUDA installation is 11.0. Can you try building DeepSpeed from source so that it is compiled with your installed 11.0?

sayakpaul commented 3 years ago

Sure. Let me do that and get back.

sayakpaul commented 3 years ago

@tjruwase here's my ds_report now:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/jupyter/.local/lib/python3.7/site-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/opt/conda/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.4.0+11e94e6, 11e94e6, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

I see that it says fused_adam is not installed yet. After cloning the repo, I ran ./install.sh. Did I miss out on something?

sayakpaul commented 3 years ago

It turned out ninja wasn't installed properly. I followed this suggestion to install ninja so that PyTorch can load the C++ extensions, and things should work now.
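
(In case it helps others, a minimal sketch of checking that ninja is visible to PyTorch's extension builder; verify_ninja_availability is part of torch.utils.cpp_extension.)

# install ninja into the active environment
pip install ninja
ninja --version
# raises an error if PyTorch cannot find a working ninja binary
python -c "from torch.utils.cpp_extension import verify_ninja_availability; verify_ninja_availability(); print('ninja OK')"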

ShoufaChen commented 2 years ago

I met this issue when using GCC 4.8. Update the GCC version and reinstall deepspeed with:

pip uninstall deepspeed -y
pip install deepspeed
chintan-donda commented 1 year ago

I'm facing a similar issue.

Here is the ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0+7d1a95b046), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/users/chintan/venv/lib/python3.8/site-packages/torch']
torch version .................... 2.1.0.dev20230516+cu117
deepspeed install path ........... ['/home/users/chintan/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.2, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 10.0
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

pip freeze:

accelerate==0.19.0
aiohttp==3.8.4
aiosignal==1.3.1
async-timeout==4.0.2
attrs==22.2.0
awscli==1.27.133
bitsandbytes==0.38.1
boto3==1.26.133
botocore==1.29.133
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.2.1
cmake==3.25.0
colorama==0.4.4
contextlib2==21.6.0
coverage==7.2.5
dataclasses-json==0.5.7
datasets==2.12.0
deepspeed==0.9.2
dill==0.3.6
docutils==0.16
filelock==3.9.0
frozenlist==1.3.3
fsspec==2023.4.0
google-pasta==0.2.0
greenlet==2.0.2
hjson==3.1.0
huggingface-hub==0.14.1
idna==3.4
importlib-metadata==4.13.0   
importlib-resources==5.12.0  
iniconfig==2.0.0
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==4.17.3
langchain==0.0.165
lit==15.0.7
MarkupSafe==2.1.2
marshmallow==3.19.0
marshmallow-enum==1.5.1
mpmath==1.2.1
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
networkx==3.0rc1
ninja==1.11.1
numexpr==2.8.4
numpy==1.24.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96  
openapi-schema-pydantic==1.2.4
packaging==23.1
pandas==2.0.1
pathos==0.3.0
Pillow==9.3.0
pkgutil_resolve_name==1.3.10 
platformdirs==3.5.1
pluggy==1.0.0
pox==0.3.2
ppft==1.7.6.6
protobuf==3.20.3
protobuf3-to-dict==0.1.5
psutil==5.9.5
py==1.11.0
py-cpuinfo==9.0.0
pyarrow==12.0.0
pyasn1==0.5.0
pydantic==1.10.7
pyrsistent==0.19.3
pytest==7.1.2
pytest-cov==3.0.0
python-dateutil==2.8.2
pytorch-triton==2.1.0+7d1a95b046
pytz==2023.3
PyYAML==5.4.1
regex==2023.5.5
requests==2.28.1
responses==0.18.0
rsa==4.7.2
s3transfer==0.6.1
sagemaker==2.154.0
schema==0.7.5
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
smdebug-rulesconfig==1.0.1
SQLAlchemy==2.0.13
sympy==1.11.1
tblib==1.7.0
tenacity==8.2.2
tensorboardX==2.6
tokenizers==0.13.3
tomli==2.0.1
torch==2.1.0.dev20230516+cu117
torchaudio==2.1.0.dev20230516+cu117
torchvision==0.16.0.dev20230516+cu117
tqdm==4.65.0
transformers @ git+https://github.com/huggingface/transformers.git@d765717c76026281f2fb27ddc44fa3636306bb48
triton==2.0.0
typing-inspect==0.8.0
typing_extensions==4.4.0
tzdata==2023.3
urllib3==1.26.13
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0

NVCC version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

nvidia-smi details:

Wed May 17 15:37:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+

Any help pls?

loadams commented 1 year ago

Closing this issue, as the original problem is resolved. Any new users who encounter issues here should open a new issue and link this one, and we would be happy to take a look.

Excelsiorl commented 9 months ago

I'm facing a similar issue. Could you assist with the runtime error?

Here is the log:

(LLM) [liuyuming@gpu7 ChatGLM-Finetuning-master]$ bash pt2.sh 
[2023-11-30 19:34:50,051] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0: setting --include=localhost:0
[2023-11-30 19:35:05,425] [INFO] [runner.py:540:main] cmd = /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=1024 --enable_each_rank_log=None train.py --train_path data/spo_0.json --model_name_or_path /mnt/lustre/GPU7/home/liuyuming/code/model/chatGML_6b --per_device_train_batch_size 1 --max_len 1560 --max_src_len 1024 --learning_rate 1e-4 --weight_decay 0.1 --num_train_epochs 2 --gradient_accumulation_steps 4 --warmup_ratio 0.1 --mode glm2 --train_type ptuning --seed 1234 --ds_file ds_zero2_no_offload.json --gradient_checkpointing --show_loss_step 10 --pre_seq_len 16 --prefix_projection True --output_dir ./output-glm2
[2023-11-30 19:35:09,249] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-30 19:35:09,249] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-30 19:35:09,249] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-30 19:35:09,249] [INFO] [launch.py:247:main] dist_world_size=1
[2023-11-30 19:35:09,249] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
2023-11-30 19:35:12.823537: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-30 19:35:12.876890: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 19:35:13.977330: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-30 19:35:15,553] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
tokenizer.pad_token: <unk>
tokenizer.eos_token: </s>
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.05s/it]
Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /mnt/lustre/GPU7/home/liuyuming/code/model/chatGML_6b and are newly initialized: ['transformer.prefix_encoder.trans.2.weight', 'transformer.prefix_encoder.trans.0.bias', 'transformer.prefix_encoder.trans.2.bias', 'transformer.prefix_encoder.trans.0.weight', 'transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
the number of skipping data is 0
len(train_dataloader) = 1441
len(train_dataset) = 1441
num_training_steps = 722
num_warmup_steps = 72
transformer.prefix_encoder.embedding.weight
transformer.prefix_encoder.trans.0.weight
transformer.prefix_encoder.trans.0.bias
transformer.prefix_encoder.trans.2.weight
transformer.prefix_encoder.trans.2.bias
trainable params: 117688320 || all params: 6361272320 || trainable%: 1.8500751748983448
[2023-11-30 19:35:26,651] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-11-30 19:35:26,653] [INFO] [comm.py:580:init_distributed] Distributed backend already initialized
[2023-11-30 19:35:28,806] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /mnt/lustre/GPU7/home/liuyuming/.cache/torch_extensions/py39_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/lustre/GPU7/home/liuyuming/.cache/torch_extensions/py39_cu116/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
FAILED: fused_adam_frontend.o 
c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
/mnt/lustre/GPU7/home/liuyuming/gcc/gcc-9.5.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory
[2/3] /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/nvcc  -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -std=c++14 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
FAILED: multi_tensor_adam.cuda.o 
/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/nvcc  -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -std=c++14 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
/mnt/lustre/GPU7/home/liuyuming/gcc/gcc-9.5.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory
nvcc fatal   : Failed to preprocess host compiler properties.
ninja: build stopped: subcommand failed.

Traceback (most recent call last):
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/lustre/GPU7/home/liuyuming/code/ChatGLM-Finetuning-master/train.py", line 242, in <module>
    main()
  File "/mnt/lustre/GPU7/home/liuyuming/code/ChatGLM-Finetuning-master/train.py", line 184, in main
    model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/__init__.py", line 156, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 328, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1176, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1242, in _configure_basic_optimizer
    optimizer = FusedAdam(
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 71, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 449, in load
    return self.jit_load(verbose)
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
    op_module = load(name=self.name,
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'

Here's the virtual environment:

Package                      Version
---------------------------- ----------
absl-py                      2.0.0
accelerate                   0.24.1
astunparse                   1.6.3
Brotli                       1.0.9
cachetools                   5.3.2
certifi                      2023.11.17
cffi                         1.16.0
charset-normalizer           2.0.4
cpm-kernels                  1.0.11
cryptography                 41.0.3
deepspeed                    0.9.0
filelock                     3.13.1
flatbuffers                  23.5.26
fsspec                       2023.10.0
gast                         0.4.0
google-auth                  2.23.4
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
grpcio                       1.59.3
h5py                         3.10.0
hjson                        3.1.0
huggingface-hub              0.19.4
idna                         3.4
importlib-metadata           6.8.0
keras                        2.13.1
libclang                     16.0.6
Markdown                     3.5.1
MarkupSafe                   2.1.3
mkl-fft                      1.3.8
mkl-random                   1.2.4
mkl-service                  2.4.0
ninja                        1.11.1.1
numpy                        1.24.2
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    23.2
peft                         0.3.0
Pillow                       10.0.1
pip                          23.3.1
protobuf                     4.25.1
psutil                       5.9.6
py-cpuinfo                   9.0.0
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pycparser                    2.21
pydantic                     1.10.13
pyOpenSSL                    23.2.0
PySocks                      1.7.1
PyYAML                       6.0.1
regex                        2023.10.3
requests                     2.31.0
requests-oauthlib            1.3.1
rsa                          4.9
sentencepiece                0.1.96
setuptools                   68.0.0
six                          1.16.0
tensorboard                  2.13.0
tensorboard-data-server      0.7.2
tensorflow                   2.13.0
tensorflow-estimator         2.13.0
tensorflow-io-gcs-filesystem 0.34.0
termcolor                    2.3.0
tokenizers                   0.13.3
torch                        1.13.1
torchaudio                   0.13.1
torchvision                  0.14.1
tqdm                         4.64.1
transformers                 4.27.1
typing_extensions            4.5.0
urllib3                      1.26.18
Werkzeug                     3.0.1
wheel                        0.41.2
wrapt                        1.16.0
zipp                         3.17.0

Here is the ds_report:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch']
torch version .................... 1.13.1
deepspeed install path ........... ['/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.0, unknown, unknown
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6
Excelsiorl commented 9 months ago

(quoting @chintan-donda's ds_report, pip freeze, and environment details from the comment above)

Did you manage to resolve the issue? :)

loadams commented 9 months ago

@Excelsiorl - your error is this:

gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory

This looks to be a GCC/build setup error. Can you try reinstalling GCC, or, if the file does exist on your system, resolving that error first?
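
(A minimal sketch of how to diagnose the missing library; the GCC prefix below is a placeholder for wherever your local GCC 9.5.0 lives.)

# list the shared libraries cc1plus cannot resolve
ldd /path/to/gcc-9.5.0/libexec/gcc/x86_64-pc-linux-gnu/9.5.0/cc1plus | grep "not found"
# look for libisl under the local GCC install (or wherever its prerequisites were built)
find /path/to/gcc-9.5.0 -name "libisl.so*"
# if it exists, make it visible to the dynamic loader before re-running the build
export LD_LIBRARY_PATH=/path/to/dir/containing/libisl:$LD_LIBRARY_PATH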

Excelsiorl commented 9 months ago

@loadams - I installed GCC 9.5.0 locally on a CentOS cluster where I don't have root.

PS: I directly installed the precompiled version and modified the corresponding PATH and LD_LIBRARY_PATH.

Could this be the reason for the issue?

varun-96 commented 5 months ago

Hi @Excelsiorl, I had a similar issue and it was resolved after upgrading the CUDA version. You can follow the steps below to install the latest version of CUDA (11.8 worked for me): https://gist.github.com/ksopyla/bf74e8ce2683460d8de6e0dc389fc7f5

Also, for cuDNN, I used the following instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn-cuda-11

Once your setup is finished and the CUDA path is updated, you can run nvcc --version to check the updated CUDA version. You can start training now 👍
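
(A quick sanity check after the upgrade, assuming a standard PyTorch install: confirm that the toolkit version torch was built against and a usable CUDA device are both visible.)

nvcc --version
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"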