Closed Axe-- closed 1 year ago
Hello,
For the record, I am currently having the same issue with CUDA 10.1 / Ubuntu 18.04 / torch 1.7.1!
I used the trick of changing -v to --version in cpp_extension.py as given here. But fused_adam still can't be found:
Traceback (most recent call last):
File "run_clm_scaling.py", line 400, in <module>
main()
File "run_clm_scaling.py", line 359, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/teven/dev_transformers/transformers/transformers_perso/transformers/src/transformers/trainer.py", line 763, in train
model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
File "/home/teven/dev_transformers/transformers/transformers_perso/transformers/src/transformers/integrations.py", line 405, in init_deepspeed
config_params=config,
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/__init__.py", line 119, in initialize
config_params=config_params)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 171, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 583, in _configure_basic_optimizer
optimizer = FusedAdam(model_parameters, **optimizer_parameters)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
return self.jit_load(verbose)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py", line 216, in jit_load
verbose=verbose)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 997, in load
keep_intermediates=keep_intermediates)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1213, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "/usr/lib/python3.6/imp.py", line 297, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'fused_adam'
@Axe-- and @TevenLeScao, sorry that you are having this issue. Unfortunately, I was unable to repro the problem on my side. I tried to recreate your environment as best I could; please review it further below in case I missed a config. My suggestion for further debugging is to build fused_adam during installation instead of JIT. To do this, you will need to clone and build DeepSpeed. Specifically, uninstall DeepSpeed and rebuild it with the following two commands.
1. pip uninstall deepspeed -y
2. DS_BUILD_FUSED_ADAM=1 bash install.sh -s
Please let me know how it goes. Thanks!
Below is my environment when I installed in JIT-mode in an attempt to repro the issue.
cat /etc/issue
Ubuntu 18.04.3 LTS \n \l
ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.6/dist-packages/torch']
torch version .................... 1.7.1+cu101
torch cuda version ............... 10.1
nvcc version ..................... 10.1
deepspeed install path ........... ['/usr/local/lib/python3.6/dist-packages/deepspeed']
deepspeed info ................... 0.3.10+5e522ef, 5e522ef, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.1
I had issues with installation and was following the idea in https://github.com/microsoft/DeepSpeed/issues/629#issuecomment-753993124 to change CUDA from 10.1.105 to 10.1.243 and ended up installing 10.2 instead, which fixed this issue.
Sorry, I won't have time to revert to 10.1 to look for the underlying cause, but in any case, that should be an easy fix in the meantime.
@TevenLeScao, no worries about reverting to 10.1. I am glad you are unblocked, which is the most important thing. From your description it seems the underlying issue is a mismatch in the cuda versions of torch and another component, probably deepspeed.
Can you please share the result of ds_report on your working setup? Thanks.
There it is:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.2
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/teven/virtualenvs/dev_transformers/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.10, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.2
Hey, switching to Cuda 10.2 solves this indeed! Thanks! You may close this issue. :-)
In my case, the same issue happened even after I updated CUDA to version 10.1.243, and I could not update to CUDA 10.2 because my Ubuntu is 14.04. I found that my issue was caused by the old version of GCC (4.8). I followed this solution to update to GCC 6 and the problem was solved: https://gist.github.com/application2000/73fd6f4bf1be6600a2cf9f56315a2d91 Hope this helps someone ^^
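For anyone who wants to check their compiler before retrying the build, here is a minimal sketch that parses the first line of `gcc --version`. The function names and the minimum version are assumptions made for illustration: this thread only shows that GCC 4.8 failed and GCC 6 worked, so the threshold is not an official DeepSpeed or PyTorch requirement.

```python
import re

# Assumption: illustrative threshold only. The thread reports a failure with
# GCC 4.8 and success with GCC 6; nothing here is an official requirement.
ASSUMED_MIN_GCC = (5, 0)


def parse_gcc_version(version_output):
    """Return (major, minor) from the first line of `gcc --version`, or None."""
    lines = version_output.strip().splitlines()
    if not lines:
        return None
    # Drop parenthesised distro tags such as "(Ubuntu 4.8.4-2ubuntu1~14.04.4)"
    # so that the first remaining "X.Y" is the compiler version itself.
    cleaned = re.sub(r"\([^)]*\)", "", lines[0])
    match = re.search(r"(\d+)\.(\d+)", cleaned)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_gcc_too_old(version, minimum=ASSUMED_MIN_GCC):
    """Compare a (major, minor) tuple against the assumed minimum."""
    return version < minimum


if __name__ == "__main__":
    sample = "gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4"
    version = parse_gcc_version(sample)
    print(version, "too old" if is_gcc_too_old(version) else "ok")
```

To check a real machine, pass in the captured output of `gcc --version` instead of the sample string.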
Closing this issue, since it is resolved. Please reopen if needed.
@tjruwase I am also running into something similar:
ImportError: No module named 'fused_adam'
Here are additional details:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
[WARNING] sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing.
async_io ............... [NO] ....... [NO]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/jupyter/.local/lib/python3.7/site-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/opt/conda/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.3.16, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.6, cuda 10.2
Could you share any pointers to resolve this?
@sayakpaul, your ds_report shows a mismatch in cuda versions. Your deepspeed wheel is built with 10.2, while your cuda installation is 11.0. Can you try building deepspeed from source so that it is compiled with your installed 11.0?
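The comparison described above can be sketched as a small script that reads the text of a ds_report and pulls out the three CUDA-related fields. The helper names are hypothetical and the parsing assumes the report layout pasted in this thread; other DeepSpeed versions may print the fields differently.

```python
import re

# Field labels as they appear in the ds_report output pasted in this thread.
FIELDS = (
    "torch cuda version",
    "nvcc version",
    "deepspeed wheel compiled w.",
)


def cuda_versions_from_report(report_text):
    """Map each CUDA-related report field to its 'major.minor' string."""
    versions = {}
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        for field in FIELDS:
            if line.startswith(field):
                # The CUDA version is the last "major.minor" number on the
                # line (the wheel line also lists a torch version before it).
                numbers = re.findall(r"\d+\.\d+", line)
                if numbers:
                    versions[field] = numbers[-1]
    return versions


def has_cuda_mismatch(versions):
    """True when the collected fields do not all report the same version."""
    return len(set(versions.values())) > 1


if __name__ == "__main__":
    report = """
    torch cuda version ............... 11.0
    nvcc version ..................... 11.0
    deepspeed wheel compiled w. ...... torch 1.6, cuda 10.2
    """
    found = cuda_versions_from_report(report)
    print(found)
    print("mismatch" if has_cuda_mismatch(found) else "consistent")
```

Matching versions only rule out one cause; a consistent report does not by itself guarantee that the JIT build will succeed.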
Sure. Let me do that and get back.
@tjruwase here's my ds_report now:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
[WARNING] sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/jupyter/.local/lib/python3.7/site-packages/torch']
torch version .................... 1.7.1+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/opt/conda/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.4.0+11e94e6, 11e94e6, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
I see that it says fused_adam is not installed yet. After cloning the repo, I ran ./install.sh. Did I miss something?
It turned out ninja wasn't installed properly. I followed this suggestion to install ninja to allow PyTorch to load the C++ extensions, and things should work now.
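A quick way to confirm that the ninja executable is actually runnable from the current environment (rather than only listed as a package) can be sketched with the standard library. The helper name `ninja_status` is made up for this example, and this is only a rough check, not the one PyTorch itself performs.

```python
import shutil
import subprocess


def ninja_status():
    """Return the ninja version string if the binary runs, else None."""
    path = shutil.which("ninja")
    if path is None:
        # No ninja executable on PATH for this environment.
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The file exists but could not be executed or did not respond.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = ninja_status()
    if version is None:
        print("ninja is missing or not runnable from this environment")
    else:
        print("ninja", version)
```

Run it with the same Python interpreter and shell environment that launches training, since PATH can differ between a notebook, a virtualenv, and a login shell.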
I met this issue when using gcc 4.8. Update the GCC version and reinstall deepspeed with:
pip uninstall deepspeed -y
pip install deepspeed
I'm facing a similar issue.
Here is the ds_report:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0+7d1a95b046), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/users/chintan/venv/lib/python3.8/site-packages/torch']
torch version .................... 2.1.0.dev20230516+cu117
deepspeed install path ........... ['/home/users/chintan/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.2, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 10.0
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
pip freeze:
accelerate==0.19.0
aiohttp==3.8.4
aiosignal==1.3.1
async-timeout==4.0.2
attrs==22.2.0
awscli==1.27.133
bitsandbytes==0.38.1
boto3==1.26.133
botocore==1.29.133
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.2.1
cmake==3.25.0
colorama==0.4.4
contextlib2==21.6.0
coverage==7.2.5
dataclasses-json==0.5.7
datasets==2.12.0
deepspeed==0.9.2
dill==0.3.6
docutils==0.16
filelock==3.9.0
frozenlist==1.3.3
fsspec==2023.4.0
google-pasta==0.2.0
greenlet==2.0.2
hjson==3.1.0
huggingface-hub==0.14.1
idna==3.4
importlib-metadata==4.13.0
importlib-resources==5.12.0
iniconfig==2.0.0
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==4.17.3
langchain==0.0.165
lit==15.0.7
MarkupSafe==2.1.2
marshmallow==3.19.0
marshmallow-enum==1.5.1
mpmath==1.2.1
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
networkx==3.0rc1
ninja==1.11.1
numexpr==2.8.4
numpy==1.24.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
openapi-schema-pydantic==1.2.4
packaging==23.1
pandas==2.0.1
pathos==0.3.0
Pillow==9.3.0
pkgutil_resolve_name==1.3.10
platformdirs==3.5.1
pluggy==1.0.0
pox==0.3.2
ppft==1.7.6.6
protobuf==3.20.3
protobuf3-to-dict==0.1.5
psutil==5.9.5
py==1.11.0
py-cpuinfo==9.0.0
pyarrow==12.0.0
pyasn1==0.5.0
pydantic==1.10.7
pyrsistent==0.19.3
pytest==7.1.2
pytest-cov==3.0.0
python-dateutil==2.8.2
pytorch-triton==2.1.0+7d1a95b046
pytz==2023.3
PyYAML==5.4.1
regex==2023.5.5
requests==2.28.1
responses==0.18.0
rsa==4.7.2
s3transfer==0.6.1
sagemaker==2.154.0
schema==0.7.5
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
smdebug-rulesconfig==1.0.1
SQLAlchemy==2.0.13
sympy==1.11.1
tblib==1.7.0
tenacity==8.2.2
tensorboardX==2.6
tokenizers==0.13.3
tomli==2.0.1
torch==2.1.0.dev20230516+cu117
torchaudio==2.1.0.dev20230516+cu117
torchvision==0.16.0.dev20230516+cu117
tqdm==4.65.0
transformers @ git+https://github.com/huggingface/transformers.git@d765717c76026281f2fb27ddc44fa3636306bb48
triton==2.0.0
typing-inspect==0.8.0
typing_extensions==4.4.0
tzdata==2023.3
urllib3==1.26.13
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0
NVCC version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
nvidia-smi details:
Wed May 17 15:37:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01 Driver Version: 515.86.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
Any help, please?
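The report above shows nvcc at release 10.0 while the torch build is tagged cu117. A minimal sketch for comparing the two from their raw strings follows. The function names are hypothetical, and reading the `+cuXYZ` suffix as "all digits but the last are the major version" is an assumption that fits the tags seen in this thread (cu101, cu110, cu117) but is not a documented guarantee.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_from_torch_tag(torch_version):
    """Extract (major, minor) from a '+cuXYZ' suffix, or None if absent.

    Assumption: the last digit is the minor version, the rest the major.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None or len(match.group(1)) < 2:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
    torch_text = "2.1.0.dev20230516+cu117"
    toolkit = nvcc_release(nvcc_text)
    torch_cuda = cuda_from_torch_tag(torch_text)
    print("nvcc:", toolkit, "torch:", torch_cuda)
    if toolkit != torch_cuda:
        print("nvcc on PATH does not match the CUDA version torch was built with")
```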
Closing this issue as the original issue is resolved and any new users who encounter issues here should open a new issue and link this one and we would be happy to take a look.
I'm facing a similar issue. Could you assist with the runtime error?
Here is the log:
(LLM) [liuyuming@gpu7 ChatGLM-Finetuning-master]$ bash pt2.sh
[2023-11-30 19:34:50,051] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0: setting --include=localhost:0
[2023-11-30 19:35:05,425] [INFO] [runner.py:540:main] cmd = /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=1024 --enable_each_rank_log=None train.py --train_path data/spo_0.json --model_name_or_path /mnt/lustre/GPU7/home/liuyuming/code/model/chatGML_6b --per_device_train_batch_size 1 --max_len 1560 --max_src_len 1024 --learning_rate 1e-4 --weight_decay 0.1 --num_train_epochs 2 --gradient_accumulation_steps 4 --warmup_ratio 0.1 --mode glm2 --train_type ptuning --seed 1234 --ds_file ds_zero2_no_offload.json --gradient_checkpointing --show_loss_step 10 --pre_seq_len 16 --prefix_projection True --output_dir ./output-glm2
[2023-11-30 19:35:09,249] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-30 19:35:09,249] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-30 19:35:09,249] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-30 19:35:09,249] [INFO] [launch.py:247:main] dist_world_size=1
[2023-11-30 19:35:09,249] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
2023-11-30 19:35:12.823537: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-30 19:35:12.876890: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 19:35:13.977330: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-30 19:35:15,553] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
tokenizer.pad_token: <unk>
tokenizer.eos_token: </s>
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.05s/it]
Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /mnt/lustre/GPU7/home/liuyuming/code/model/chatGML_6b and are newly initialized: ['transformer.prefix_encoder.trans.2.weight', 'transformer.prefix_encoder.trans.0.bias', 'transformer.prefix_encoder.trans.2.bias', 'transformer.prefix_encoder.trans.0.weight', 'transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
the number of skipping data is 0
len(train_dataloader) = 1441
len(train_dataset) = 1441
num_training_steps = 722
num_warmup_steps = 72
transformer.prefix_encoder.embedding.weight
transformer.prefix_encoder.trans.0.weight
transformer.prefix_encoder.trans.0.bias
transformer.prefix_encoder.trans.2.weight
transformer.prefix_encoder.trans.2.bias
trainable params: 117688320 || all params: 6361272320 || trainable%: 1.8500751748983448
[2023-11-30 19:35:26,651] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-11-30 19:35:26,653] [INFO] [comm.py:580:init_distributed] Distributed backend already initialized
[2023-11-30 19:35:28,806] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /mnt/lustre/GPU7/home/liuyuming/.cache/torch_extensions/py39_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/lustre/GPU7/home/liuyuming/.cache/torch_extensions/py39_cu116/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
FAILED: fused_adam_frontend.o
c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
/mnt/lustre/GPU7/home/liuyuming/gcc/gcc-9.5.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory
[2/3] /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -std=c++14 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include -isystem /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -std=c++14 -c /mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
/mnt/lustre/GPU7/home/liuyuming/gcc/gcc-9.5.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory
nvcc fatal : Failed to preprocess host compiler properties.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/lustre/GPU7/home/liuyuming/code/ChatGLM-Finetuning-master/train.py", line 242, in <module>
main()
File "/mnt/lustre/GPU7/home/liuyuming/code/ChatGLM-Finetuning-master/train.py", line 184, in main
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/__init__.py", line 156, in initialize
engine = DeepSpeedEngine(args=args,
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 328, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1176, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1242, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 71, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 449, in load
return self.jit_load(verbose)
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
op_module = load(name=self.name,
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
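In long build logs like the one above, the root cause can be easy to miss: here the host compiler's cc1plus fails to load libisl.so.15 before any DeepSpeed code is compiled. A small sketch for pulling such loader errors out of a saved log follows. The function name is hypothetical, and the pattern is based only on the wording of the error lines in this log.

```python
import re

# Matches lines such as:
#   /path/to/cc1plus: error while loading shared libraries: libisl.so.15:
#   cannot open shared object file: No such file or directory
LOADER_ERROR = re.compile(
    r"^(?P<binary>\S+): error while loading shared libraries: "
    r"(?P<library>\S+): cannot open shared object file",
    re.MULTILINE,
)


def missing_shared_libraries(build_log):
    """Return sorted, de-duplicated (binary, library) pairs from a build log."""
    pairs = {
        (m.group("binary"), m.group("library"))
        for m in LOADER_ERROR.finditer(build_log)
    }
    return sorted(pairs)


if __name__ == "__main__":
    # Shortened, hypothetical path; the message format follows the log above.
    log = (
        "FAILED: fused_adam_frontend.o\n"
        "/opt/gcc/cc1plus: error while loading shared libraries: "
        "libisl.so.15: cannot open shared object file: No such file or directory\n"
        "ninja: build stopped: subcommand failed.\n"
    )
    for binary, library in missing_shared_libraries(log):
        print(binary, "could not load", library)
```

Each reported library is something the dynamic loader could not find for that binary, which points at the compiler installation or the library search path rather than at DeepSpeed itself.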
Here's the virtual environment:
Package Version
---------------------------- ----------
absl-py 2.0.0
accelerate 0.24.1
astunparse 1.6.3
Brotli 1.0.9
cachetools 5.3.2
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
cpm-kernels 1.0.11
cryptography 41.0.3
deepspeed 0.9.0
filelock 3.13.1
flatbuffers 23.5.26
fsspec 2023.10.0
gast 0.4.0
google-auth 2.23.4
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
grpcio 1.59.3
h5py 3.10.0
hjson 3.1.0
huggingface-hub 0.19.4
idna 3.4
importlib-metadata 6.8.0
keras 2.13.1
libclang 16.0.6
Markdown 3.5.1
MarkupSafe 2.1.3
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
ninja 1.11.1.1
numpy 1.24.2
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.2
peft 0.3.0
Pillow 10.0.1
pip 23.3.1
protobuf 4.25.1
psutil 5.9.6
py-cpuinfo 9.0.0
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 1.10.13
pyOpenSSL 23.2.0
PySocks 1.7.1
PyYAML 6.0.1
regex 2023.10.3
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
sentencepiece 0.1.96
setuptools 68.0.0
six 1.16.0
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0
termcolor 2.3.0
tokenizers 0.13.3
torch 1.13.1
torchaudio 0.13.1
torchvision 0.14.1
tqdm 4.64.1
transformers 4.27.1
typing_extensions 4.5.0
urllib3 1.26.18
Werkzeug 3.0.1
wheel 0.41.2
wrapt 1.16.0
zipp 3.17.0
Here is the ds_report:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/torch']
torch version .................... 1.13.1
deepspeed install path ........... ['/mnt/lustre/GPU7/home/liuyuming/anaconda3/envs/LLM/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.0, unknown, unknown
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6
I'm facing a similar issue.
Here is the ds_report:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0+7d1a95b046), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/users/chintan/venv/lib/python3.8/site-packages/torch']
torch version .................... 2.1.0.dev20230516+cu117
deepspeed install path ........... ['/home/users/chintan/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.2, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 10.0
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
pip freeze:
accelerate==0.19.0
aiohttp==3.8.4
aiosignal==1.3.1
async-timeout==4.0.2
attrs==22.2.0
awscli==1.27.133
bitsandbytes==0.38.1
boto3==1.26.133
botocore==1.29.133
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.2.1
cmake==3.25.0
colorama==0.4.4
contextlib2==21.6.0
coverage==7.2.5
dataclasses-json==0.5.7
datasets==2.12.0
deepspeed==0.9.2
dill==0.3.6
docutils==0.16
filelock==3.9.0
frozenlist==1.3.3
fsspec==2023.4.0
google-pasta==0.2.0
greenlet==2.0.2
hjson==3.1.0
huggingface-hub==0.14.1
idna==3.4
importlib-metadata==4.13.0
importlib-resources==5.12.0
iniconfig==2.0.0
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==4.17.3
langchain==0.0.165
lit==15.0.7
MarkupSafe==2.1.2
marshmallow==3.19.0
marshmallow-enum==1.5.1
mpmath==1.2.1
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
networkx==3.0rc1
ninja==1.11.1
numexpr==2.8.4
numpy==1.24.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
openapi-schema-pydantic==1.2.4
packaging==23.1
pandas==2.0.1
pathos==0.3.0
Pillow==9.3.0
pkgutil_resolve_name==1.3.10
platformdirs==3.5.1
pluggy==1.0.0
pox==0.3.2
ppft==1.7.6.6
protobuf==3.20.3
protobuf3-to-dict==0.1.5
psutil==5.9.5
py==1.11.0
py-cpuinfo==9.0.0
pyarrow==12.0.0
pyasn1==0.5.0
pydantic==1.10.7
pyrsistent==0.19.3
pytest==7.1.2
pytest-cov==3.0.0
python-dateutil==2.8.2
pytorch-triton==2.1.0+7d1a95b046
pytz==2023.3
PyYAML==5.4.1
regex==2023.5.5
requests==2.28.1
responses==0.18.0
rsa==4.7.2
s3transfer==0.6.1
sagemaker==2.154.0
schema==0.7.5
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
smdebug-rulesconfig==1.0.1
SQLAlchemy==2.0.13
sympy==1.11.1
tblib==1.7.0
tenacity==8.2.2
tensorboardX==2.6
tokenizers==0.13.3
tomli==2.0.1
torch==2.1.0.dev20230516+cu117
torchaudio==2.1.0.dev20230516+cu117
torchvision==0.16.0.dev20230516+cu117
tqdm==4.65.0
transformers @ git+https://github.com/huggingface/transformers.git@d765717c76026281f2fb27ddc44fa3636306bb48
triton==2.0.0
typing-inspect==0.8.0
typing_extensions==4.4.0
tzdata==2023.3
urllib3==1.26.13
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0
NVCC version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
nvidia-smi details:
Wed May 17 15:37:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
Any help, please?
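One detail in the report above is that `torch cuda version` is 11.7 while `nvcc version` is 10.0. JIT builds compile with whichever `nvcc` is found on the system, so a toolkit from a different major release than the one PyTorch was built against is a plausible (not confirmed) cause of the failed `fused_adam` build. A minimal sketch for spotting this from saved `ds_report` text follows; the function names are hypothetical helpers, not part of DeepSpeed.

```python
import re


def cuda_versions_from_ds_report(report):
    """Pull the 'torch cuda version' and 'nvcc version' fields out of ds_report text.

    Hypothetical helper: it only relies on the 'label ..... value' layout
    shown in the reports in this thread.
    """
    versions = {}
    for key in ("torch cuda version", "nvcc version"):
        match = re.search(re.escape(key) + r"\s*\.+\s*(\d+\.\d+)", report)
        versions[key] = match.group(1) if match else None
    return versions


def cuda_major_mismatch(report):
    """Return True when both versions are present and their major numbers differ."""
    found = cuda_versions_from_ds_report(report)
    torch_cuda = found["torch cuda version"]
    nvcc = found["nvcc version"]
    if torch_cuda is None or nvcc is None:
        return False  # not enough information to decide
    return torch_cuda.split(".")[0] != nvcc.split(".")[0]
```

Applied to the two reports in this thread, the check flags the 11.7 / 10.0 pair and passes the 11.6 / 11.6 pair, which suggests the earlier report failed for a different reason.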
Did you manage to resolve the issue? :)
@Excelsiorl - your error is this:
gnu/9.5.0/cc1plus: error while loading shared libraries: libisl.so.15: cannot open shared object file: No such file or directory
This looks to be a GCC/build setup error. Can you try reinstalling GCC or resolving that error first if the file does exist on your system?
@loadams - I installed GCC 9.5.0 locally on a CentOS cluster where I don't have root access.
PS: I installed the precompiled version directly and updated PATH and LD_LIBRARY_PATH accordingly.
Could this be the reason for the issue?
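A quick way to test that theory is to check whether any directory on `LD_LIBRARY_PATH` actually contains `libisl.so.15`, the library `cc1plus` fails to load. The sketch below is a hypothetical helper, and it is only a partial check: the dynamic loader also consults the linker cache, default system directories, and any RPATH in the binary, none of which are examined here.

```python
import os
from pathlib import Path


def find_shared_library(name, search_path=None):
    """Return the directories in an LD_LIBRARY_PATH-style string that contain `name`.

    If `search_path` is None, the current LD_LIBRARY_PATH is used.
    Hypothetical helper: it does not look at the linker cache or RPATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if entry and (Path(entry) / name).exists():
            hits.append(entry)
    return hits
```

Calling `find_shared_library("libisl.so.15")` in the same shell that launches training returns an empty list if no listed directory has the file, in which case the precompiled GCC's library directory is probably missing from `LD_LIBRARY_PATH`.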
Hi @Excelsiorl, I had a similar issue, and it was resolved after upgrading the CUDA version. You can follow the steps below to install a newer version of CUDA (11.8 worked for me): https://gist.github.com/ksopyla/bf74e8ce2683460d8de6e0dc389fc7f5
For cuDNN, I used the following instructions:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn-cuda-11
Once the setup is finished and the CUDA path is updated, you can run nvcc --version to check the updated CUDA version.
You can start training now 👍
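If you want to script that last check, the release number can be read from the `nvcc --version` output. This is a minimal sketch with hypothetical function names; it assumes the output contains a "release X.Y" token, as in the NVCC output posted earlier in this thread.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release():
    """Run the nvcc found on PATH and parse its release; None if nvcc is absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)
```

Because `installed_nvcc_release()` uses the first `nvcc` on `PATH`, it reports the toolkit a JIT build would normally pick up, which may differ from the newest toolkit installed on the machine.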
Hey, I was trying out the cifar-10 tutorial (link).
Could you assist with the runtime error?
On executing (run_ds.sh):
Here's ds_report:
Running with CUDA 10.1 on Ubuntu 18.04. Here's the virtual environment: