microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #1846

Closed maxmaier59 closed 2 years ago

maxmaier59 commented 2 years ago

When I try to do finetuning with Deepspeed I get the following error message:

Traceback (most recent call last):
  File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
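As background, this kind of AttributeError in __del__ is usually a secondary symptom: if loading the compiled op raises inside __init__, the ds_opt_adam attribute is never assigned, and the interpreter still runs __del__ on the half-built object during cleanup. A minimal sketch (a hypothetical stand-in, not DeepSpeed's real class) of that mechanism:

```python
# Hypothetical minimal model of the failure mode: __init__ fails before
# setting the attribute that __del__ later relies on.
class FakeCPUAdam:
    def __init__(self):
        # Stands in for the compiled-extension load failing, e.g. with
        # "undefined symbol: curandCreateGenerator".
        raise ImportError("undefined symbol: curandCreateGenerator")

    def __del__(self):
        # Cleanup assumes __init__ completed, so this attribute lookup
        # raises AttributeError on a half-built object.
        self.ds_opt_adam.destroy_adam(0)

try:
    FakeCPUAdam()
except ImportError as exc:
    print("root cause:", exc)  # this is the error worth debugging
# When the half-built object is collected, its __del__ raises the
# secondary AttributeError seen in the traceback above.
```

So the AttributeError itself is noise; the interesting error is whatever made the extension load fail in the first place.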

I have built Deepspeed with

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check

It seems that ds_opt_adam was not built

This is the output I've got:

/media/max/Volume/GPT/finetune/DeepSpeed Using pip 21.2.4 from /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/pip (python 3.8) Obtaining file:///media/max/Volume/GPT/finetune/DeepSpeed /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/pip/_internal/commands/install.py:229: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option. cmdoptions.check_install_build_global(options) Running command python setup.py egg_info DS_BUILD_OPS=0 Installed CUDA version 11.4 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': 1, 'utils': 1, 'quantizer': False, 'transformer_inference': False} version=0.6.0+a32e9b33, git_hash=a32e9b33, git_branch=HEAD install_requires=['hjson', 'ninja', 'numpy', 'packaging', 'psutil', 'py-cpuinfo', 'torch', 'tqdm', 'triton==1.0.0'] compatible_ops={'cpu_adam': True, 'cpu_adagrad': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'async_io': True, 'utils': True, 'quantizer': True, 'transformer_inference': True} ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f2b7bd0e820>, <setuptools.extension.Extension('deepspeed.ops.aio.async_io_op') at 0x7f2b7bbdd790>, <setuptools.extension.Extension('deepspeed.ops.utils_op') at 0x7f2b7bb5ff70>] running egg_info creating /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info writing /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/PKG-INFO writing dependency_links to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/dependency_links.txt writing entry points to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/entry_points.txt writing requirements to 
/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/requires.txt writing top-level names to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/top_level.txt writing manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt' reading manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '.hip' under directory 'deepspeed' warning: no files found matching '.cc' under directory 'deepspeed' warning: no files found matching '.tr' under directory 'csrc' warning: no files found matching '.cc' under directory 'csrc' adding license file 'LICENSE' writing manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt' deepspeed build time = 0.36443185806274414 secs Requirement already satisfied: hjson in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (3.0.2) Requirement already satisfied: ninja in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.10.2.3) Requirement already satisfied: numpy in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.22.3) Requirement already satisfied: packaging in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (21.3) Requirement already satisfied: psutil in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (5.9.0) Requirement already satisfied: py-cpuinfo in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (8.0.0) Requirement already satisfied: torch in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.11.0+cu115) Requirement already satisfied: tqdm in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from 
deepspeed==0.6.0+a32e9b33) (4.63.0) Requirement already satisfied: triton==1.0.0 in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.0.0) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from packaging->deepspeed==0.6.0+a32e9b33) (3.0.4) Requirement already satisfied: typing-extensions in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from torch->deepspeed==0.6.0+a32e9b33) (3.10.0.2) Installing collected packages: deepspeed Attempting uninstall: deepspeed Found existing installation: deepspeed 0.5.9+d0ab7224 Uninstalling deepspeed-0.5.9+d0ab7224: Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/deepspeed Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/deepspeed.pt Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_elastic Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_report Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_ssh Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed-0.5.9+d0ab7224-py3.8.egg-info Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed/ Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/op_builder/ Successfully uninstalled deepspeed-0.5.9+d0ab7224 Running setup.py develop for deepspeed Running command /home/max/anaconda3/envs/gptneo_finetuned/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/media/max/Volume/GPT/finetune/DeepSpeed/setup.py'"'"'; file='"'"'/media/max/Volume/GPT/finetune/DeepSpeed/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else 
io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' build_ext -j8 develop --no-deps DS_BUILD_OPS=0 Installed CUDA version 11.4 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': 1, 'utils': 1, 'quantizer': False, 'transformer_inference': False} version=0.6.0+a32e9b33, git_hash=a32e9b33, git_branch=HEAD install_requires=['hjson', 'ninja', 'numpy', 'packaging', 'psutil', 'py-cpuinfo', 'torch', 'tqdm', 'triton==1.0.0'] compatible_ops={'cpu_adam': True, 'cpu_adagrad': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'async_io': True, 'utils': True, 'quantizer': True, 'transformer_inference': True} ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f41e6e48f10>, <setuptools.extension.Extension('deepspeed.ops.aio.async_io_op') at 0x7f41e6214790>, <setuptools.extension.Extension('deepspeed.ops.utils_op') at 0x7f41e6193f40>] running build_ext building 'deepspeed.ops.adam.cpu_adam_op' extension building 'deepspeed.ops.aio.async_io_op' extension creating build creating build/temp.linux-x86_64-3.8 building 'deepspeed.ops.utils_op' extension creating build/temp.linux-x86_64-3.8/csrc creating build/temp.linux-x86_64-3.8/csrc creating build/temp.linux-x86_64-3.8/csrc/adam creating build/temp.linux-x86_64-3.8/csrc/utils creating build/temp.linux-x86_64-3.8/csrc/aio gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include 
-I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/utils/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=utils_op -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 creating build/temp.linux-x86_64-3.8/csrc/aio/py_lib creating build/temp.linux-x86_64-3.8/csrc/common gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o -O3 -std=c++14 -g -Wno-reorder -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 creating build/temp.linux-x86_64-3.8/csrc/aio/common gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat 
-Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_copy.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_copy.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++

In file included from csrc/includes/cpu_adam.h:12, from csrc/adam/cpu_adam.cpp:1: csrc/includes/simd.h:63: warning: ignoring #pragma unroll [-Wunknown-pragmas] 63 #pragma unroll
csrc/includes/simd.h:71: warning: ignoring #pragma unroll [-Wunknown-pragmas] 71 #pragma unroll
csrc/includes/simd.h:79: warning: ignoring #pragma unroll [-Wunknown-pragmas] 79 #pragma unroll
csrc/includes/simd.h:87: warning: ignoring #pragma unroll [-Wunknown-pragmas] 87 #pragma unroll
csrc/includes/simd.h:95: warning: ignoring #pragma unroll [-Wunknown-pragmas] 95 #pragma unroll
csrc/includes/simd.h:103: warning: ignoring #pragma unroll [-Wunknown-pragmas] 103 #pragma unroll
csrc/includes/simd.h:109: warning: ignoring #pragma unroll [-Wunknown-pragmas] 109 #pragma unroll
csrc/includes/simd.h:115: warning: ignoring #pragma unroll [-Wunknown-pragmas] 115 #pragma unroll
csrc/includes/simd.h:121: warning: ignoring #pragma unroll [-Wunknown-pragmas] 121 #pragma unroll
csrc/includes/simd.h:127: warning: ignoring #pragma unroll [-Wunknown-pragmas] 127 #pragma unroll
csrc/includes/simd.h:133: warning: ignoring #pragma unroll [-Wunknown-pragmas] 133 #pragma unroll

gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/py_ds_aio.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/py_ds_aio.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/deepspeed creating build/lib.linux-x86_64-3.8/deepspeed/ops g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so csrc/adam/cpu_adam.cpp: In member function ‘void Adam_Optimizer::Step_1(float, float, float, float, size_t, half, bool)’: csrc/adam/cpu_adam.cpp:45:17: warning: ‘params_cast_h’ may be used uninitialized in this function [-Wmaybe-uninitialized] 45 | __half params_cast_h; | 
^~~~~ csrc/adam/cpu_adam.cpp:44:17: warning: ‘grads_cast_h’ may be used uninitialized in this function [-Wmaybe-uninitialized] 44 | half* grads_cast_h; | ^~~~ /home/max/anaconda3/envs/gptneo_finetuned/bin/nvcc -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/common/custom_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -UCUDA_NO_HALF_OPERATORS -UCUDA_NO_HALF_CONVERSIONS -UCUDA_NO_HALF2_OPERATORS -gencode=arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 creating build/lib.linux-x86_64-3.8/deepspeed/ops/adam g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o 
build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_aio.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio_handle.o -g -Wall -O0 -std=c++14 -shared -fPIC 
-Wno-reorder -march=native -fopenmp -DAVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_aio_thread.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_aio_thread.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++

gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_utils.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_utils.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_common.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_common.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H 
-DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ csrc/aio/common/deepspeed_aio_common.cpp: In function ‘void _do_io_submit_singles(long long int, long long int, std::unique_ptr&, std::vector<std::chrono::duration >&)’: csrc/aio/common/deepspeed_aio_common.cpp:76:20: warning: unused variable ‘submit_ret’ [-Wunused-variable] 76 | const auto submit_ret = io_submit(aio_ctxt->_io_ctxt, 1, aio_ctxt->_iocbs.data() + i); | ^~~~~~ csrc/aio/common/deepspeed_aio_common.cpp: In function ‘void _do_io_submit_block(long long int, long long int, std::unique_ptr&, std::vector<std::chrono::duration >&)’: csrc/aio/common/deepspeed_aio_common.cpp:96:16: warning: unused variable ‘submit_ret’ [-Wunused-variable] 96 | const auto submit_ret = io_submit(aio_ctxt->_io_ctxt, n_iocbs, aio_ctxt->_iocbs.data()); | ^~~~~~ csrc/aio/common/deepspeed_aio_common.cpp: In function ‘int regular_read(const char, std::vector&)’: csrc/aio/common/deepspeed_aio_common.cpp:280:16: warning: unused variable ‘f_size’ [-Wunused-variable] 280 | const auto f_size = get_file_size(filename, num_bytes); | ^~ csrc/aio/common/deepspeed_aio_common.cpp: In function ‘bool _validate_buffer(const char, void, long long int)’: csrc/aio/common/deepspeed_aio_common.cpp:307:16: warning: unused variable ‘reg_ret’ [-Wunused-variable] 307 | const auto reg_ret = regular_read(filename, regular_buffer); | ^~~ gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include 
-I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_types.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_types.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -DAVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ creating build/lib.linux-x86_64-3.8/deepspeed/ops/aio g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_copy.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/py_ds_aio.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio_handle.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_aio_thread.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_utils.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_common.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_types.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/deepspeed/ops/aio/async_io_op.cpython-38-x86_64-linux-gnu.so -laio running develop running egg_info creating deepspeed.egg-info writing deepspeed.egg-info/PKG-INFO writing dependency_links to deepspeed.egg-info/dependency_links.txt writing entry points to deepspeed.egg-info/entry_points.txt writing requirements to 
deepspeed.egg-info/requires.txt writing top-level names to deepspeed.egg-info/top_level.txt writing manifest file 'deepspeed.egg-info/SOURCES.txt' reading manifest file 'deepspeed.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:788: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem. warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) warning: no files found matching '.hip' under directory 'deepspeed' warning: no files found matching '.cc' under directory 'deepspeed' warning: no files found matching '.tr' under directory 'csrc' warning: no files found matching '*.cc' under directory 'csrc' adding license file 'LICENSE' writing manifest file 'deepspeed.egg-info/SOURCES.txt' running build_ext copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/aio/async_io_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops/aio copying build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops Creating /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed.egg-link (link to .) Adding deepspeed 0.6.0+a32e9b33 to easy-install.pth file Installing deepspeed script to /home/max/anaconda3/envs/gptneo_finetuned/bin Installing deepspeed.pt script to /home/max/anaconda3/envs/gptneo_finetuned/bin Installing ds script to /home/max/anaconda3/envs/gptneo_finetuned/bin Installing ds_ssh script to /home/max/anaconda3/envs/gptneo_finetuned/bin Installing ds_report script to /home/max/anaconda3/envs/gptneo_finetuned/bin Installing ds_elastic script to /home/max/anaconda3/envs/gptneo_finetuned/bin

Installed /media/max/Volume/GPT/finetune/DeepSpeed /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:788: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem. warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) deepspeed build time = 90.15858387947083 secs

jeffra commented 2 years ago

Can you share the output of ds_report after your install?

Also, I recently discovered a potential issue with this pre-compile style (see https://github.com/microsoft/DeepSpeed/issues/1840). Can you check whether you get the same error installing this way:

DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .
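As a quick diagnostic alongside ds_report, one can check directly whether a prebuilt op extension is importable, without crashing when the parent package is missing. This helper is an illustrative sketch (not part of DeepSpeed); the module name in the example is taken from the build log above:

```python
# Check whether a compiled op extension module can be located and imported;
# importing is what surfaces link errors such as undefined symbols.
import importlib
import importlib.util


def op_status(module_name: str) -> str:
    """Return 'loadable', 'missing', or the import error message."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return "missing"
    except ModuleNotFoundError:
        # Parent package itself is not installed.
        return "missing"
    try:
        importlib.import_module(module_name)
        return "loadable"
    except ImportError as exc:  # e.g. "undefined symbol: curandCreateGenerator"
        return str(exc)


# Example: the cpu_adam op built above (name from the build log).
print(op_status("deepspeed.ops.adam.cpu_adam_op"))
```

If this prints an undefined-symbol message rather than "loadable", the .so was built but mislinked, which matches the behavior reported later in this thread.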

maxmaier59 commented 2 years ago

Here is the output of ds_report


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu115
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.4
deepspeed install path ........... ['/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+a32e9b33, a32e9b33, HEAD
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.5, hip 0.0

maxmaier59 commented 2 years ago

Here is the command I've used for installation:

TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_OP_ADAM=1 DS_BUILD_UTILS=1 DS_BUILD_AIO=1 pip install -e . \
  --global-option="build_ext" --global-option="-j8" --no-cache -v \
  --disable-pip-version-check 2>&1 | tee build.log

Using DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . makes no difference.

maxmaier59 commented 2 years ago

I think the root cause of the problem is this:

ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
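curandCreateGenerator comes from libcurand (cuRAND). Notably, the link commands in the build log above pass -lcudart and -lcublas but no -lcurand, which may explain why the symbol stays unresolved in cpu_adam_op. A hedged sanity check (illustrative helper, not a DeepSpeed utility) for whether the dynamic linker can find libcurand and resolve the symbol at all:

```python
# Check whether libcurand is on the linker search path and exports
# curandCreateGenerator; returns False on machines without CUDA.
import ctypes
import ctypes.util


def curand_symbol_available() -> bool:
    libname = ctypes.util.find_library("curand")
    if libname is None:
        return False  # libcurand not found on the search path
    try:
        lib = ctypes.CDLL(libname)
        return hasattr(lib, "curandCreateGenerator")
    except OSError:
        return False  # library found but not loadable


print("curandCreateGenerator resolvable:", curand_symbol_available())
```

If this prints True, the library is available and the problem is that the extension simply was not linked against it; if False, the CUDA runtime libraries may be missing from LD_LIBRARY_PATH.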

jeffra commented 2 years ago

Ohh I see your comments on this issue now as well (https://github.com/pytorch/pytorch/issues/69666). If you try a recent torch nightly build does it still exhibit the issue?

maxmaier59 commented 2 years ago

Hmm, I've tried with the torch nightly build, but I am still getting the same error message.


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0.dev20220320+cu115
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.4
deepspeed install path ........... ['/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+a32e9b33, a32e9b33, HEAD
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.5, hip 0.0

Please let me know if you need any additional information

maxmaier59 commented 2 years ago

I wonder if there is any hope of getting this fixed, or of finding a workaround. Is it just me having this problem, or does it affect other people who want to use DeepSpeed with the Adam optimizer as well?

tjruwase commented 2 years ago

@maxmaier59, can you clarify whether you intend to use the DeepSpeed CPUAdam or the torch Adam optimizer?

maxmaier59 commented 2 years ago

I am not sure what the difference is. Without a better understanding, I would like to use the DeepSpeed CPUAdam optimizer.

tjruwase commented 2 years ago

@maxmaier59, CPUAdam was created for executing optimizer computations on CPU instead of GPU. Please see this tutorial for more details.

maxmaier59 commented 2 years ago

In this case I need CPUAdam

maxmaier59 commented 2 years ago

Please can somebody help me to solve this problem?

I wonder what is going on. It seems to me that either the CPUAdam optimizer for DeepSpeed has been abandoned, or I am doing something wrong. If the latter is the case, can somebody please help me find my error so I can fix the problem?

If the former is the case, I wonder why the optimizer was dropped. Is there any alternative?

tjruwase commented 2 years ago

@maxmaier59, apologies for the delayed response. CPUAdam is still very much an important part of DeepSpeed as our offloading technologies depend on it. I am a bit confused about whether the original issue was observed during build or during an actual run. The issue mentions an attribute error which suggests this occurred during a run, so in that case can you please repaste or point me to the stack trace? Sorry for asking you to provide this again.

maxmaier59 commented 2 years ago

Many thanks for getting back to me! The error occurs during an actual run. Here is the command to start deepspeed:

deepspeed --num_gpus=2 run_clm.py \
  --deepspeed ds_config.json \
  --model_name_or_path EleutherAI/gpt-neo-2.7B \
  --train_file train.csv \
  --validation_file validation.csv \
  --do_train \
  --do_eval \
  --fp16 \
  --overwrite_cache \
  --evaluation_strategy="steps" \
  --output_dir finetuned \
  --num_train_epochs 1 \
  --eval_steps 15 \
  --gradient_accumulation_steps 2 \
  --per_device_train_batch_size 4 \
  --use_fast_tokenizer False \
  --learning_rate 5e-06 \
  --warmup_steps 10

And here is the output:

[2022-03-24 22:33:28,352] [WARNING] [runner.py:155:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2022-03-24 22:33:28,382] [INFO] [runner.py:438:main] cmd = /home/max/anaconda3/envs/gptneo_finetuned/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 run_clm.py --deepspeed ds_config.json --model_name_or_path EleutherAI/gpt-neo-2.7B --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy=steps --output_dir finetuned --num_train_epochs 1 --eval_steps 15 --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --use_fast_tokenizer False --learning_rate 5e-06 --warmup_steps 10 [2022-03-24 22:33:29,110] [INFO] [launch.py:103:main] WORLD INFO DICT: {'localhost': [0, 1]} [2022-03-24 22:33:29,110] [INFO] [launch.py:109:main] nnodes=1, num_local_procs=2, node_rank=0 [2022-03-24 22:33:29,111] [INFO] [launch.py:122:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]}) [2022-03-24 22:33:29,111] [INFO] [launch.py:123:main] dist_world_size=2 [2022-03-24 22:33:29,111] [INFO] [launch.py:125:main] Setting CUDA_VISIBLE_DEVICES=0,1 [2022-03-24 22:33:30,474] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl 03/24/2022 22:33:30 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True 03/24/2022 22:33:30 - INFO - main - Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, bf16=False, bf16_full_eval=False, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, debug=[], deepspeed=ds_config.json, disable_tqdm=False, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_steps=15, 
evaluation_strategy=IntervalStrategy.STEPS, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=, ignore_data_skip=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-06, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=-1, log_level_replica=-1, log_on_each_node=True, logging_dir=finetuned/runs/Mar24_22-33-30_max-Desktop, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=500, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=OptimizerNames.ADAMW_HF, output_dir=finetuned, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=finetuned, save_on_each_node=False, save_steps=500, save_strategy=IntervalStrategy.STEPS, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, tpu_metrics_debug=False, tpu_num_cores=None, use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=10, weight_decay=0.0, xpu_backend=None, ) 03/24/2022 22:33:30 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True 03/24/2022 22:33:31 - WARNING - datasets.builder - Using custom data configuration default-e1878cb86e47ddff 03/24/2022 22:33:31 - WARNING - datasets.builder - Using custom data configuration default-e1878cb86e47ddff 03/24/2022 22:33:31 - WARNING - datasets.builder - Reusing dataset csv 
(/home/max/.cache/huggingface/datasets/csv/default-e1878cb86e47ddff/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519) 100%|███████████████████████████████████████████| 2/2 [00:00<00:00, 1381.75it/s] 03/24/2022 22:33:31 - WARNING - datasets.builder - Reusing dataset csv (/home/max/.cache/huggingface/datasets/csv/default-e1878cb86e47ddff/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519) 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 361.31it/s] [INFO|configuration_utils.py:648] 2022-03-24 22:33:31,586 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729 [INFO|configuration_utils.py:684] 2022-03-24 22:33:31,589 >> Model config GPTNeoConfig { "_name_or_path": "EleutherAI/gpt-neo-2.7B", "activation_function": "gelu_new", "architectures": [ "GPTNeoForCausalLM" ], "attention_dropout": 0, "attention_layers": [ "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local" ], "attention_types": [ [ [ "global", "local" ], 16 ] ], "bos_token_id": 50256, "embed_dropout": 0, "eos_token_id": 50256, "gradient_checkpointing": false, "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": null, "layer_norm_epsilon": 1e-05, "max_position_embeddings": 2048, "model_type": "gpt_neo", "num_heads": 20, "num_layers": 32, "resid_dropout": 0, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": 
true, "max_length": 50, "temperature": 0.9 } }, "tokenizer_class": "GPT2Tokenizer", "transformers_version": "4.17.0", "use_cache": true, "vocab_size": 50257, "window_size": 256 }

[INFO|configuration_utils.py:648] 2022-03-24 22:33:32,544 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729 [INFO|configuration_utils.py:684] 2022-03-24 22:33:32,546 >> Model config GPTNeoConfig { "_name_or_path": "EleutherAI/gpt-neo-2.7B", "activation_function": "gelu_new", "architectures": [ "GPTNeoForCausalLM" ], "attention_dropout": 0, "attention_layers": [ "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local" ], "attention_types": [ [ [ "global", "local" ], 16 ] ], "bos_token_id": 50256, "embed_dropout": 0, "eos_token_id": 50256, "gradient_checkpointing": false, "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": null, "layer_norm_epsilon": 1e-05, "max_position_embeddings": 2048, "model_type": "gpt_neo", "num_heads": 20, "num_layers": 32, "resid_dropout": 0, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50, "temperature": 0.9 } }, "tokenizer_class": "GPT2Tokenizer", "transformers_version": "4.17.0", "use_cache": true, "vocab_size": 50257, "window_size": 256 }

[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/vocab.json from cache at /home/max/.cache/huggingface/transformers/d4455fdc7c8e2bcf94a0bfe134b748a93c37ecadb7b8f6b0eb508ffdd433a61e.a1b97b074a5ac71fad0544c8abc1b3581803d73832476184bde6cff06a67b6bb [INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/merges.txt from cache at /home/max/.cache/huggingface/transformers/5660be25091706bde0cfb60f17ae72c7a2aa40223d68954d4d8ffd1fc6995643.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435 [INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/special_tokens_map.json from cache at /home/max/.cache/huggingface/transformers/953b5ce47652cf8b6e945b3570bfa7621164c337e05419b954dbe0a4d16a7480.3ae9ae72462581d20e36bc528e9c47bb30cd671bb21add40ca0b24a0be9fac22 [INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/tokenizer_config.json from cache at /home/max/.cache/huggingface/transformers/57ccc3b8af045ea106fffa36bcc8b764e9702b5f4c1f7b3aad70ccfcaa931221.c31b6b7d3225be0c43bc0f8e5d84d03a8b49fdb6b9f6009bbfff1f9cc5ec18bc [INFO|configuration_utils.py:648] 2022-03-24 22:33:35,408 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729 [INFO|configuration_utils.py:684] 2022-03-24 22:33:35,409 >> Model config GPTNeoConfig { "_name_or_path": 
"EleutherAI/gpt-neo-2.7B", "activation_function": "gelu_new", "architectures": [ "GPTNeoForCausalLM" ], "attention_dropout": 0, "attention_layers": [ "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local" ], "attention_types": [ [ [ "global", "local" ], 16 ] ], "bos_token_id": 50256, "embed_dropout": 0, "eos_token_id": 50256, "gradient_checkpointing": false, "hidden_size": 2560, "initializer_range": 0.02, "intermediate_size": null, "layer_norm_epsilon": 1e-05, "max_position_embeddings": 2048, "model_type": "gpt_neo", "num_heads": 20, "num_layers": 32, "resid_dropout": 0, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50, "temperature": 0.9 } }, "tokenizer_class": "GPT2Tokenizer", "transformers_version": "4.17.0", "use_cache": true, "vocab_size": 50257, "window_size": 256 }

[INFO|modeling_utils.py:1431] 2022-03-24 22:33:36,020 >> loading weights file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/pytorch_model.bin from cache at /home/max/.cache/huggingface/transformers/0839a11efa893f2a554f8f540f904b0db0e5320a2b1612eb02c3fd25471c189a.a144c17634fa6a7823e398888396dd623e204dce9e33c3175afabfbf24bd8f56 [INFO|modeling_utils.py:1485] 2022-03-24 22:33:40,536 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model [2022-03-24 22:33:59,526] [INFO] [partition_parameters.py:456:exit] finished initializing model with 2.78B parameters /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/nn/modules/module.py:1383: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/nn/modules/module.py:1383: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( [INFO|modeling_utils.py:1702] 2022-03-24 22:34:17,407 >> All model checkpoint weights were used when initializing GPTNeoForCausalLM.

[INFO|modeling_utils.py:1710] 2022-03-24 22:34:17,407 >> All the weights of GPTNeoForCausalLM were initialized from the model checkpoint at EleutherAI/gpt-neo-2.7B. If your task is similar to the task the model of the checkpoint was trained on, you can already use GPTNeoForCausalLM for predictions without further training. 0%| | 0/1 [00:00<?, ?ba/s][WARNING|tokenization_utils_base.py:3397] 2022-03-24 22:34:23,519 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 2048). Running this sequence through the model will result in indexing errors 100%|█████████████████████████████████████████████| 1/1 [00:05<00:00, 5.40s/ba] Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 2048). Running this sequence through the model will result in indexing errors 100%|█████████████████████████████████████████████| 1/1 [00:05<00:00, 5.44s/ba] 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 51.73ba/s] run_clm.py:360: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead logger.warn( 03/24/2022 22:34:24 - WARNING - main - The tokenizer picked seems to have a very large model_max_length (2048). Picking 1024 instead. You can change that default value by passing --block_size xxx. 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 47.52ba/s] run_clm.py:360: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead logger.warn( 03/24/2022 22:34:24 - WARNING - main - The tokenizer picked seems to have a very large model_max_length (2048). Picking 1024 instead. You can change that default value by passing --block_size xxx. 
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00, 1.14s/ba] 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 147.62ba/s] [INFO|trainer.py:457] 2022-03-24 22:34:25,574 >> Using amp half precision backend [2022-03-24 22:34:25,578] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+a32e9b33, git-hash=a32e9b33, git-branch=HEAD 100%|█████████████████████████████████████████████| 1/1 [00:01<00:00, 1.11s/ba] 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 182.52ba/s] [2022-03-24 22:34:25,842] [INFO] [engine.py:277:init] DeepSpeed Flops Profiler Enabled: False Traceback (most recent call last): File "run_clm.py", line 478, in main() File "run_clm.py", line 441, in main train_result = trainer.train(resume_from_checkpoint=checkpoint) File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/trainer.py", line 1240, in train deepspeed_engine, optimizer, lr_scheduler = deepspeed_init( File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/deepspeed.py", line 424, in deepspeed_init deepspeedengine, optimizer, , lr_scheduler = deepspeed.initialize(kwargs) File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/init.py", line 119, in initialize Traceback (most recent call last): File "run_clm.py", line 478, in engine = DeepSpeedEngine(args=args, File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 293, in init main() File "run_clm.py", line 441, in main self._configure_optimizer(optimizer, model_parameters) File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1062, in _configure_optimizer train_result = trainer.train(resume_from_checkpoint=checkpoint) File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/trainer.py", line 1240, in train basic_optimizer = self._configure_basic_optimizer(model_parameters) File 
"/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer deepspeed_engine, optimizer, lr_scheduler = deepspeed_init( File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/deepspeed.py", line 424, in deepspeed_init deepspeedengine, optimizer, , lr_scheduler = deepspeed.initialize(kwargs) File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/init.py", line 119, in initialize optimizer = DeepSpeedCPUAdam(model_parameters, File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in init engine = DeepSpeedEngine(args=args, File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 293, in init self.ds_opt_adam = CPUAdamBuilder().load() self._configure_optimizer(optimizer, model_parameters) File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1062, in _configure_optimizer return importlib.import_module(self.absolute_name()) File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module basic_optimizer = self._configure_basic_optimizer(model_parameters) File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer return _bootstrap._gcd_import(name[level:], package, level) File "", line 1014, in _gcd_import File "", line 991, in _find_and_load optimizer = DeepSpeedCPUAdam(model_parameters, File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in init File "", line 975, in _find_and_load_unlocked self.ds_opt_adam = CPUAdamBuilder().load() File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load File "", line 657, in _load_unlocked File "", line 556, in module_from_spec return 
importlib.import_module(self.absolute_name()) File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1014, in _gcd_import File "", line 1166, in create_module File "", line 991, in _find_and_load File "", line 219, in _call_with_frames_removed File "", line 975, in _find_and_load_unlocked ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator File "", line 657, in _load_unlocked File "", line 556, in module_from_spec File "", line 1166, in create_module File "", line 219, in _call_with_frames_removed ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator

Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7fa491b11a60> Traceback (most recent call last): File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in del self.ds_opt_adam.destroy_adam(self.opt_id) AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7fe75daa6a60> Traceback (most recent call last): File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in del self.ds_opt_adam.destroy_adam(self.opt_id) AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' [2022-03-24 22:34:27,182] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 69193 [2022-03-24 22:34:27,183] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 69194 [2022-03-24 22:34:27,183] [ERROR] [launch.py:184:sigkill_handler] ['/home/max/anaconda3/envs/gptneo_finetuned/bin/python', '-u', 'run_clm.py', '--local_rank=1', '--deepspeed', 'ds_config.json', '--model_name_or_path', 'EleutherAI/gpt-neo-2.7B', '--train_file', 'train.csv', '--validation_file', 'validation.csv', '--do_train', '--do_eval', '--fp16', '--overwrite_cache', '--evaluation_strategy=steps', '--output_dir', 'finetuned', '--num_train_epochs', '1', '--eval_steps', '15', '--gradient_accumulation_steps', '2', '--per_device_train_batch_size', '4', '--use_fast_tokenizer', 'False', '--learning_rate', '5e-06', '--warmup_steps', '10'] exits with return code = 1
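The `Exception ignored` AttributeError at the end is a secondary symptom, not the root cause: when `DeepSpeedCPUAdam.__init__` fails while loading the native extension, Python still runs `__del__` on the partially constructed object, which then trips over the attribute that was never set. A minimal sketch of this pattern (with hypothetical names, not DeepSpeed's actual classes):

```python
class NativeOptimizer:
    """Toy stand-in for an optimizer wrapping a native extension."""

    def __init__(self):
        # In DeepSpeedCPUAdam this is roughly `CPUAdamBuilder().load()`;
        # here we simulate the broken .so raising at import time.
        self.ds_opt = self._load_extension()

    def _load_extension(self):
        raise ImportError("undefined symbol: curandCreateGenerator")

    def __del__(self):
        # Runs even though __init__ raised, so `ds_opt` was never set
        # and this produces the "Exception ignored" AttributeError.
        self.ds_opt.destroy()


root_cause = None
try:
    NativeOptimizer()
except ImportError as e:
    root_cause = str(e)

# The ImportError is the failure worth debugging; the AttributeError
# from __del__ is only printed to stderr and can be ignored.
print(root_cause)
```

So the `undefined symbol: curandCreateGenerator` ImportError earlier in the log is the error to chase; the `ds_opt_adam` AttributeError in the issue title is just its echo during cleanup.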

tjruwase commented 2 years ago

--deepspeed ds_config.json

Thanks! Can you please share the contents of ds_config.json?

maxmaier59 commented 2 years ago

Here is the ds_config.json

{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "nvme_path": "nvme_data",
      "pin_memory": false,
      "buffer_count": 4,
      "fast_init": false
    },
    "offload_param": {
      "device": "cpu",
      "nvme_path": "nvme_param",
      "pin_memory": false,
      "buffer_count": 5,
      "buffer_size": 1e8,
      "max_in_cpu": 1e10
    },
    "aio": {
      "block_size": 262144,
      "queue_depth": 32,
      "thread_count": 1,
      "single_submit": false,
      "overlap_events": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_fp16_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
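This config explains why CPUAdam gets pulled in at all: ZeRO stage 3 with `offload_optimizer.device: "cpu"` is the combination for which the DeepSpeed/Transformers integration constructs `DeepSpeedCPUAdam` instead of a GPU optimizer. A small sketch checking that condition against an abridged copy of the config above (the abridgement itself is my assumption, not part of the thread):

```python
import json

# Abridged to the fields relevant to optimizer selection.
cfg = json.loads("""
{
  "optimizer": { "type": "AdamW", "params": { "lr": "auto" } },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  }
}
""")

zero = cfg["zero_optimization"]
# Optimizer offload to CPU means optimizer steps run on the host,
# which is what the CPUAdam native extension implements.
needs_cpu_adam = (
    zero["stage"] >= 2
    and zero.get("offload_optimizer", {}).get("device") == "cpu"
)
print(needs_cpu_adam)
```

Setting `offload_optimizer.device` to `"none"` would sidestep the broken extension, at the cost of losing CPU offloading.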

maxmaier59 commented 2 years ago

BTW, there is a simpler way to reproduce the problem: Please see the DeepSpeed tutorial for installation: https://www.deepspeed.ai/tutorials/advanced-install/

DS_BUILD_OPS=1 pip install deepspeed

And then run this: python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'

Error message:

Traceback (most recent call last): File "", line 1, in File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load return importlib.import_module(self.absolute_name()) File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1014, in _gcd_import File "", line 991, in _find_and_load File "", line 975, in _find_and_load_unlocked File "", line 657, in _load_unlocked File "", line 556, in module_from_spec File "", line 1166, in create_module File "", line 219, in _call_with_frames_removed ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator

Can you please let me know what is the official way to build DeepSpeed to be able to run the cpu_adam optimizer?

To me this seems fundamentally broken. Or maybe I am fundamentally misunderstanding how this is supposed to work.

jeffra commented 2 years ago

Hi @maxmaier59, so sorry you’re running into this issue. One thing I don’t recall if I’ve asked. Can you use JIT to compile cpu Adam successfully? You can try this by installing deepspeed w/o any DS_* variables or via: “DS_BUILD_OPS=0 pip install deepspeed”.

After install you can force a build of cpu Adam in a Python shell via:

import deepspeed
deepspeed.ops.op_builder.CPUAdamBuilder().load()

You’ll need ninja installed for this to work, many setups already have this though. More info here: https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages
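Since the JIT path goes through torch's C++ extension loader, which shells out to ninja, a quick preflight check of the environment can save a confusing failure later. A minimal sketch:

```python
import shutil

# torch.utils.cpp_extension.load() (used by DeepSpeed's op builders for
# JIT compilation) requires the `ninja` build tool on PATH.
ninja_path = shutil.which("ninja")
if ninja_path is None:
    print("ninja not found on PATH -- install it before JIT-loading ops")
else:
    print(f"ninja available at {ninja_path}")
```

`pip install ninja` or a distro package both work; only the binary being on PATH matters.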

maxmaier59 commented 2 years ago

Hello Jeff, many thanks for your suggestions. This fixed my problem! With that I was able to get the cpu Adam optimizer compiled and the finetuning started! :-) Many, many thanks to you and also to Olatunji. This was exactly what I was looking for.

jeffra commented 2 years ago

Excellent, really glad to hear. It still concerns me that the pre-compilation method doesn't work for you but I am glad you are unblocked for now at least. I'll close this issue for now, feel free to re-open if you have further issues along this line.

sayakpaul commented 2 years ago

I am also facing a similar problem and have detailed it here: https://discuss.huggingface.co/t/run-translation-py-example-is-erroring-out-with-the-recommended-settings/16432

sayakpaul commented 2 years ago

https://github.com/microsoft/DeepSpeed/issues/1846#issuecomment-1080226911 solved my problem too, but I think it is still a matter of concern.

stas00 commented 2 years ago

@jeffra, if you remember these 2 interconnected threads:

I am pretty sure that's the cause of the problem for pre-building.

If you remember torch conda build worked but pip was failing.

@maxmaier59, please check if the problem goes away if you installed torch via conda.

sayakpaul commented 2 years ago

Is there a recent workaround I could refer to in case installing via conda isn't an option?

@stas00

stas00 commented 2 years ago

JIT build is the workaround if conda is not an option. And the main thread is https://github.com/pytorch/pytorch/issues/69666

For some reason the problem went away for me with pip and pre-building, but perhaps that's not the case for all configurations?

Could you please post the output of python -m torch.utils.collect_env? Mine is:

PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 21.10 (x86_64)
GCC version: (Ubuntu 10.3.0-11ubuntu1) 10.3.0
Clang version: 13.0.0-2
CMake version: version 3.21.3
Libc version: glibc-2.34

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.32-051532-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration:
GPU 0: NVIDIA Graphics Device
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.3
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.11.0
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0+cu115
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] functorch                 0.0.1a0+2228c3b           dev_0    <develop>
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0            py38h7f8727e_0
[conda] mkl_fft                   1.3.1            py38hd3c417c_0
[conda] mkl_random                1.2.2            py38h51133e4_0
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.3                   pypi_0    pypi
[conda] numpy-base                1.21.2           py38h79a1101_0
[conda] pytorch                   1.11.0          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch-nightly
[conda] torch-scatter             2.0.9                    pypi_0    pypi
[conda] torchaudio                0.11.0               py38_cu113    pytorch
[conda] torchvision               0.12.0+cu115             pypi_0    pypi
maxmaier59 commented 2 years ago

As I've mentioned above, building with pip fails:

DS_BUILD_OPS=1 pip install deepspeed

as well as

TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 python setup.py build_ext -j8 bdist_wheel

Here is my environment:

Collecting environment information...
PyTorch version: 1.12.0.dev20220320+cu115
Is debug build: False
CUDA used to build PyTorch: 11.5
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (crosstool-NG 1.24.0.133_b0863d8_dirty) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.4.48
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3060
GPU 1: NVIDIA GeForce RTX 3060

Nvidia driver version: 510.54
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220320+cu115
[pip3] torchaudio==0.12.0.dev20220320+cu115
[pip3] torchvision==0.13.0.dev20220320+cu115
[conda] cudatoolkit-dev           11.4.0               h5e8e339_5    conda-forge
[conda] numpy                     1.22.3                   pypi_0    pypi
[conda] torch                     1.12.0.dev20220320+cu115    pypi_0    pypi
[conda] torchaudio                0.12.0.dev20220320+cu115    pypi_0    pypi
[conda] torchvision               0.13.0.dev20220320+cu115    pypi_0    pypi

stas00 commented 2 years ago

OK, I have created a new conda env and I'm able to reproduce the problem:

conda create -y -n py38-pt112 python=3.8
conda activate py38-pt112
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu115/torch_nightly.html -U
pip install deepspeed

Note: I first install deepspeed normally, so that it pulls in all the binary dependencies correctly. With a pre-build it is forced to build the binary dependencies from scratch rather than fetch them from PyPI.

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 

no failure reported during the build.

python -c "import deepspeed; deepspeed.ops.op_builder.CPUAdamBuilder().load()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/nvme0/code/github/00optimize/deepspeed/deepspeed/ops/op_builder/builder.py", line 461, in load
    return importlib.import_module(self.absolute_name())
  File "/home/stas/anaconda3/envs/py38-pt112/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /mnt/nvme0/code/github/00optimize/deepspeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
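
The `undefined symbol: curandCreateGenerator` error means the extension compiled, but was never linked against libcurand, so the symbol cannot be resolved at import time. A quick way to confirm this kind of failure with standard binutils (the `.so` path below is taken from the traceback; adjust it to your own checkout):

```shell
# Inspect the built extension for unresolved curand symbols.
# nm -D lists dynamic symbols; entries marked 'U' are undefined and must be
# supplied by a library on the link line.
SO=deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
if [ -f "$SO" ]; then
    nm -D "$SO" | grep 'U curand'   # undefined curand symbols -> missing -lcurand
    ldd "$SO" | grep libcurand      # no output here means libcurand was not linked
else
    echo "extension not built yet: $SO"
fi
```

If `nm` shows `U curandCreateGenerator` while `ldd` lists no libcurand, the link step is the culprit rather than the compile step.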
sayakpaul commented 2 years ago

@stas00 mine is:

Collecting environment information...
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: version 3.13.4
Libc version: glibc-2.10

Python version: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.19.0-20-cloud-amd64-x86_64-with-debian-10.12
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.11.0+cu113
[conda] mypy_extensions           0.4.3            py37h89c1867_4    conda-forge
[conda] numpy                     1.19.5           py37h038b26d_2    conda-forge
[conda] torch                     1.11.0+cu113             pypi_0    pypi
stas00 commented 2 years ago

@maxmaier59, @sayakpaul please try this fix: https://github.com/microsoft/DeepSpeed/pull/1879

maxmaier59 commented 2 years ago

Sorry, I am confused now. What exactly is the fix? Where should I add -lcurand?

stas00 commented 2 years ago

The fix has already been merged. Just install deepspeed from master normally.

maxmaier59 commented 2 years ago

I can confirm that the error is gone when using "pip install deepspeed"

I still get the same error when building deepspeed, e.g. with "TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 python setup.py build_ext -j8 bdist_wheel"

stas00 commented 2 years ago

@maxmaier59, could you please send back:

cd DeepSpeed
git show -s --format=%H

DeepSpeed is the directory you run the build from; change it if yours is at a different path. I just want to make sure you have the right git revision.

If you haven't done a git pull and still have the old checkout, that would be the reason it's not working.
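
A quick way to check whether a checkout is stale relative to upstream master (generic git, nothing DeepSpeed-specific; run it inside the clone):

```shell
# Compare the local revision against the tip of origin/master.
if git rev-parse --git-dir >/dev/null 2>&1; then
    git fetch origin
    local_rev=$(git rev-parse HEAD)
    remote_rev=$(git rev-parse origin/master)
    if [ "$local_rev" = "$remote_rev" ]; then
        echo "up to date: $local_rev"
    else
        echo "stale checkout: local $local_rev vs master $remote_rev -- run 'git pull'"
    fi
else
    echo "not inside a git checkout"
fi
```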

maxmaier59 commented 2 years ago

Here it is

git show -s --format=%H

"ebbcfd52734485943bff49c7dcd7c26eb4c44f21"

maxmaier59 commented 2 years ago

I've completely removed the old DeepSpeed and started from scratch by pulling the latest version

stas00 commented 2 years ago

You're somehow checking out a 2-week-old version: https://github.com/microsoft/DeepSpeed/commit/ebbcfd5273448594

We don't want the released-version branch, but the bleeding-edge master branch. Do this:

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed

and it'd work.

maxmaier59 commented 2 years ago

Sorry, but it still does not work for me; I am still getting the error message.

I have completely removed deepspeed and followed your instructions for how to install. The hash is different this time.

(gptneo_finetuned) max@max-Desktop:/media/max/Volume/GPT/finetune$ git clone https://github.com/microsoft/DeepSpeed
Cloning into 'DeepSpeed'...
remote: Enumerating objects: 17063, done.
remote: Counting objects: 100% (2312/2312), done.
remote: Compressing objects: 100% (1197/1197), done.
remote: Total 17063 (delta 1477), reused 1746 (delta 1090), pack-reused 14751
Receiving objects: 100% (17063/17063), 21.66 MiB | 668.00 KiB/s, done.
Resolving deltas: 100% (11948/11948), done.
(gptneo_finetuned) max@max-Desktop:/media/max/Volume/GPT/finetune$ cd DeepSpeed/
(gptneo_finetuned) max@max-Desktop:/media/max/Volume/GPT/finetune/DeepSpeed$ git show -s --format=%H
d8ed3ce445b3d447a113305343d3c21fbf1bf2ba

Here is the output of the deepspeed build

media/max/Volume/GPT/finetune/DeepSpeed
DS_BUILD_OPS=0
Installed CUDA version 11.4 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': False, 'utils': 1, 'quantizer': False, 'transformer_inference': False}
version=0.6.2+d8ed3ce4, git_hash=d8ed3ce4, git_branch=master
install_requires=['hjson', 'ninja', 'numpy', 'packaging', 'psutil', 'py-cpuinfo', 'torch', 'tqdm', 'triton==1.0.0']
compatible_ops={'cpu_adam': True, 'cpu_adagrad': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'async_io': False, 'utils': True, 'quantizer': True, 'transformer_inference': True}
ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f90cc59d160>, <setuptools.extension.Extension('deepspeed.ops.utils_op') at 0x7f903d0f4040>]
running build_ext
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:811: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) building 'deepspeed.ops.adam.cpu_adam_op' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/csrc building 'deepspeed.ops.utils_op' extension creating build/temp.linux-x86_64-3.8/csrc/adam creating build/temp.linux-x86_64-3.8/csrc/utils creating build/temp.linux-x86_64-3.8/csrc/common /home/max/anaconda3/envs/gptneo_finetuned/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -fPIC -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/utils/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=utils_op -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 /home/max/anaconda3/envs/gptneo_finetuned/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong 
-fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -fPIC -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o -O3 -std=c++14 -g -Wno-reorder -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ In file included from csrc/includes/cpu_adam.h:12, from csrc/adam/cpu_adam.cpp:1: csrc/includes/simd.h:63: warning: ignoring #pragma unroll [-Wunknown-pragmas] 63 #pragma unroll
csrc/includes/simd.h:71: warning: ignoring #pragma unroll [-Wunknown-pragmas] 71 #pragma unroll
csrc/includes/simd.h:79: warning: ignoring #pragma unroll [-Wunknown-pragmas] 79 #pragma unroll
csrc/includes/simd.h:87: warning: ignoring #pragma unroll [-Wunknown-pragmas] 87 #pragma unroll
csrc/includes/simd.h:95: warning: ignoring #pragma unroll [-Wunknown-pragmas] 95 #pragma unroll
csrc/includes/simd.h:103: warning: ignoring #pragma unroll [-Wunknown-pragmas] 103 #pragma unroll
csrc/includes/simd.h:109: warning: ignoring #pragma unroll [-Wunknown-pragmas] 109 #pragma unroll
csrc/includes/simd.h:115: warning: ignoring #pragma unroll [-Wunknown-pragmas] 115 #pragma unroll
csrc/includes/simd.h:121: warning: ignoring #pragma unroll [-Wunknown-pragmas] 121 #pragma unroll
csrc/includes/simd.h:127: warning: ignoring #pragma unroll [-Wunknown-pragmas] 127 #pragma unroll
csrc/includes/simd.h:133: warning: ignoring #pragma unroll [-Wunknown-pragmas] 133 #pragma unroll

creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/deepspeed creating build/lib.linux-x86_64-3.8/deepspeed/ops /home/max/anaconda3/envs/gptneo_finetuned/bin/x86_64-conda-linux-gnu-c++ -pthread -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath-link,/home/max/anaconda3/envs/gptneo_finetuned/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/max/anaconda3/envs/gptneo_finetuned/include build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so

csrc/adam/cpu_adam.cpp: In member function 'void Adam_Optimizer::Step_1(float, float, float, float, size_t, half, bool)': csrc/adam/cpu_adam.cpp:45:17: warning: 'params_cast_h' may be used uninitialized in this function [-Wmaybe-uninitialized] 45 | __half params_cast_h; | ^~~~~ csrc/adam/cpu_adam.cpp:44:17: warning: 'grads_cast_h' may be used uninitialized in this function [-Wmaybe-uninitialized] 44 | half* grads_cast_h; | ^~~~ /home/max/anaconda3/envs/gptneo_finetuned/bin/nvcc -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/common/custom_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -UCUDA_NO_HALF_OPERATORS -UCUDA_NO_HALF_CONVERSIONS -UCUDA_NO_HALF2_OPERATORS -gencode=arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/max/anaconda3/envs/gptneo_finetuned/bin/x86_64-conda-linux-gnu-cc creating build/lib.linux-x86_64-3.8/deepspeed/ops/adam /home/max/anaconda3/envs/gptneo_finetuned/bin/x86_64-conda-linux-gnu-c++ -pthread -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib 
-L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath-link,/home/max/anaconda3/envs/gptneo_finetuned/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/max/anaconda3/envs/gptneo_finetuned/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/max/anaconda3/envs/gptneo_finetuned/include build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lcurand -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so running bdist_wheel running build running build_py copying deepspeed/constants.py -> build/lib.linux-x86_64-3.8/deepspeed copying deepspeed/env_report.py -> build/lib.linux-x86_64-3.8/deepspeed copying deepspeed/git_version_info.py -> build/lib.linux-x86_64-3.8/deepspeed copying deepspeed/git_version_info_installed.py -> build/lib.linux-x86_64-3.8/deepspeed copying deepspeed/init__.py -> build/lib.linux-x86_64-3.8/deepspeed creating build/lib.linux-x86_64-3.8/deepspeed/autotuning copying deepspeed/autotuning/autotuner.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning copying deepspeed/autotuning/config.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning copying deepspeed/autotuning/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning copying 
deepspeed/autotuning/scheduler.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning copying deepspeed/autotuning/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning copying deepspeed/autotuning/init.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning creating build/lib.linux-x86_64-3.8/deepspeed/checkpoint copying deepspeed/checkpoint/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/checkpoint copying deepspeed/checkpoint/init.py -> build/lib.linux-x86_64-3.8/deepspeed/checkpoint creating build/lib.linux-x86_64-3.8/deepspeed/elasticity copying deepspeed/elasticity/config.py -> build/lib.linux-x86_64-3.8/deepspeed/elasticity copying deepspeed/elasticity/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/elasticity copying deepspeed/elasticity/elasticity.py -> build/lib.linux-x86_64-3.8/deepspeed/elasticity copying deepspeed/elasticity/init.py -> build/lib.linux-x86_64-3.8/deepspeed/elasticity creating build/lib.linux-x86_64-3.8/deepspeed/inference copying deepspeed/inference/engine.py -> build/lib.linux-x86_64-3.8/deepspeed/inference copying deepspeed/inference/init.py -> build/lib.linux-x86_64-3.8/deepspeed/inference creating build/lib.linux-x86_64-3.8/deepspeed/launcher copying deepspeed/launcher/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/launcher copying deepspeed/launcher/launch.py -> build/lib.linux-x86_64-3.8/deepspeed/launcher copying deepspeed/launcher/multinode_runner.py -> build/lib.linux-x86_64-3.8/deepspeed/launcher copying deepspeed/launcher/runner.py -> build/lib.linux-x86_64-3.8/deepspeed/launcher copying deepspeed/launcher/init.py -> build/lib.linux-x86_64-3.8/deepspeed/launcher creating build/lib.linux-x86_64-3.8/deepspeed/module_inject copying deepspeed/module_inject/inject.py -> build/lib.linux-x86_64-3.8/deepspeed/module_inject copying deepspeed/module_inject/module_quantize.py -> build/lib.linux-x86_64-3.8/deepspeed/module_inject copying deepspeed/module_inject/replace_module.py -> 
build/lib.linux-x86_64-3.8/deepspeed/module_inject copying deepspeed/module_inject/replace_policy.py -> build/lib.linux-x86_64-3.8/deepspeed/module_inject copying deepspeed/module_inject/init.py -> build/lib.linux-x86_64-3.8/deepspeed/module_inject creating build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/moe/experts.py -> build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/moe/layer.py -> build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/moe/sharded_moe.py -> build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/moe/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/moe/init.py -> build/lib.linux-x86_64-3.8/deepspeed/moe copying deepspeed/ops/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops creating build/lib.linux-x86_64-3.8/deepspeed/pipe copying deepspeed/pipe/init.py -> build/lib.linux-x86_64-3.8/deepspeed/pipe creating build/lib.linux-x86_64-3.8/deepspeed/profiling copying deepspeed/profiling/config.py -> build/lib.linux-x86_64-3.8/deepspeed/profiling copying deepspeed/profiling/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/profiling copying deepspeed/profiling/init__.py -> build/lib.linux-x86_64-3.8/deepspeed/profiling creating build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/config.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/config_utils.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/dataloader.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/eigenvalue.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/engine.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/lr_schedules.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/progressive_layer_drop.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying 
deepspeed/runtime/quantize.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/sparse_tensor.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/state_dict_factory.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/weight_quantizer.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime copying deepspeed/runtime/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime creating build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/debug.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/distributed.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/exceptions.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/groups.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/logging.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/nvtx.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/timer.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/zero_to_fp32.py -> build/lib.linux-x86_64-3.8/deepspeed/utils copying deepspeed/utils/init.py -> build/lib.linux-x86_64-3.8/deepspeed/utils

creating build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/base_tuner.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/cost_model.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/index_based_tuner.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/model_based_tuner.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner copying deepspeed/autotuning/tuner/init.py -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner creating build/lib.linux-x86_64-3.8/deepspeed/ops/adagrad copying deepspeed/ops/adagrad/cpu_adagrad.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adagrad copying deepspeed/ops/adagrad/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adagrad copying deepspeed/ops/adam/cpu_adam.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adam copying deepspeed/ops/adam/fused_adam.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adam copying deepspeed/ops/adam/multi_tensor_apply.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adam copying deepspeed/ops/adam/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/adam creating build/lib.linux-x86_64-3.8/deepspeed/ops/aio copying deepspeed/ops/aio/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/aio creating build/lib.linux-x86_64-3.8/deepspeed/ops/lamb copying deepspeed/ops/lamb/fused_lamb.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/lamb copying deepspeed/ops/lamb/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/lamb creating build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/async_io.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/builder.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/cpu_adagrad.py -> 
build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/cpu_adam.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/fused_adam.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/fused_lamb.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/quantizer.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/sparse_attn.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/stochastic_transformer.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/transformer.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/transformer_inference.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder copying deepspeed/ops/op_builder/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder creating build/lib.linux-x86_64-3.8/deepspeed/ops/quantizer copying deepspeed/ops/quantizer/quantizer.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/quantizer copying deepspeed/ops/quantizer/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/quantizer creating build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/bert_sparse_self_attention.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/matmul.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/softmax.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/sparse_attention_utils.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/sparse_self_attention.py -> 
build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/sparsity_config.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention copying deepspeed/ops/sparse_attention/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention creating build/lib.linux-x86_64-3.8/deepspeed/ops/transformer copying deepspeed/ops/transformer/transformer.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/transformer copying deepspeed/ops/transformer/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/transformer creating build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc copying deepspeed/ops/sparse_attention/trsrc/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc creating build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference copying deepspeed/ops/transformer/inference/moe_inference.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference copying deepspeed/ops/transformer/inference/transformer_inference.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference copying deepspeed/ops/transformer/inference/init.py -> build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference creating build/lib.linux-x86_64-3.8/deepspeed/profiling/flops_profiler copying deepspeed/profiling/flops_profiler/profiler.py -> build/lib.linux-x86_64-3.8/deepspeed/profiling/flops_profiler copying deepspeed/profiling/flops_profiler/init.py -> build/lib.linux-x86_64-3.8/deepspeed/profiling/flops_profiler creating build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing copying deepspeed/runtime/activation_checkpointing/checkpointing.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing copying deepspeed/runtime/activation_checkpointing/config.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing copying deepspeed/runtime/activation_checkpointing/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing creating 
build/lib.linux-x86_64-3.8/deepspeed/runtime/comm copying deepspeed/runtime/comm/coalesced_collectives.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/comm copying deepspeed/runtime/comm/mpi.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/comm copying deepspeed/runtime/comm/nccl.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/comm copying deepspeed/runtime/comm/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/comm creating build/lib.linux-x86_64-3.8/deepspeed/runtime/compression copying deepspeed/runtime/compression/cupy.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/compression copying deepspeed/runtime/compression/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/compression creating build/lib.linux-x86_64-3.8/deepspeed/runtime/data_pipeline copying deepspeed/runtime/data_pipeline/curriculum_scheduler.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/data_pipeline copying deepspeed/runtime/data_pipeline/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/data_pipeline creating build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16 copying deepspeed/runtime/fp16/fused_optimizer.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16 copying deepspeed/runtime/fp16/loss_scaler.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16 copying deepspeed/runtime/fp16/unfused_optimizer.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16 copying deepspeed/runtime/fp16/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16 creating build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/engine.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/module.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/p2p.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/schedule.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/topology.py -> 
build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe copying deepspeed/runtime/pipe/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe creating build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/aio_config.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/async_swapper.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/optimizer_utils.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/partitioned_optimizer_swapper.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/partitioned_param_swapper.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/pipelined_optimizer_swapper.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor copying deepspeed/runtime/swap_tensor/init.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor creating build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/config.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/constants.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/contiguous_memory_allocator.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/linear.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/offload_config.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/offload_constants.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/partition_parameters.py -> 
build/lib.linux-x86_64-3.8/deepspeed/runtime/zero
copying deepspeed/runtime/zero/stage3.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/stage_1_and_2.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/test.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/tiling.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/utils.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero copying deepspeed/runtime/zero/__init__.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/zero creating build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit copying deepspeed/runtime/fp16/onebit/adam.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit copying deepspeed/runtime/fp16/onebit/lamb.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit copying deepspeed/runtime/fp16/onebit/zoadam.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit copying deepspeed/runtime/fp16/onebit/__init__.py -> build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit running egg_info writing deepspeed.egg-info/PKG-INFO writing dependency_links to deepspeed.egg-info/dependency_links.txt writing entry points to deepspeed.egg-info/entry_points.txt writing requirements to deepspeed.egg-info/requires.txt writing top-level names to deepspeed.egg-info/top_level.txt reading manifest file 'deepspeed.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.hip' under directory 'deepspeed' warning: no files found matching '*.cc' under directory 'deepspeed' warning: no files found matching '*.tr' under directory 'csrc' warning: no files found matching '*.cc' under directory 'csrc' adding license file 'LICENSE' writing manifest file 'deepspeed.egg-info/SOURCES.txt' creating build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates copying deepspeed/autotuning/config_templates/template_zero0.json -> 
build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates copying deepspeed/autotuning/config_templates/template_zero1.json -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates copying deepspeed/autotuning/config_templates/template_zero2.json -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates copying deepspeed/autotuning/config_templates/template_zero3.json -> build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adagrad copying deepspeed/ops/csrc/adagrad/cpu_adagrad.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adagrad creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam copying deepspeed/ops/csrc/adam/cpu_adam.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam copying deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam copying deepspeed/ops/csrc/adam/multi_tensor_adam.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam copying deepspeed/ops/csrc/adam/multi_tensor_apply.cuh -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_common.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_common.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_types.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_types.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common copying deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.h -> 
build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib copying deepspeed/ops/csrc/aio/py_lib/py_ds_aio.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_test copying deepspeed/ops/csrc/aio/py_test/single_process_config.json -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_test creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/common copying deepspeed/ops/csrc/common/custom_cuda_kernel.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/common creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/StopWatch.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/Timer.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/compat.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/context.h -> 
build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/cpu_adagrad.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/cpu_adam.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/cublas_wrappers.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/custom_cuda_layers.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/dropout.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/feed_forward.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/gelu.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/gemm_test.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/general_kernels.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/normalize_layer.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/quantizer.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/simd.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/softmax.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/strided_batch_gemm.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes copying deepspeed/ops/csrc/includes/type_shim.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/lamb copying deepspeed/ops/csrc/lamb/fused_lamb_cuda.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/lamb copying 
deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/lamb creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/quantization copying deepspeed/ops/csrc/quantization/pt_binding.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/quantization copying deepspeed/ops/csrc/quantization/quantizer.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/quantization creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/sparse_attention copying deepspeed/ops/csrc/sparse_attention/utils.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/sparse_attention creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/cublas_wrappers.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/dropout_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/ds_transformer_cuda.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/gelu_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/general_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/normalize_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/softmax_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer copying deepspeed/ops/csrc/transformer/transform_kernels.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc copying deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc copying 
deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc copying deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc
copying deepspeed/ops/csrc/transformer/inference/csrc/normalize.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc copying deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc copying deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes copying deepspeed/ops/csrc/transformer/inference/includes/context.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes copying deepspeed/ops/csrc/transformer/inference/includes/cublas_wrappers.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes copying deepspeed/ops/csrc/transformer/inference/includes/custom_cuda_layers.h -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes creating build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/utils copying deepspeed/ops/csrc/utils/flatten_unflatten.cpp -> build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/utils copying deepspeed/ops/sparse_attention/trsrc/matmul.tr -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc copying deepspeed/ops/sparse_attention/trsrc/softmax_bwd.tr -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc copying deepspeed/ops/sparse_attention/trsrc/softmax_fwd.tr -> build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc running build_ext /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:811: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem. 
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) running build_scripts creating build/scripts-3.8 copying and adjusting bin/deepspeed -> build/scripts-3.8 copying and adjusting bin/deepspeed.pt -> build/scripts-3.8 copying and adjusting bin/ds -> build/scripts-3.8 copying bin/ds_ssh -> build/scripts-3.8 copying and adjusting bin/ds_report -> build/scripts-3.8 copying and adjusting bin/ds_elastic -> build/scripts-3.8 installing to build/bdist.linux-x86_64/wheel running install running install_lib creating build/bdist.linux-x86_64 creating build/bdist.linux-x86_64/wheel creating build/bdist.linux-x86_64/wheel/deepspeed creating build/bdist.linux-x86_64/wheel/deepspeed/autotuning copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/autotuner.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning creating build/bdist.linux-x86_64/wheel/deepspeed/autotuning/config_templates copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates/template_zero0.json -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/config_templates copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates/template_zero1.json -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/config_templates copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates/template_zero2.json -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/config_templates copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/config_templates/template_zero3.json -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/config_templates copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/scheduler.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning creating 
build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/base_tuner.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/cost_model.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/index_based_tuner.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/model_based_tuner.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/tuner/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning/tuner copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning copying build/lib.linux-x86_64-3.8/deepspeed/autotuning/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/autotuning creating build/bdist.linux-x86_64/wheel/deepspeed/checkpoint copying build/lib.linux-x86_64-3.8/deepspeed/checkpoint/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/checkpoint copying build/lib.linux-x86_64-3.8/deepspeed/checkpoint/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/checkpoint copying build/lib.linux-x86_64-3.8/deepspeed/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed creating build/bdist.linux-x86_64/wheel/deepspeed/elasticity copying build/lib.linux-x86_64-3.8/deepspeed/elasticity/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/elasticity copying build/lib.linux-x86_64-3.8/deepspeed/elasticity/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/elasticity copying build/lib.linux-x86_64-3.8/deepspeed/elasticity/elasticity.py -> build/bdist.linux-x86_64/wheel/deepspeed/elasticity copying 
build/lib.linux-x86_64-3.8/deepspeed/elasticity/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/elasticity copying build/lib.linux-x86_64-3.8/deepspeed/env_report.py -> build/bdist.linux-x86_64/wheel/deepspeed copying build/lib.linux-x86_64-3.8/deepspeed/git_version_info.py -> build/bdist.linux-x86_64/wheel/deepspeed copying build/lib.linux-x86_64-3.8/deepspeed/git_version_info_installed.py -> build/bdist.linux-x86_64/wheel/deepspeed creating build/bdist.linux-x86_64/wheel/deepspeed/inference copying build/lib.linux-x86_64-3.8/deepspeed/inference/engine.py -> build/bdist.linux-x86_64/wheel/deepspeed/inference copying build/lib.linux-x86_64-3.8/deepspeed/inference/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/inference creating build/bdist.linux-x86_64/wheel/deepspeed/launcher copying build/lib.linux-x86_64-3.8/deepspeed/launcher/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/launcher copying build/lib.linux-x86_64-3.8/deepspeed/launcher/launch.py -> build/bdist.linux-x86_64/wheel/deepspeed/launcher copying build/lib.linux-x86_64-3.8/deepspeed/launcher/multinode_runner.py -> build/bdist.linux-x86_64/wheel/deepspeed/launcher copying build/lib.linux-x86_64-3.8/deepspeed/launcher/runner.py -> build/bdist.linux-x86_64/wheel/deepspeed/launcher copying build/lib.linux-x86_64-3.8/deepspeed/launcher/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/launcher creating build/bdist.linux-x86_64/wheel/deepspeed/module_inject copying build/lib.linux-x86_64-3.8/deepspeed/module_inject/inject.py -> build/bdist.linux-x86_64/wheel/deepspeed/module_inject copying build/lib.linux-x86_64-3.8/deepspeed/module_inject/module_quantize.py -> build/bdist.linux-x86_64/wheel/deepspeed/module_inject copying build/lib.linux-x86_64-3.8/deepspeed/module_inject/replace_module.py -> build/bdist.linux-x86_64/wheel/deepspeed/module_inject copying build/lib.linux-x86_64-3.8/deepspeed/module_inject/replace_policy.py -> build/bdist.linux-x86_64/wheel/deepspeed/module_inject copying 
build/lib.linux-x86_64-3.8/deepspeed/module_inject/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/module_inject creating build/bdist.linux-x86_64/wheel/deepspeed/moe copying build/lib.linux-x86_64-3.8/deepspeed/moe/experts.py -> build/bdist.linux-x86_64/wheel/deepspeed/moe copying build/lib.linux-x86_64-3.8/deepspeed/moe/layer.py -> build/bdist.linux-x86_64/wheel/deepspeed/moe copying build/lib.linux-x86_64-3.8/deepspeed/moe/sharded_moe.py -> build/bdist.linux-x86_64/wheel/deepspeed/moe copying build/lib.linux-x86_64-3.8/deepspeed/moe/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/moe copying build/lib.linux-x86_64-3.8/deepspeed/moe/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/moe creating build/bdist.linux-x86_64/wheel/deepspeed/ops creating build/bdist.linux-x86_64/wheel/deepspeed/ops/adagrad copying build/lib.linux-x86_64-3.8/deepspeed/ops/adagrad/cpu_adagrad.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adagrad copying build/lib.linux-x86_64-3.8/deepspeed/ops/adagrad/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adagrad creating build/bdist.linux-x86_64/wheel/deepspeed/ops/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam
copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/fused_adam.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/multi_tensor_apply.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam creating build/bdist.linux-x86_64/wheel/deepspeed/ops/aio copying build/lib.linux-x86_64-3.8/deepspeed/ops/aio/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/aio creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adagrad copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adagrad/cpu_adagrad.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adagrad creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam/cpu_adam.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adam copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/adam/multi_tensor_apply.cuh -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/adam creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_common.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_common.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_types.cpp -> 
build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_types.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/common creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_lib/py_ds_aio.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_lib creating 
build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_test copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/aio/py_test/single_process_config.json -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/aio/py_test creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/common copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/common creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/compat.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/context.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/cpu_adagrad.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/cpu_adam.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/cublas_wrappers.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/custom_cuda_layers.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/dropout.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/feed_forward.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/gelu.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/gemm_test.h -> 
build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/general_kernels.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/normalize_layer.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/quantizer.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/simd.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/softmax.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/StopWatch.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/strided_batch_gemm.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/Timer.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/includes/type_shim.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/includes creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/lamb copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/lamb/fused_lamb_cuda.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/lamb copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/lamb creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/quantization copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/quantization/pt_binding.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/quantization copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/quantization/quantizer.cu -> 
build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/quantization creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/sparse_attention/utils.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/sparse_attention creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/cublas_wrappers.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/dropout_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/ds_transformer_cuda.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/gelu_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/general_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/normalize.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying 
build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/csrc creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes/context.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes/cublas_wrappers.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/inference/includes/custom_cuda_layers.h -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer/inference/includes copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/normalize_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/softmax_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/transformer/transform_kernels.cu -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/transformer creating build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/utils copying build/lib.linux-x86_64-3.8/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -> build/bdist.linux-x86_64/wheel/deepspeed/ops/csrc/utils creating build/bdist.linux-x86_64/wheel/deepspeed/ops/lamb copying build/lib.linux-x86_64-3.8/deepspeed/ops/lamb/fused_lamb.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/lamb copying build/lib.linux-x86_64-3.8/deepspeed/ops/lamb/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/lamb creating 
build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/async_io.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/builder.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/cpu_adagrad.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/cpu_adam.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/fused_adam.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/fused_lamb.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/quantizer.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/sparse_attn.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/stochastic_transformer.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/transformer.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/transformer_inference.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder copying build/lib.linux-x86_64-3.8/deepspeed/ops/op_builder/__init__.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/op_builder creating build/bdist.linux-x86_64/wheel/deepspeed/ops/quantizer copying build/lib.linux-x86_64-3.8/deepspeed/ops/quantizer/quantizer.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/quantizer copying 
build/lib.linux-x86_64-3.8/deepspeed/ops/quantizer/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/quantizer

creating build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/bert_sparse_self_attention.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/matmul.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/softmax.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/sparse_attention_utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/sparse_self_attention.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/sparsity_config.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention creating build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention/trsrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc/matmul.tr -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention/trsrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc/softmax_bwd.tr -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention/trsrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc/softmax_fwd.tr -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention/trsrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/trsrc/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention/trsrc copying build/lib.linux-x86_64-3.8/deepspeed/ops/sparse_attention/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/sparse_attention creating build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer creating build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer/inference copying 
build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference/moe_inference.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer/inference copying build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference/transformer_inference.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer/inference copying build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/inference/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer/inference copying build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/transformer.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/transformer/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops/transformer copying build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/deepspeed/ops copying build/lib.linux-x86_64-3.8/deepspeed/ops/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/ops creating build/bdist.linux-x86_64/wheel/deepspeed/pipe copying build/lib.linux-x86_64-3.8/deepspeed/pipe/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/pipe creating build/bdist.linux-x86_64/wheel/deepspeed/profiling copying build/lib.linux-x86_64-3.8/deepspeed/profiling/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/profiling copying build/lib.linux-x86_64-3.8/deepspeed/profiling/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/profiling creating build/bdist.linux-x86_64/wheel/deepspeed/profiling/flops_profiler copying build/lib.linux-x86_64-3.8/deepspeed/profiling/flops_profiler/profiler.py -> build/bdist.linux-x86_64/wheel/deepspeed/profiling/flops_profiler copying build/lib.linux-x86_64-3.8/deepspeed/profiling/flops_profiler/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/profiling/flops_profiler copying build/lib.linux-x86_64-3.8/deepspeed/profiling/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/profiling creating build/bdist.linux-x86_64/wheel/deepspeed/runtime creating 
build/bdist.linux-x86_64/wheel/deepspeed/runtime/activation_checkpointing copying build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing/checkpointing.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/activation_checkpointing copying build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/activation_checkpointing copying build/lib.linux-x86_64-3.8/deepspeed/runtime/activation_checkpointing/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/activation_checkpointing creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/comm copying build/lib.linux-x86_64-3.8/deepspeed/runtime/comm/coalesced_collectives.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/comm copying build/lib.linux-x86_64-3.8/deepspeed/runtime/comm/mpi.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/comm copying build/lib.linux-x86_64-3.8/deepspeed/runtime/comm/nccl.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/comm copying build/lib.linux-x86_64-3.8/deepspeed/runtime/comm/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/comm creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/compression copying build/lib.linux-x86_64-3.8/deepspeed/runtime/compression/cupy.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/compression copying build/lib.linux-x86_64-3.8/deepspeed/runtime/compression/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/compression copying build/lib.linux-x86_64-3.8/deepspeed/runtime/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/config_utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/dataloader.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime creating 
build/bdist.linux-x86_64/wheel/deepspeed/runtime/data_pipeline copying build/lib.linux-x86_64-3.8/deepspeed/runtime/data_pipeline/curriculum_scheduler.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/data_pipeline copying build/lib.linux-x86_64-3.8/deepspeed/runtime/data_pipeline/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/data_pipeline copying build/lib.linux-x86_64-3.8/deepspeed/runtime/eigenvalue.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/engine.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16 copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/fused_optimizer.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16 copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/loss_scaler.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16 creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16/onebit copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit/adam.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16/onebit copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit/lamb.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16/onebit copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit/zoadam.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16/onebit copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/onebit/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16/onebit copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/unfused_optimizer.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16 copying build/lib.linux-x86_64-3.8/deepspeed/runtime/fp16/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/fp16 copying build/lib.linux-x86_64-3.8/deepspeed/runtime/lr_schedules.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying 
build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/engine.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/module.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/p2p.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/schedule.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/topology.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/pipe/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/pipe copying build/lib.linux-x86_64-3.8/deepspeed/runtime/progressive_layer_drop.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/quantize.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/sparse_tensor.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/state_dict_factory.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime

creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/aio_config.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/async_swapper.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/optimizer_utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/partitioned_optimizer_swapper.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/partitioned_param_swapper.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/pipelined_optimizer_swapper.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/swap_tensor/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/swap_tensor copying build/lib.linux-x86_64-3.8/deepspeed/runtime/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime copying build/lib.linux-x86_64-3.8/deepspeed/runtime/weight_quantizer.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime creating build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/config.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying 
build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/contiguous_memory_allocator.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/linear.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/offload_config.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/offload_constants.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/partition_parameters.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/stage3.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/stage_1_and_2.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/test.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/tiling.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/utils.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/zero/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime/zero copying build/lib.linux-x86_64-3.8/deepspeed/runtime/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/runtime creating build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/debug.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/distributed.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/exceptions.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/groups.py -> 
build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/logging.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/nvtx.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/timer.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/zero_to_fp32.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/utils/init.py -> build/bdist.linux-x86_64/wheel/deepspeed/utils copying build/lib.linux-x86_64-3.8/deepspeed/init.py -> build/bdist.linux-x86_64/wheel/deepspeed running install_egg_info Copying deepspeed.egg-info to build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4-py3.8.egg-info running install_scripts creating build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data creating build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/deepspeed -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/deepspeed.pt -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/ds -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/ds_elastic -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/ds_report -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts copying build/scripts-3.8/ds_ssh -> build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts changing mode of build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/deepspeed to 777 changing mode of build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/deepspeed.pt to 777 changing mode of build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/ds to 777 changing mode of 
build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_elastic to 777 changing mode of build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_report to 777 changing mode of build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_ssh to 777 adding license file "LICENSE" (matched pattern "LICEN[CS]E*") creating build/bdist.linux-x86_64/wheel/deepspeed-0.6.2+d8ed3ce4.dist-info/WHEEL creating 'dist/deepspeed-0.6.2+d8ed3ce4-cp38-cp38-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it adding 'deepspeed/init.py' adding 'deepspeed/constants.py' adding 'deepspeed/env_report.py' adding 'deepspeed/git_version_info.py' adding 'deepspeed/git_version_info_installed.py' adding 'deepspeed/autotuning/init.py' adding 'deepspeed/autotuning/autotuner.py' adding 'deepspeed/autotuning/config.py' adding 'deepspeed/autotuning/constants.py' adding 'deepspeed/autotuning/scheduler.py' adding 'deepspeed/autotuning/utils.py' adding 'deepspeed/autotuning/config_templates/template_zero0.json' adding 'deepspeed/autotuning/config_templates/template_zero1.json' adding 'deepspeed/autotuning/config_templates/template_zero2.json' adding 'deepspeed/autotuning/config_templates/template_zero3.json' adding 'deepspeed/autotuning/tuner/init.py' adding 'deepspeed/autotuning/tuner/base_tuner.py' adding 'deepspeed/autotuning/tuner/cost_model.py' adding 'deepspeed/autotuning/tuner/index_based_tuner.py' adding 'deepspeed/autotuning/tuner/model_based_tuner.py' adding 'deepspeed/autotuning/tuner/utils.py' adding 'deepspeed/checkpoint/init.py' adding 'deepspeed/checkpoint/constants.py' adding 'deepspeed/elasticity/init.py' adding 'deepspeed/elasticity/config.py' adding 'deepspeed/elasticity/constants.py' adding 'deepspeed/elasticity/elasticity.py' adding 'deepspeed/inference/init.py' adding 'deepspeed/inference/engine.py' adding 'deepspeed/launcher/init.py' adding 'deepspeed/launcher/constants.py' adding 'deepspeed/launcher/launch.py' adding 
'deepspeed/launcher/multinode_runner.py' adding 'deepspeed/launcher/runner.py' adding 'deepspeed/module_inject/init.py' adding 'deepspeed/module_inject/inject.py' adding 'deepspeed/module_inject/module_quantize.py' adding 'deepspeed/module_inject/replace_module.py' adding 'deepspeed/module_inject/replace_policy.py' adding 'deepspeed/moe/init.py' adding 'deepspeed/moe/experts.py' adding 'deepspeed/moe/layer.py' adding 'deepspeed/moe/sharded_moe.py' adding 'deepspeed/moe/utils.py' adding 'deepspeed/ops/init.py' adding 'deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so' adding 'deepspeed/ops/adagrad/init.py' adding 'deepspeed/ops/adagrad/cpu_adagrad.py' adding 'deepspeed/ops/adam/init.py' adding 'deepspeed/ops/adam/cpu_adam.py'

adding 'deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so' adding 'deepspeed/ops/adam/fused_adam.py' adding 'deepspeed/ops/adam/multi_tensor_apply.py' adding 'deepspeed/ops/aio/init.py' adding 'deepspeed/ops/csrc/adagrad/cpu_adagrad.cpp' adding 'deepspeed/ops/csrc/adam/cpu_adam.cpp' adding 'deepspeed/ops/csrc/adam/fused_adam_frontend.cpp' adding 'deepspeed/ops/csrc/adam/multi_tensor_adam.cu' adding 'deepspeed/ops/csrc/adam/multi_tensor_apply.cuh' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_common.cpp' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_common.h' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_types.cpp' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_types.h' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.cpp' adding 'deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.h' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.cpp' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.h' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.cpp' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.h' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.cpp' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.h' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.cpp' adding 'deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.h' adding 'deepspeed/ops/csrc/aio/py_lib/py_ds_aio.cpp' adding 'deepspeed/ops/csrc/aio/py_test/single_process_config.json' adding 'deepspeed/ops/csrc/common/custom_cuda_kernel.cu' adding 'deepspeed/ops/csrc/includes/StopWatch.h' adding 'deepspeed/ops/csrc/includes/Timer.h' adding 'deepspeed/ops/csrc/includes/compat.h' adding 'deepspeed/ops/csrc/includes/context.h' adding 'deepspeed/ops/csrc/includes/cpu_adagrad.h' adding 'deepspeed/ops/csrc/includes/cpu_adam.h' adding 'deepspeed/ops/csrc/includes/cublas_wrappers.h' adding 'deepspeed/ops/csrc/includes/custom_cuda_layers.h' adding 'deepspeed/ops/csrc/includes/dropout.h' adding 
'deepspeed/ops/csrc/includes/ds_transformer_cuda.h' adding 'deepspeed/ops/csrc/includes/feed_forward.h' adding 'deepspeed/ops/csrc/includes/gelu.h' adding 'deepspeed/ops/csrc/includes/gemm_test.h' adding 'deepspeed/ops/csrc/includes/general_kernels.h' adding 'deepspeed/ops/csrc/includes/normalize_layer.h' adding 'deepspeed/ops/csrc/includes/quantizer.h' adding 'deepspeed/ops/csrc/includes/simd.h' adding 'deepspeed/ops/csrc/includes/softmax.h' adding 'deepspeed/ops/csrc/includes/strided_batch_gemm.h' adding 'deepspeed/ops/csrc/includes/type_shim.h' adding 'deepspeed/ops/csrc/lamb/fused_lamb_cuda.cpp' adding 'deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu' adding 'deepspeed/ops/csrc/quantization/pt_binding.cpp' adding 'deepspeed/ops/csrc/quantization/quantizer.cu' adding 'deepspeed/ops/csrc/sparse_attention/utils.cpp' adding 'deepspeed/ops/csrc/transformer/cublas_wrappers.cu' adding 'deepspeed/ops/csrc/transformer/dropout_kernels.cu' adding 'deepspeed/ops/csrc/transformer/ds_transformer_cuda.cpp' adding 'deepspeed/ops/csrc/transformer/gelu_kernels.cu' adding 'deepspeed/ops/csrc/transformer/general_kernels.cu' adding 'deepspeed/ops/csrc/transformer/normalize_kernels.cu' adding 'deepspeed/ops/csrc/transformer/softmax_kernels.cu' adding 'deepspeed/ops/csrc/transformer/transform_kernels.cu' adding 'deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu' adding 'deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu' adding 'deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu' adding 'deepspeed/ops/csrc/transformer/inference/csrc/normalize.cu' adding 'deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp' adding 'deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu' adding 'deepspeed/ops/csrc/transformer/inference/includes/context.h' adding 'deepspeed/ops/csrc/transformer/inference/includes/cublas_wrappers.h' adding 'deepspeed/ops/csrc/transformer/inference/includes/custom_cuda_layers.h' adding 
'deepspeed/ops/csrc/utils/flatten_unflatten.cpp' adding 'deepspeed/ops/lamb/init.py' adding 'deepspeed/ops/lamb/fused_lamb.py' adding 'deepspeed/ops/op_builder/init.py' adding 'deepspeed/ops/op_builder/async_io.py' adding 'deepspeed/ops/op_builder/builder.py' adding 'deepspeed/ops/op_builder/cpu_adagrad.py' adding 'deepspeed/ops/op_builder/cpu_adam.py' adding 'deepspeed/ops/op_builder/fused_adam.py' adding 'deepspeed/ops/op_builder/fused_lamb.py' adding 'deepspeed/ops/op_builder/quantizer.py' adding 'deepspeed/ops/op_builder/sparse_attn.py' adding 'deepspeed/ops/op_builder/stochastic_transformer.py' adding 'deepspeed/ops/op_builder/transformer.py' adding 'deepspeed/ops/op_builder/transformer_inference.py' adding 'deepspeed/ops/op_builder/utils.py' adding 'deepspeed/ops/quantizer/init.py' adding 'deepspeed/ops/quantizer/quantizer.py' adding 'deepspeed/ops/sparse_attention/init.py' adding 'deepspeed/ops/sparse_attention/bert_sparse_self_attention.py' adding 'deepspeed/ops/sparse_attention/matmul.py' adding 'deepspeed/ops/sparse_attention/softmax.py' adding 'deepspeed/ops/sparse_attention/sparse_attention_utils.py' adding 'deepspeed/ops/sparse_attention/sparse_self_attention.py' adding 'deepspeed/ops/sparse_attention/sparsity_config.py' adding 'deepspeed/ops/sparse_attention/trsrc/init.py' adding 'deepspeed/ops/sparse_attention/trsrc/matmul.tr' adding 'deepspeed/ops/sparse_attention/trsrc/softmax_bwd.tr' adding 'deepspeed/ops/sparse_attention/trsrc/softmax_fwd.tr' adding 'deepspeed/ops/transformer/init.py' adding 'deepspeed/ops/transformer/transformer.py' adding 'deepspeed/ops/transformer/inference/init.py' adding 'deepspeed/ops/transformer/inference/moe_inference.py' adding 'deepspeed/ops/transformer/inference/transformer_inference.py' adding 'deepspeed/pipe/init.py' adding 'deepspeed/profiling/init.py' adding 'deepspeed/profiling/config.py' adding 'deepspeed/profiling/constants.py' adding 'deepspeed/profiling/flops_profiler/init.py' adding 
'deepspeed/profiling/flops_profiler/profiler.py' adding 'deepspeed/runtime/init.py' adding 'deepspeed/runtime/config.py' adding 'deepspeed/runtime/config_utils.py' adding 'deepspeed/runtime/constants.py' adding 'deepspeed/runtime/dataloader.py' adding 'deepspeed/runtime/eigenvalue.py' adding 'deepspeed/runtime/engine.py' adding 'deepspeed/runtime/lr_schedules.py' adding 'deepspeed/runtime/progressive_layer_drop.py' adding 'deepspeed/runtime/quantize.py' adding 'deepspeed/runtime/sparse_tensor.py' adding 'deepspeed/runtime/state_dict_factory.py' adding 'deepspeed/runtime/utils.py' adding 'deepspeed/runtime/weight_quantizer.py' adding 'deepspeed/runtime/activation_checkpointing/init.py' adding 'deepspeed/runtime/activation_checkpointing/checkpointing.py' adding 'deepspeed/runtime/activation_checkpointing/config.py' adding 'deepspeed/runtime/comm/init.py' adding 'deepspeed/runtime/comm/coalesced_collectives.py' adding 'deepspeed/runtime/comm/mpi.py' adding 'deepspeed/runtime/comm/nccl.py' adding 'deepspeed/runtime/compression/init.py' adding 'deepspeed/runtime/compression/cupy.py' adding 'deepspeed/runtime/data_pipeline/init.py' adding 'deepspeed/runtime/data_pipeline/curriculum_scheduler.py' adding 'deepspeed/runtime/fp16/init.py' adding 'deepspeed/runtime/fp16/fused_optimizer.py' adding 'deepspeed/runtime/fp16/loss_scaler.py' adding 'deepspeed/runtime/fp16/unfused_optimizer.py' adding 'deepspeed/runtime/fp16/onebit/init.py' adding 'deepspeed/runtime/fp16/onebit/adam.py' adding 'deepspeed/runtime/fp16/onebit/lamb.py' adding 'deepspeed/runtime/fp16/onebit/zoadam.py' adding 'deepspeed/runtime/pipe/init.py' adding 'deepspeed/runtime/pipe/engine.py' adding 'deepspeed/runtime/pipe/module.py' adding 'deepspeed/runtime/pipe/p2p.py' adding 'deepspeed/runtime/pipe/schedule.py' adding 'deepspeed/runtime/pipe/topology.py' adding 'deepspeed/runtime/swap_tensor/init.py' adding 'deepspeed/runtime/swap_tensor/aio_config.py' adding 'deepspeed/runtime/swap_tensor/async_swapper.py' 
adding 'deepspeed/runtime/swap_tensor/constants.py' adding 'deepspeed/runtime/swap_tensor/optimizer_utils.py' adding 'deepspeed/runtime/swap_tensor/partitioned_optimizer_swapper.py' adding 'deepspeed/runtime/swap_tensor/partitioned_param_swapper.py' adding 'deepspeed/runtime/swap_tensor/pipelined_optimizer_swapper.py' adding 'deepspeed/runtime/swap_tensor/utils.py' adding 'deepspeed/runtime/zero/init.py' adding 'deepspeed/runtime/zero/config.py' adding 'deepspeed/runtime/zero/constants.py' adding 'deepspeed/runtime/zero/contiguous_memory_allocator.py' adding 'deepspeed/runtime/zero/linear.py' adding 'deepspeed/runtime/zero/offload_config.py' adding 'deepspeed/runtime/zero/offload_constants.py' adding 'deepspeed/runtime/zero/partition_parameters.py' adding 'deepspeed/runtime/zero/stage3.py' adding 'deepspeed/runtime/zero/stage_1_and_2.py' adding 'deepspeed/runtime/zero/test.py' adding 'deepspeed/runtime/zero/tiling.py' adding 'deepspeed/runtime/zero/utils.py' adding 'deepspeed/utils/init.py' adding 'deepspeed/utils/debug.py' adding 'deepspeed/utils/distributed.py' adding 'deepspeed/utils/exceptions.py' adding 'deepspeed/utils/groups.py' adding 'deepspeed/utils/logging.py' adding 'deepspeed/utils/nvtx.py' adding 'deepspeed/utils/timer.py' adding 'deepspeed/utils/zero_to_fp32.py' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/deepspeed' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/deepspeed.pt' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/ds' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_elastic' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_report' adding 'deepspeed-0.6.2+d8ed3ce4.data/scripts/ds_ssh' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/LICENSE' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/METADATA' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/WHEEL' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/entry_points.txt' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/top_level.txt' adding 'deepspeed-0.6.2+d8ed3ce4.dist-info/RECORD' removing build/bdist.linux-x86_64/wheel

deepspeed build time = 35.394285678863525 secs

And here is the tail of the error message:

[2022-04-08 21:28:07,950] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.2+d8ed3ce4, git-hash=d8ed3ce4, git-branch=master
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00, 1.13s/ba]
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 203.52ba/s]
[2022-04-08 21:28:08,257] [INFO] [engine.py:277:__init__] DeepSpeed Flops Profiler Enabled: False

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 441, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/trainer.py", line 1240, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/deepspeed.py", line 424, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/__init__.py", line 119, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 293, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1062, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 461, in load
    return importlib.import_module(self.absolute_name())
  File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'deepspeed.ops.adam.cpu_adam_op'

Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fd7ed9bc940>
Traceback (most recent call last):
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f58156c0940>
Traceback (most recent call last):
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

[2022-04-08 21:28:09,337] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 12142
[2022-04-08 21:28:09,338] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 12143
[2022-04-08 21:28:09,338] [ERROR] [launch.py:184:sigkill_handler] ['/home/max/anaconda3/envs/gptneo_finetuned/bin/python3', '-u', 'run_clm.py', '--local_rank=1', '--deepspeed', 'ds_config.json', '--model_name_or_path', 'EleutherAI/gpt-neo-2.7B', '--train_file', 'train.csv', '--validation_file', 'validation.csv', '--do_train', '--do_eval', '--fp16', '--overwrite_cache', '--evaluation_strategy=steps', '--output_dir', 'finetuned2', '--num_train_epochs', '1', '--eval_steps', '15', '--gradient_accumulation_steps', '2', '--per_device_train_batch_size', '4', '--use_fast_tokenizer', 'False', '--learning_rate', '5e-06', '--warmup_steps', '10'] exits with return code = 1

stas00 commented 2 years ago

@maxmaier59,

For full long logs please consider either attaching them as a separate file or making them Collapsible - since otherwise it makes the threads very difficult to navigate as the logs take over the content. e.g. one good recipe is here: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab

Also, maybe switch to using just one GPU while sorting this out, to avoid the interleaved tracebacks, which are difficult to read.


OK, so I no longer see the original error of undefined symbol: curandCreateGenerator, which means we are now dealing with another most likely unrelated issue:

File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters, File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in init
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 461, in load
self.ds_opt_adam = CPUAdamBuilder().load()
return importlib.import_module(self.absolute_name())
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module
File "", line 1014, in _gcd_import
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 991, in _find_and_load
File "", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'deepspeed.ops.adam.cpu_adam_op'

I tried to remove the interleaving lines from your log.

So despite successfully pre-building the op, it fails to load deepspeed.ops.adam.cpu_adam_op.

You get this error:

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

because:

self.ds_opt_adam = CPUAdamBuilder().load()

failed. So the latter is the real issue.
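The mechanics of the secondary error can be sketched with a tiny stand-in class (illustrative only, not DeepSpeed's actual code): when `__init__` raises before the attribute is ever assigned, Python still runs `__del__` on the partially constructed object during cleanup, which is what produces the "Exception ignored ... has no attribute 'ds_opt_adam'" noise on top of the real failure:

```python
# Minimal illustration of why a failed __init__ produces a secondary
# AttributeError inside __del__ (not DeepSpeed code; names mimic the real ones).
class Optimizer:
    def __init__(self):
        # Simulates CPUAdamBuilder().load() failing before the attribute
        # is ever assigned.
        raise ModuleNotFoundError("no compiled op found")
        self.ds_opt_adam = object()  # never reached

    def __del__(self):
        # Runs anyway on the half-constructed object and touches the unset
        # attribute, raising the secondary, ignored AttributeError.
        self.ds_opt_adam.destroy()

try:
    Optimizer()
except ModuleNotFoundError as e:
    print("real error:", e)  # → real error: no compiled op found
```

So the `AttributeError` is just fallout; the `ModuleNotFoundError` above it is the thing to chase.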

So most likely you should be able to reproduce the problem with just:

python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"

So from inside the deepspeed source folder (and I think both you and I are using the -e / devel install), I have:

cd DeepSpeed
ls -l deepspeed/ops/adam/cpu_adam*
-rwxrwxr-x 1 stas stas 10857928 Apr  5 08:41 deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 stas stas     8994 Jan  7 21:02 deepspeed/ops/adam/cpu_adam.py

What do you have? I wonder if there is an issue with file perms, say if you're missing read perms on the files on perhaps on parent dirs.


Ultimately, the problem happens here:

https://github.com/microsoft/DeepSpeed/blob/d8ed3ce445b3d447a113305343d3c21fbf1bf2ba/op_builder/builder.py#L461

and the argument it's passing here is deepspeed.ops.adam.cpu_adam_op which is set here:

https://github.com/microsoft/DeepSpeed/blob/d8ed3ce445b3d447a113305343d3c21fbf1bf2ba/op_builder/cpu_adam.py#L15-L16

and so it is looking for deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so

In case of JIT it'll use this path:

https://github.com/microsoft/DeepSpeed/blob/d8ed3ce445b3d447a113305343d3c21fbf1bf2ba/op_builder/builder.py#L463

which will look for ~/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so instead. The py38_cu113 segment is system-dependent; on your setup it will be py38_cu115.

So you should be able to reproduce the problem with an even more precise:

python -c "import importlib; importlib.import_module('deepspeed.ops.adam.cpu_adam_op')"
stas00 commented 2 years ago

I'm also curious whether you somehow have multiple installs of deepspeed. What do you get from:

ls -l /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed*

because in your OP you had deepspeed installed inside site-packages but later you switched to devel install, so there should be just:

ls -l /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed.egg-link

and no:

ls -l /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed/

but I don't know if you're still using the same conda env as in your OP.

stas00 commented 2 years ago

And since the JIT build worked for you, let's see what you have under ~/.cache/torch_extensions/py38_cu115, i.e:

find ~/.cache/torch_extensions/py38_cu115/cpu_adam/

I wonder if somehow something gets messed up there.

e.g. on my set up I have:

$ cd DeepSpeed
$ ls -l deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 stas stas 11M Apr  5 08:41 deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so

$ ls -l /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so
-rwxrwxr-x 1 stas stas 11M Apr  8 13:15 /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so*

to get the latter one, I of course needed to do JIT, so I had to do:

pip uninstall deepspeed -y
pip install deepspeed 
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"

And maybe let's see the log of the last command; then we can compare its build to the prebuild log and perhaps find what's mismatching.
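Related (my assumption based on the cache layout above, not something verified in this thread): if a JIT cache entry got half-written, clearing it forces a clean rebuild the next time the op is loaded:

```shell
# TORCH_EXTENSIONS_DIR overrides the default JIT cache root; the
# py38_cu115 tag below is specific to this setup and illustrative only.
CACHE="${TORCH_EXTENSIONS_DIR:-$HOME/.cache/torch_extensions}"
echo "JIT cache root: $CACHE"
ls "$CACHE" 2>/dev/null || echo "(no JIT cache yet)"
# rm -rf "$CACHE/py38_cu115/cpu_adam"   # uncomment to clear just this op
```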

stas00 commented 2 years ago

Let's also check that the shared object library gets all its references resolved, e.g. on my setup:

$ LD_LIBRARY_PATH=/home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib ldd deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffef2b54000)
        libgtk3-nocsd.so.0 => /lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007f7d7c1c6000)
        libcurand.so.10 => /home/stas/anaconda3/envs/py38-pt111/lib/libcurand.so.10 (0x00007f7d76a37000)
        libc10.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libc10.so (0x00007f7d767ad000)
        libtorch.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch.so (0x00007f7d765ab000)
        libtorch_cpu.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so (0x00007f7d6badb000)
        libtorch_python.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch_python.so (0x00007f7d6ad81000)
        libcudart.so.11.0 => /home/stas/anaconda3/envs/py38-pt111/lib/libcudart.so.11.0 (0x00007f7d6aae4000)
        libc10_cuda.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libc10_cuda.so (0x00007f7d6a894000)
        libtorch_cuda_cu.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007f7d384b1000)
        libtorch_cuda_cpp.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007f7cf30be000)
        libstdc++.so.6 => /home/stas/anaconda3/envs/py38-pt111/lib/libstdc++.so.6 (0x00007f7cf2f49000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7cf2e65000)
        libgcc_s.so.1 => /home/stas/anaconda3/envs/py38-pt111/lib/libgcc_s.so.1 (0x00007f7cf2e51000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7cf2c29000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7cf2c24000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7cf2c1f000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7cf2c18000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7d7c4d0000)
        libgomp.so.1 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libgomp.so.1 (0x00007f7cf2beb000)
        libtorch_cuda.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so (0x00007f7cf29d7000)
        libmkl_intel_lp64.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007f7cf1e38000)
        libmkl_gnu_thread.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007f7cf02ab000)
        libmkl_core.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libmkl_core.so (0x00007f7cebe3d000)
        libshm.so => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/libshm.so (0x00007f7cebc36000)
        libnvToolsExt.so.1 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libnvToolsExt.so.1 (0x00007f7ceba2c000)
        libcusparse.so.11 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libcusparse.so.11 (0x00007f7cdd740000)
        libcusolver.so.11 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libcusolver.so.11 (0x00007f7cd0879000)
        libcublas.so.11 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11 (0x00007f7cc9236000)
        libcufft.so.10 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libcufft.so.10 (0x00007f7cbda3b000)
        libcublasLt.so.11 => /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/lib/../../../../libcublasLt.so.11 (0x00007f7cad02b000)

in your case it'd be:

$ LD_LIBRARY_PATH=/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib ldd deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so

we want to make sure we don't get:

libcurand.so.10 => not found

LD_LIBRARY_PATH is there to help ldd find the libtorch shared objects.
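The check above can be wrapped in a small helper that fails loudly when any reference is unresolved (a sketch; the DeepSpeed paths in the comment are the ones from this thread and must be adjusted to your env):

```shell
# check_deps: print unresolved shared-object references of a binary, if any.
# Usage: check_deps <path-to-.so-or-binary> [extra-lib-dir]
check_deps() {
    local target="$1" libdir="${2:-}"
    [ -e "$target" ] || { echo "no such file: $target" >&2; return 2; }
    if LD_LIBRARY_PATH="$libdir" ldd "$target" | grep "not found"; then
        echo "FAIL: $target has unresolved dependencies" >&2
        return 1
    fi
    echo "OK: all dependencies of $target resolved"
}

# In this thread's setup it would be invoked roughly as:
# check_deps deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so \
#            "$HOME/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib"
check_deps /bin/sh   # demo on a binary that is always present
```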

stas00 commented 2 years ago

I think I know what the problem is here: you're building a binary wheel package

TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 python setup.py build_ext -j8 bdist_wheel

and perhaps forgetting to install it? Unlike the devel build/install, a wheel requires a separate installation step.

So if you have not installed the pre-build files it makes sense that it fails to work.

Note where your pre-built files go:

copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/deepspeed/ops/adam

Note that on my setup the same command is:

copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops/adam

which is exactly what's missing on your side.

So you have 2 ways to fix this:

  1. install the wheel you have built, that is:
    pip install dist/deepspeed*.whl

    (check if you have many wheels there; I have no idea how yours was built or named, hence the *)

It'd be interesting to see if this works. I have never tried this path, though I see no reason why it wouldn't work.

  2. change your build instructions not to build a wheel, e.g. I use:
$ cat build.sh
#!/bin/bash

rm -rf build

time TORCH_CUDA_ARCH_LIST="6.1;8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1  DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1 | tee build.log

of course adjust what you need to adjust (TORCH_CUDA_ARCH_LIST certainly)

and then once you run the script, you can instantly use this deepspeed prebuild.

If this is the culprit, we will then need to tag the maintainers so they can detect this half-baked use case and properly tell the user what the problem is.

stas00 commented 2 years ago

cuDNN version: Could not collect

Also make sure to install the latest cudnn - your cuda programs will run much faster, see: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#overview

maxmaier59 commented 2 years ago

Option 2 (using the build instructions without building a wheel) works! Many thanks for your patience with me and also for the hints. :-)

stas00 commented 2 years ago

So glad to hear it finally worked, @maxmaier59!

Does it mean that you tried installing the binary wheel and it didn't work?

djaym7 commented 1 year ago

The current main branch (0.8.2+4ae3a3da) gives the same error; pip-installing 0.8.1 works fine.

Misoknisky commented 1 year ago

And since the JIT build worked for you, let's see what you have under ~/.cache/torch_extensions/py38_cu115, i.e.:

find ~/.cache/torch_extensions/py38_cu115/cpu_adam/

I wonder if somehow something gets messed up there.

e.g. on my set up I have:

$ cd DeepSpeed
$ ls -l deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 stas stas 11M Apr  5 08:41 deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so

$ ls -l /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so
-rwxrwxr-x 1 stas stas 11M Apr  8 13:15 /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so*

to get the latter one, I of course needed to do JIT, so I had to do:

pip uninstall deepspeed -y
pip install deepspeed 
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"

and may be let's see the log of the last command, and then we can compare its build to the prebuild log - and perhaps find what's mismatching.

Sorry, I also have this problem. I executed this command and got the output below, but I still get "AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'". How can I fix it?

Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.4532833099365234 seconds

BramVanroy commented 1 year ago

For the record, I did not have this issue on 0.10.0, but when I upgraded to the current main (777ae39a85988da3e934c272842f6c65686b8896) I hit it as well. Pre-building the cpu adam op solved it, but regardless it seems like an important issue to raise.

flckv commented 1 year ago

hi @BramVanroy how did you pre-build the cpu adam?

BramVanroy commented 1 year ago

hi @BramVanroy how did you pre-build the cpu adam?

You find the instructions here https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops