hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: Unable to complete installation by following the official tutorial #2896

Closed zixiliuUSC closed 1 year ago

zixiliuUSC commented 1 year ago

🐛 Describe the bug

Following the official tutorial, I ran: CUDA_EXT=1 pip install colossalai. Environment: conda virtual environment, python=3.9.13, pytorch=1.13+cuda11.6, GPUs: 3× Nvidia A30. The error output is as follows:

Processing /home/liuzixi01/ColossalAI
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: numpy in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (1.24.2)
Requirement already satisfied: tqdm in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (4.64.1)
Requirement already satisfied: psutil in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (5.9.4)
Requirement already satisfied: packaging in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (23.0)
Requirement already satisfied: pre-commit in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (3.1.0)
Requirement already satisfied: rich in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (13.3.1)
Requirement already satisfied: click in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (8.1.3)
Requirement already satisfied: fabric in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (3.0.0)
Requirement already satisfied: contexttimer in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (0.3.3)
Requirement already satisfied: ninja in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (1.11.1)
Requirement already satisfied: torch in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from colossalai==0.2.5) (1.13.1+cu116)
Requirement already satisfied: invoke>=2.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from fabric->colossalai==0.2.5) (2.0.0)
Requirement already satisfied: paramiko>=2.4 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from fabric->colossalai==0.2.5) (3.0.0)
Requirement already satisfied: identify>=1.0.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from pre-commit->colossalai==0.2.5) (2.5.18)
Requirement already satisfied: virtualenv>=20.10.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from pre-commit->colossalai==0.2.5) (20.19.0)
Requirement already satisfied: cfgv>=2.0.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from pre-commit->colossalai==0.2.5) (3.3.1)
Requirement already satisfied: nodeenv>=0.11.1 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from pre-commit->colossalai==0.2.5) (1.7.0)
Requirement already satisfied: pyyaml>=5.1 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from pre-commit->colossalai==0.2.5) (6.0)
Requirement already satisfied: pygments<3.0.0,>=2.14.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from rich->colossalai==0.2.5) (2.14.0)
Requirement already satisfied: markdown-it-py<3.0.0,>=2.1.0 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from rich->colossalai==0.2.5) (2.2.0)
Requirement already satisfied: typing-extensions in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from torch->colossalai==0.2.5) (4.5.0)
Requirement already satisfied: mdurl~=0.1 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from markdown-it-py<3.0.0,>=2.1.0->rich->colossalai==0.2.5) (0.1.2)
Requirement already satisfied: setuptools in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from nodeenv>=0.11.1->pre-commit->colossalai==0.2.5) (65.5.1)
Requirement already satisfied: cryptography>=3.3 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from paramiko>=2.4->fabric->colossalai==0.2.5) (39.0.1)
Requirement already satisfied: bcrypt>=3.2 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from paramiko>=2.4->fabric->colossalai==0.2.5) (4.0.1)
Requirement already satisfied: pynacl>=1.5 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from paramiko>=2.4->fabric->colossalai==0.2.5) (1.5.0)
Requirement already satisfied: distlib<1,>=0.3.6 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from virtualenv>=20.10.0->pre-commit->colossalai==0.2.5) (0.3.6)
Requirement already satisfied: filelock<4,>=3.4.1 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from virtualenv>=20.10.0->pre-commit->colossalai==0.2.5) (3.9.0)
Requirement already satisfied: platformdirs<4,>=2.4 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from virtualenv>=20.10.0->pre-commit->colossalai==0.2.5) (3.0.0)
Requirement already satisfied: cffi>=1.12 in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from cryptography>=3.3->paramiko>=2.4->fabric->colossalai==0.2.5) (1.15.1)
Requirement already satisfied: pycparser in /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages (from cffi>=1.12->cryptography>=3.3->paramiko>=2.4->fabric->colossalai==0.2.5) (2.21)
Building wheels for collected packages: colossalai
  Building wheel for colossalai (setup.py): started
  Building wheel for colossalai (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [127 lines of output]

      torch.__version__  = 1.13.1+cu116

      Compiling cuda extensions with
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2020 NVIDIA Corporation
      Built on Wed_Jul_22_19:09:09_PDT_2020
      Cuda compilation tools, release 11.0, V11.0.221
      Build cuda_11.0_bu.TC445_37.28845127_0
      from /usr/local/cuda/bin

      Warning: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 11.6.
      In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.
      ===== Building Extension cpu_adam =====
      ===== Building Extension fused_optim =====
      ===== Building Extension moe =====
      ===== Building Extension multi_head_attn =====
      ===== Building Extension scaled_masked_softmax =====
      ===== Building Extension scaled_upper_triangle_masked_softmax =====
      ===== Building Extension layernorm =====
      running bdist_wheel
      running build
      running build_py
      copying colossalai/version.py -> build/lib.linux-x86_64-cpython-39/colossalai
      running build_ext
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py:387: UserWarning: The detected CUDA version (11.0) has a minor version mismatch with the version that was used to compile PyTorch (11.6). Most likely this shouldn't be a problem.
        warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py:397: UserWarning: There are no g++ version bounds defined for CUDA version 11.0
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'colossalai._C.cpu_adam' extension
      Emitting ninja build file /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      ninja: no work to do.
      g++ -pthread -B /home/liuzixi01/.conda/envs/torch-cuda116/compiler_compat -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,-rpath-link,/home/liuzixi01/.conda/envs/torch-cuda116/lib -L/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,-rpath-link,/home/liuzixi01/.conda/envs/torch-cuda116/lib -L/home/liuzixi01/.conda/envs/torch-cuda116/lib /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.o -L/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-39/colossalai/_C/cpu_adam.cpython-39-x86_64-linux-gnu.so
      building 'colossalai._C.fused_optim' extension
      Emitting ninja build file /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/5] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [2/5] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [3/5] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [4/5] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [5/5] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
          subprocess.run(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/subprocess.py", line 528, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/liuzixi01/ColossalAI/setup.py", line 170, in <module>
          setup(name=package_name,
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 325, in run
          self.run_command("build")
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
          self.build_extensions()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
          build_ext.build_extensions(self)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for colossalai
  Running setup.py clean for colossalai
Failed to build colossalai
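
The wheel build above fails on `nvcc fatal   : Unsupported gpu architecture 'compute_86'`, while the nvcc at /usr/local/cuda/bin reports release 11.0. A minimal sketch of how those two log facts can be cross-checked follows. It assumes that nvcc accepts `compute_86` only from CUDA 11.1 onwards; the table `MIN_CUDA_FOR_ARCH` and the helper names `parse_nvcc_release` and `unsupported_archs` are hypothetical and are not part of ColossalAI or PyTorch.

```python
import re

# Hypothetical lookup table (an assumption for illustration): the first CUDA
# toolkit release whose nvcc accepts each virtual architecture.
MIN_CUDA_FOR_ARCH = {
    "compute_80": (11, 0),
    "compute_86": (11, 1),
}

# Strings copied from the log above.
NVCC_OUTPUT = "Cuda compilation tools, release 11.0, V11.0.221"
GENCODE_FLAGS = (
    "-gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 "
    "-gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 "
    "-gencode arch=compute_86,code=sm_86"
)


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def unsupported_archs(nvcc_output, gencode_flags):
    """List requested architectures that need a newer toolkit than the one found.

    Architectures missing from MIN_CUDA_FOR_ARCH are skipped, not flagged.
    """
    release = parse_nvcc_release(nvcc_output)
    # Deduplicate while keeping the order in which the flags appear.
    requested = list(dict.fromkeys(re.findall(r"arch=(compute_\d+)", gencode_flags)))
    return [
        arch
        for arch in requested
        if arch in MIN_CUDA_FOR_ARCH and MIN_CUDA_FOR_ARCH[arch] > release
    ]


if __name__ == "__main__":
    print(unsupported_archs(NVCC_OUTPUT, GENCODE_FLAGS))
```

A non-empty result suggests that the system toolkit is older than what the requested architectures need. Building against a toolkit that matches the CUDA 11.6 PyTorch binaries is one plausible remedy, though the log itself does not confirm it.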
Installing collected packages: colossalai
  Running setup.py install for colossalai: started
  Running setup.py install for colossalai: still running...
  Running setup.py install for colossalai: finished with status 'error'
  error: subprocess-exited-with-error

  × Running setup.py install for colossalai did not run successfully.
  │ exit code: 1
  ╰─> [916 lines of output]

      torch.__version__  = 1.13.1+cu116

      Compiling cuda extensions with
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2020 NVIDIA Corporation
      Built on Wed_Jul_22_19:09:09_PDT_2020
      Cuda compilation tools, release 11.0, V11.0.221
      Build cuda_11.0_bu.TC445_37.28845127_0
      from /usr/local/cuda/bin

      Warning: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 11.6.
      In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.
      ===== Building Extension cpu_adam =====
      ===== Building Extension fused_optim =====
      ===== Building Extension moe =====
      ===== Building Extension multi_head_attn =====
      ===== Building Extension scaled_masked_softmax =====
      ===== Building Extension scaled_upper_triangle_masked_softmax =====
      ===== Building Extension layernorm =====
      running install
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/version.py -> build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/core.py -> build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/global_variables.py -> build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/initialize.py -> build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai
      copying colossalai/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai
      creating build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/scaled_masked_softmax.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/multi_head_attn.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/utils.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/builder.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/fused_optim.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/layernorm.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/moe.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/scaled_upper_triangle_masked_softmax.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/__init__.py -> build/lib.linux-x86_64-cpython-39/op_builder
      copying op_builder/cpu_adam.py -> build/lib.linux-x86_64-cpython-39/op_builder
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn
      copying colossalai/nn/init.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn
      copying colossalai/nn/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx
      copying colossalai/fx/_compatibility.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx
      copying colossalai/fx/graph_module.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx
      copying colossalai/fx/_meta_registrations.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx
      copying colossalai/fx/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx
      copying colossalai/fx/proxy.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx
      creating build/lib.linux-x86_64-cpython-39/colossalai/builder
      copying colossalai/builder/builder.py -> build/lib.linux-x86_64-cpython-39/colossalai/builder
      copying colossalai/builder/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/builder
      creating build/lib.linux-x86_64-cpython-39/colossalai/cli
      copying colossalai/cli/cli.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli
      copying colossalai/cli/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli
      creating build/lib.linux-x86_64-cpython-39/colossalai/_C
      copying colossalai/_C/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/_C
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel
      copying colossalai/auto_parallel/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/memory.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/checkpointing.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/cuda.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/moe.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/activation_checkpoint.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/common.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      copying colossalai/utils/timer.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero
      copying colossalai/zero/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero
      creating build/lib.linux-x86_64-cpython-39/colossalai/amp
      copying colossalai/amp/amp_type.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp
      copying colossalai/amp/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp
      creating build/lib.linux-x86_64-cpython-39/colossalai/device
      copying colossalai/device/calc_pipeline_strategy.py -> build/lib.linux-x86_64-cpython-39/colossalai/device
      copying colossalai/device/device_mesh.py -> build/lib.linux-x86_64-cpython-39/colossalai/device
      copying colossalai/device/alpha_beta_profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/device
      copying colossalai/device/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/device
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel
      copying colossalai/kernel/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel
      creating build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/collective.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/p2p_v2.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/p2p.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/ring.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      copying colossalai/communication/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/communication
      creating build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/const.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/sharding_spec.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/compute_spec.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/distspec.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/op_wrapper.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/process_group.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/param_op_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/colo_tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/shape_consistency.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/colo_parameter.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/tensor_spec.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/comm_spec.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/dist_spec_mgr.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      copying colossalai/tensor/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/tensor
      creating build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      copying colossalai/pipeline/layer_spec.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      copying colossalai/pipeline/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      copying colossalai/pipeline/pipelinable.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      copying colossalai/pipeline/pipeline_process_group.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      copying colossalai/pipeline/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline
      creating build/lib.linux-x86_64-cpython-39/colossalai/logging
      copying colossalai/logging/logger.py -> build/lib.linux-x86_64-cpython-39/colossalai/logging
      copying colossalai/logging/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/logging
      creating build/lib.linux-x86_64-cpython-39/colossalai/registry
      copying colossalai/registry/registry.py -> build/lib.linux-x86_64-cpython-39/colossalai/registry
      copying colossalai/registry/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/registry
      creating build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/gemini_mgr.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/tensor_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/placement_policy.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/tensor_placement_policy.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/stateful_tensor_mgr.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/gemini_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/stateful_tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      copying colossalai/gemini/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini
      creating build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/config.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/singleton_meta.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/parallel_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/moe_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      copying colossalai/context/parallel_mode.py -> build/lib.linux-x86_64-cpython-39/colossalai/context
      creating build/lib.linux-x86_64-cpython-39/colossalai/engine
      copying colossalai/engine/_base_engine.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine
      copying colossalai/engine/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine
      creating build/lib.linux-x86_64-cpython-39/colossalai/testing
      copying colossalai/testing/random.py -> build/lib.linux-x86_64-cpython-39/colossalai/testing
      copying colossalai/testing/comparison.py -> build/lib.linux-x86_64-cpython-39/colossalai/testing
      copying colossalai/testing/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/testing
      copying colossalai/testing/pytest_wrapper.py -> build/lib.linux-x86_64-cpython-39/colossalai/testing
      copying colossalai/testing/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/testing
      creating build/lib.linux-x86_64-cpython-39/colossalai/trainer
      copying colossalai/trainer/_trainer.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer
      copying colossalai/trainer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/delayed.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/poly.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/multistep.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/torch.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/onecycle.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      copying colossalai/nn/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/lr_scheduler
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      copying colossalai/nn/metric/accuracy_2p5d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      copying colossalai/nn/metric/accuracy_3d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      copying colossalai/nn/metric/accuracy_2d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      copying colossalai/nn/metric/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      copying colossalai/nn/metric/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/metric
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/addmm.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/loss.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/layernorm.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/view.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/batch_norm.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/element_wise.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      copying colossalai/nn/_ops/embedding_bag.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/_ops
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/lamb.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/fused_sgd.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/lars.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/fused_lamb.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/hybrid_adam.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/zero_optimizer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/fused_adam.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/nvme_optimizer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/colossalai_optimizer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/gemini_optimizer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      copying colossalai/nn/optimizer/cpu_adam.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/optimizer
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer
      copying colossalai/nn/layer/base_layer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer
      copying colossalai/nn/layer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/data_parallel.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/gemini_parallel.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/zero_wrapper.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/reducer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      copying colossalai/nn/parallel/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/loss_3d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/loss_2d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/loss_2p5d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/loss_moe.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/loss_1d.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      copying colossalai/nn/loss/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/loss
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_3d
      copying colossalai/nn/layer/parallel_3d/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_3d
      copying colossalai/nn/layer/parallel_3d/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_3d
      copying colossalai/nn/layer/parallel_3d/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_3d
      copying colossalai/nn/layer/parallel_3d/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_3d
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_1d
      copying colossalai/nn/layer/parallel_1d/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_1d
      copying colossalai/nn/layer/parallel_1d/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_1d
      copying colossalai/nn/layer/parallel_1d/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_1d
      copying colossalai/nn/layer/parallel_1d/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_1d
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2p5d
      copying colossalai/nn/layer/parallel_2p5d/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2p5d
      copying colossalai/nn/layer/parallel_2p5d/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2p5d
      copying colossalai/nn/layer/parallel_2p5d/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2p5d
      copying colossalai/nn/layer/parallel_2p5d/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2p5d
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/wrapper
      copying colossalai/nn/layer/wrapper/pipeline_wrapper.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/wrapper
      copying colossalai/nn/layer/wrapper/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/wrapper
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_sequence
      copying colossalai/nn/layer/parallel_sequence/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_sequence
      copying colossalai/nn/layer/parallel_sequence/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_sequence
      copying colossalai/nn/layer/parallel_sequence/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_sequence
      copying colossalai/nn/layer/parallel_sequence/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_sequence
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2d
      copying colossalai/nn/layer/parallel_2d/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2d
      copying colossalai/nn/layer/parallel_2d/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2d
      copying colossalai/nn/layer/parallel_2d/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2d
      copying colossalai/nn/layer/parallel_2d/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/parallel_2d
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/utils
      copying colossalai/nn/layer/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/utils
      copying colossalai/nn/layer/utils/common.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/utils
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/experts.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/_operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      copying colossalai/nn/layer/moe/routers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/moe
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/vanilla
      copying colossalai/nn/layer/vanilla/layers.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/vanilla
      copying colossalai/nn/layer/vanilla/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/vanilla
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/dropout.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/normalization.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      copying colossalai/nn/layer/colossalai_layer/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/layer/colossalai_layer
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      copying colossalai/nn/parallel/layers/colo_module.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      copying colossalai/nn/parallel/layers/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      copying colossalai/nn/parallel/layers/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      copying colossalai/nn/parallel/layers/module_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      copying colossalai/nn/parallel/layers/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers
      creating build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/parallel_cached_embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/base_embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/embedding_config.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/cache_mgr.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise_split_cache.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/copyer.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      copying colossalai/nn/parallel/layers/cache_embedding/cached_embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/nn/parallel/layers/cache_embedding
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/shard_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/dataflow.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/opcount.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/memory_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      copying colossalai/fx/profiler/profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/shard_1d_pass.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/passes_for_gpt2_test.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/adding_split_node_pass.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/concrete_info_prop.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/split_module.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      copying colossalai/fx/passes/meta_info_prop.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/codegen
      copying colossalai/fx/codegen/activation_checkpoint_codegen.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/codegen
      copying colossalai/fx/codegen/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/codegen
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/_tracer_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/tracer.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/_meta_trace.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/experimental.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/registry.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/_symbolic_trace.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      copying colossalai/fx/tracer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      copying colossalai/fx/profiler/experimental/shard_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      copying colossalai/fx/profiler/experimental/registry.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      copying colossalai/fx/profiler/experimental/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      copying colossalai/fx/profiler/experimental/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      copying colossalai/fx/profiler/experimental/profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/python_ops.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/torch_ops.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/normalization.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/activation_function.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/arithmetic.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      copying colossalai/fx/profiler/experimental/profiler_function/pooling.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_function
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/attention.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/dropout.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/rnn.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/convolution.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/normalization.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/activation_function.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/torch_op.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      copying colossalai/fx/profiler/experimental/profiler_module/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module

The error log is too long to fit here; the rest is continued in the comment below.

Environment

conda virtual environment, python=3.9.13, pytorch=1.13+cuda11.6, GPUs: 3x Nvidia A30

Issues-translate-bot commented 1 year ago

The bot detected that the issue body's language is not English and translated it automatically.

Title: [BUG]: The installation cannot be completed according to the official tutorial

zixiliuUSC commented 1 year ago
      copying colossalai/fx/profiler/experimental/profiler_module/pooling.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/profiler/experimental/profiler_module
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/ckpt_solver_chen.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/linearize.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/build_c_ext.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/ckpt_solver_pofo.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      copying colossalai/fx/passes/algorithms/ckpt_solver_rotor.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/passes/algorithms
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch
      copying colossalai/fx/tracer/bias_addition_patch/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch
      copying colossalai/fx/tracer/meta_patch/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/bias_addition_module.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/conv.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/addmm.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/bias_addition_function.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/addbmm.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      copying colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/rnn.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/convolution.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/normalization.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/activation_function.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      copying colossalai/fx/tracer/meta_patch/patched_module/pooling.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_module
      creating build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/python_ops.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/torch_ops.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/convolution.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/normalization.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/activation_function.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/arithmetic.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      copying colossalai/fx/tracer/meta_patch/patched_function/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/fx/tracer/meta_patch/patched_function
      creating build/lib.linux-x86_64-cpython-39/colossalai/cli/launcher
      copying colossalai/cli/launcher/multinode_runner.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/launcher
      copying colossalai/cli/launcher/hostinfo.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/launcher
      copying colossalai/cli/launcher/run.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/launcher
      copying colossalai/cli/launcher/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/launcher
      creating build/lib.linux-x86_64-cpython-39/colossalai/cli/check
      copying colossalai/cli/check/check_installation.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/check
      copying colossalai/cli/check/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/check
      creating build/lib.linux-x86_64-cpython-39/colossalai/cli/benchmark
      copying colossalai/cli/benchmark/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/benchmark
      copying colossalai/cli/benchmark/models.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/benchmark
      copying colossalai/cli/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/benchmark
      copying colossalai/cli/benchmark/benchmark.py -> build/lib.linux-x86_64-cpython-39/colossalai/cli/benchmark
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      copying colossalai/auto_parallel/tensor_shard/options.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      copying colossalai/auto_parallel/tensor_shard/sharding_strategy.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      copying colossalai/auto_parallel/tensor_shard/initialize.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      copying colossalai/auto_parallel/tensor_shard/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      copying colossalai/auto_parallel/tensor_shard/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/runtime_preparation_pass.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/runtime_apply_pass.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/comm_metainfo_pass.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      copying colossalai/auto_parallel/passes/meta_info_prop.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/passes
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/ckpt_solver_chen.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/build_c_ext.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/ckpt_solver_base.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/operation.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      copying colossalai/auto_parallel/checkpoint/ckpt_solver_rotor.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/checkpoint
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler
      copying colossalai/auto_parallel/meta_profiler/metainfo.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler
      copying colossalai/auto_parallel/meta_profiler/registry.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler
      copying colossalai/auto_parallel/meta_profiler/constants.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler
      copying colossalai/auto_parallel/meta_profiler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/pipeline_shard
      copying colossalai/auto_parallel/pipeline_shard/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/pipeline_shard
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/factory.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/misc.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/reshape.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/broadcast.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/sharding.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      copying colossalai/auto_parallel/tensor_shard/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/utils
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/permute_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/addmm_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/default_reshape_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/unary_elementwise_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/layer_norm_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/where_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/binary_elementwise_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/normal_pooling_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/output_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/registry.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/tensor_constructor_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/softmax_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/view_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/transpose_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/batch_norm_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/sum_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/placeholder_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/split_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/bmm_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/conv_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      copying colossalai/auto_parallel/tensor_shard/node_handler/getattr_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      copying colossalai/auto_parallel/tensor_shard/solver/cost_graph.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      copying colossalai/auto_parallel/tensor_shard/solver/solver.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      copying colossalai/auto_parallel/tensor_shard/solver/strategies_constructor.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      copying colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      copying colossalai/auto_parallel/tensor_shard/solver/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/solver
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/softmax_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/unary_elementwise_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/sum_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/output_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/getitem_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/embedding_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/tensor_constructor_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/reshape_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/placeholder_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/where_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      copying colossalai/auto_parallel/tensor_shard/node_handler/strategy/getattr_generator.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/tensor_shard/node_handler/strategy
      creating build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/non_spmd.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/where.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/activation.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/binary_elementwise_ops.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/norm.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/embedding.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/linear.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/conv.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      copying colossalai/auto_parallel/meta_profiler/meta_registry/pooling.py -> build/lib.linux-x86_64-cpython-39/colossalai/auto_parallel/meta_profiler/meta_registry
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler
      copying colossalai/utils/profiler/extention.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler
      copying colossalai/utils/profiler/stateful_tensor_mem_extention.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler
      copying colossalai/utils/profiler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler
      copying colossalai/utils/profiler/profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint
      copying colossalai/utils/checkpoint/module_checkpoint.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint
      copying colossalai/utils/checkpoint/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint
      copying colossalai/utils/checkpoint/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/io.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/meta.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/convertor.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/backend.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/writer.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/reader.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/constant.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/distributed.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      copying colossalai/utils/checkpoint_io/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/checkpoint_io
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/multi_tensor_apply
      copying colossalai/utils/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/multi_tensor_apply
      copying colossalai/utils/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/multi_tensor_apply
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/rank_recorder
      copying colossalai/utils/rank_recorder/rank_recorder.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/rank_recorder
      copying colossalai/utils/rank_recorder/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/rank_recorder
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      copying colossalai/utils/model/lazy_init_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      copying colossalai/utils/model/experimental.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      copying colossalai/utils/model/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      copying colossalai/utils/model/colo_init_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      copying colossalai/utils/model/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/model
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/data_sampler
      copying colossalai/utils/data_sampler/data_parallel_sampler.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/data_sampler
      copying colossalai/utils/data_sampler/base_sampler.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/data_sampler
      copying colossalai/utils/data_sampler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/data_sampler
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/tensor_detector
      copying colossalai/utils/tensor_detector/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/tensor_detector
      copying colossalai/utils/tensor_detector/tensor_detector.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/tensor_detector
      creating build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler/legacy
      copying colossalai/utils/profiler/legacy/comm_profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler/legacy
      copying colossalai/utils/profiler/legacy/pcie_profiler.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler/legacy
      copying colossalai/utils/profiler/legacy/prof_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler/legacy
      copying colossalai/utils/profiler/legacy/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/utils/profiler/legacy
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim
      copying colossalai/zero/sharded_optim/sharded_optim_v2.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim
      copying colossalai/zero/sharded_optim/low_level_optim.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim
      copying colossalai/zero/sharded_optim/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim
      copying colossalai/zero/sharded_optim/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/utils
      copying colossalai/zero/utils/gemini_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/utils
      copying colossalai/zero/utils/zero_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/utils
      copying colossalai/zero/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/utils
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      copying colossalai/zero/sharded_model/reduce_scatter.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      copying colossalai/zero/sharded_model/sharded_model_v2.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      copying colossalai/zero/sharded_model/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      copying colossalai/zero/sharded_model/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      copying colossalai/zero/sharded_model/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_model
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_param
      copying colossalai/zero/sharded_param/sharded_param.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_param
      copying colossalai/zero/sharded_param/sharded_tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_param
      copying colossalai/zero/sharded_param/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_param
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/init_ctx
      copying colossalai/zero/init_ctx/init_context.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/init_ctx
      copying colossalai/zero/init_ctx/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/init_ctx
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      copying colossalai/zero/shard_utils/base_shard_strategy.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      copying colossalai/zero/shard_utils/commons.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      copying colossalai/zero/shard_utils/bucket_tensor_shard_strategy.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      copying colossalai/zero/shard_utils/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      copying colossalai/zero/shard_utils/tensor_shard_strategy.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/shard_utils
      creating build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/parameter_store.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/gradient_store.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/base_store.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/bucket_store.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/tensor_bucket.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      copying colossalai/zero/sharded_optim/bookkeeping/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/zero/sharded_optim/bookkeeping
      creating build/lib.linux-x86_64-cpython-39/colossalai/amp/torch_amp
      copying colossalai/amp/torch_amp/_grad_scaler.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/torch_amp
      copying colossalai/amp/torch_amp/torch_amp.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/torch_amp
      copying colossalai/amp/torch_amp/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/torch_amp
      creating build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp
      copying colossalai/amp/naive_amp/naive_amp.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp
      copying colossalai/amp/naive_amp/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp
      copying colossalai/amp/naive_amp/_fp16_optimizer.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp
      copying colossalai/amp/naive_amp/_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp
      creating build/lib.linux-x86_64-cpython-39/colossalai/amp/apex_amp
      copying colossalai/amp/apex_amp/apex_amp.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/apex_amp
      copying colossalai/amp/apex_amp/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/apex_amp
      creating build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp/grad_scaler
      copying colossalai/amp/naive_amp/grad_scaler/constant_grad_scaler.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp/grad_scaler
      copying colossalai/amp/naive_amp/grad_scaler/base_grad_scaler.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp/grad_scaler
      copying colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp/grad_scaler
      copying colossalai/amp/naive_amp/grad_scaler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/amp/naive_amp/grad_scaler
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/jit
      copying colossalai/kernel/jit/option.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/jit
      copying colossalai/kernel/jit/bias_gelu.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/jit
      copying colossalai/kernel/jit/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/jit
      copying colossalai/kernel/jit/bias_dropout_add.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/jit
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/scaled_masked_softmax.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/multi_head_attn.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/builder.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/fused_optim.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/layernorm.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/moe.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/scaled_upper_triangle_masked_softmax.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      copying colossalai/kernel/op_builder/cpu_adam.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/op_builder
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      copying colossalai/kernel/cuda_native/multihead_attention.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      copying colossalai/kernel/cuda_native/flash_attention.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      copying colossalai/kernel/cuda_native/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      copying colossalai/kernel/cuda_native/layer_norm.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      copying colossalai/kernel/cuda_native/scaled_softmax.py -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native
      creating build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware
      copying colossalai/pipeline/middleware/topo.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware
      copying colossalai/pipeline/middleware/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware
      creating build/lib.linux-x86_64-cpython-39/colossalai/pipeline/rpc
      copying colossalai/pipeline/rpc/_pipeline_base.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/rpc
      copying colossalai/pipeline/rpc/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/rpc
      copying colossalai/pipeline/rpc/_pipeline_schedule.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/rpc
      copying colossalai/pipeline/rpc/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/rpc
      creating build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware/adaptor
      copying colossalai/pipeline/middleware/adaptor/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware/adaptor
      copying colossalai/pipeline/middleware/adaptor/fx.py -> build/lib.linux-x86_64-cpython-39/colossalai/pipeline/middleware/adaptor
      creating build/lib.linux-x86_64-cpython-39/colossalai/gemini/paramhooks
      copying colossalai/gemini/paramhooks/_param_hookmgr.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/paramhooks
      copying colossalai/gemini/paramhooks/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/paramhooks
      creating build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      copying colossalai/gemini/chunk/chunk.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      copying colossalai/gemini/chunk/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      copying colossalai/gemini/chunk/manager.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      copying colossalai/gemini/chunk/search_utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      copying colossalai/gemini/chunk/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/chunk
      creating build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      copying colossalai/gemini/ophooks/_shard_grad_ophook.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      copying colossalai/gemini/ophooks/_shard_param_ophook.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      copying colossalai/gemini/ophooks/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      copying colossalai/gemini/ophooks/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      copying colossalai/gemini/ophooks/runtime_mem_tracer_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/ophooks
      creating build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/param_runtime_order.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/static_memstats_collector.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/memstats_collector.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/runtime_mem_tracer.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/chunk_memstats_collector.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/memory_monitor.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      copying colossalai/gemini/memory_tracer/memory_stats.py -> build/lib.linux-x86_64-cpython-39/colossalai/gemini/memory_tracer
      creating build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_1d.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_3d.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_data.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_model.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_sequence.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/process_group_initializer.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_2p5d.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_2d.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_tensor.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/initializer_pipeline.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      copying colossalai/context/process_group_initializer/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/process_group_initializer
      creating build/lib.linux-x86_64-cpython-39/colossalai/context/random
      copying colossalai/context/random/seed_manager.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/random
      copying colossalai/context/random/_helper.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/random
      copying colossalai/context/random/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/context/random
      creating build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      copying colossalai/engine/schedule/_non_pipeline_schedule.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      copying colossalai/engine/schedule/_pipeline_schedule_v2.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      copying colossalai/engine/schedule/_pipeline_schedule.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      copying colossalai/engine/schedule/_base_schedule.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      copying colossalai/engine/schedule/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/schedule
      creating build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_accumulation
      copying colossalai/engine/gradient_accumulation/_gradient_accumulation.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_accumulation
      copying colossalai/engine/gradient_accumulation/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_accumulation
      creating build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_moe_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/utils.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_pipeline_parallel_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_data_parallel_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_zero_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_sequence_parallel_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/_base_gradient_handler.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      copying colossalai/engine/gradient_handler/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/engine/gradient_handler
      creating build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_checkpoint_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_commons_.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_lr_scheduler_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_base_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_metric_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/__init__.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      copying colossalai/trainer/hooks/_log_hook.py -> build/lib.linux-x86_64-cpython-39/colossalai/trainer/hooks
      creating build/lib.linux-x86_64-cpython-39/tests
      creating build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/beit.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/resnet.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/bert.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/registry.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/simple_net.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/repeated_computed_layers.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/hanging_param_model.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/albert.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/inline_op_model.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/nested_model.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      copying tests/components_to_test/gpt2.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test
      creating build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel
      copying tests/test_auto_parallel/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel
      creating build/lib.linux-x86_64-cpython-39/tests/components_to_test/utils
      copying tests/components_to_test/utils/dummy_data_generator.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test/utils
      copying tests/components_to_test/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test/utils
      copying tests/components_to_test/utils/executor.py -> build/lib.linux-x86_64-cpython-39/tests/components_to_test/utils
      creating build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_param_resharding_cost.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_shape_consistency_pass.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_liveness_analysis.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_broadcast.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_find_repeat_block.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_ddp.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_checkpoint.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_gemini.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_bias_addition_forward.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      copying tests/test_auto_parallel/test_tensor_shard/test_solver_with_resnet_v2.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard
      creating build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_pass
      copying tests/test_auto_parallel/test_pass/test_size_value_converting_pass.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_pass
      copying tests/test_auto_parallel/test_pass/test_node_converting_pass.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_pass
      copying tests/test_auto_parallel/test_pass/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_pass
      creating build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_gpt
      copying tests/test_auto_parallel/test_tensor_shard/test_gpt/gpt_modules.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_gpt
      copying tests/test_auto_parallel/test_tensor_shard/test_gpt/test_runtime_with_gpt_modules.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_gpt
      copying tests/test_auto_parallel/test_tensor_shard/test_gpt/test_solver_with_gpt_module.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_gpt
      copying tests/test_auto_parallel/test_tensor_shard/test_gpt/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_gpt
      creating build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_addmm_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_where_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_shard_option.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bias_linear_function_node.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_permute_and_transpose_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bias_linear_module_node.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_getattr_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_binary_elementwise_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/utils.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_sum_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_layer_norm_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_embedding_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_getitem_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_norm_pooling_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_view_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_default_reshape_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_conv_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_addbmm_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_split_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_tensor_constructor.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_output_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_softmax_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_linear_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/__init__.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_placeholder_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_unary_element_wise_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bmm_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_matmul_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      copying tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_batch_norm_handler.py -> build/lib.linux-x86_64-cpython-39/tests/test_auto_parallel/test_tensor_shard/test_node_handler
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_apply.cuh -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/layer_norm_cuda_kernel.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multi_tensor_adam.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/cpu_adam.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/cpu_adam.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/type_shim.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/moe_cuda_kernel.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_masked_softmax.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/moe_cuda.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/compat.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      copying colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels
      creating build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/strided_batch_gemm.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/normalize_layer.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/kernels.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/ls_cub.cuh -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/context.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      copying colossalai/kernel/cuda_native/csrc/kernels/include/cublas_wrappers.h -> build/lib.linux-x86_64-cpython-39/colossalai/kernel/cuda_native/csrc/kernels/include
      running build_ext
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py:387: UserWarning: The detected CUDA version (11.0) has a minor version mismatch with the version that was used to compile PyTorch (11.6). Most likely this shouldn't be a problem.
        warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py:397: UserWarning: There are no g++ version bounds defined for CUDA version 11.0
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'colossalai._C.cpu_adam' extension
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native
      creating /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc
      Emitting ninja build file /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/1] c++ -MMD -MF /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.o.d -pthread -B /home/liuzixi01/.conda/envs/torch-cuda116/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/liuzixi01/.conda/envs/torch-cuda116/include -fPIC -O2 -isystem /home/liuzixi01/.conda/envs/torch-cuda116/include -fPIC -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/includes -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -lcudart -lcublas -g -Wno-reorder -fopenmp -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cpu_adam -D_GLIBCXX_USE_CXX11_ABI=0
      /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp:244:0: warning: ignoring #pragma unroll  [-Wunknown-pragmas]
       #pragma unroll 4

      /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp:364:0: warning: ignoring #pragma unroll  [-Wunknown-pragmas]
       #pragma unroll 8

      In file included from /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/Exceptions.h:13:0,
                       from /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
                       from /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/extension.h:6,
                       from /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.h:29,
                       from /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp:22:
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<Adam_Optimizer>’:
      /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp:456:51:   required from here
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<Adam_Optimizer>’ declared with greater visibility than the type of its field ‘pybind11::class_<Adam_Optimizer>::<anonymous>’ [-Wattributes]
       class class_ : public detail::generic_type {
             ^~~~~~
      /home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<Adam_Optimizer>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
      g++ -pthread -B /home/liuzixi01/.conda/envs/torch-cuda116/compiler_compat -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,-rpath-link,/home/liuzixi01/.conda/envs/torch-cuda116/lib -L/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/liuzixi01/.conda/envs/torch-cuda116/lib -Wl,-rpath-link,/home/liuzixi01/.conda/envs/torch-cuda116/lib -L/home/liuzixi01/.conda/envs/torch-cuda116/lib /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/cpu_adam.o -L/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-39/colossalai/_C/cpu_adam.cpython-39-x86_64-linux-gnu.so
      building 'colossalai._C.fused_optim' extension
      Emitting ninja build file /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/6] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [2/6] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [3/6] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [4/6] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [5/6] /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      FAILED: /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o
      /usr/local/cuda/bin/nvcc  -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -lineinfo -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      nvcc fatal   : Unsupported gpu architecture 'compute_86'
      [6/6] c++ -MMD -MF /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/colossal_C_frontend.o.d -pthread -B /home/liuzixi01/.conda/envs/torch-cuda116/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/liuzixi01/.conda/envs/torch-cuda116/include -fPIC -O2 -isystem /home/liuzixi01/.conda/envs/torch-cuda116/include -fPIC -I/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/kernels/include -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/TH -I/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/liuzixi01/.conda/envs/torch-cuda116/include/python3.9 -c -c /home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp -o /home/liuzixi01/ColossalAI/build/temp.linux-x86_64-cpython-39/home/liuzixi01/ColossalAI/colossalai/kernel/cuda_native/csrc/colossal_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_optim -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
          subprocess.run(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/subprocess.py", line 528, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/liuzixi01/ColossalAI/setup.py", line 170, in <module>
          setup(name=package_name,
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/install.py", line 68, in run
          return orig.install.run(self)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/install.py", line 698, in run
          self.run_command('build')
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
          self.build_extensions()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
          build_ext.build_extensions(self)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> colossalai

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
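
Reading the log above: every `nvcc` step fails with `Unsupported gpu architecture 'compute_86'`, and the earlier warning reports a detected CUDA version of 11.0 against a PyTorch built for 11.6. Support for `sm_86` first appeared in the CUDA 11.1 toolkit, so an 11.0 `nvcc` rejects that `-gencode` flag. The sketch below is only a diagnostic aid, not part of ColossalAI: the helper names `parse_nvcc_release` and `unsupported_archs` are made up for illustration, and the `nvcc` path is taken from the log and may differ on other machines.

```python
import os
import re
import shutil
import subprocess

# Minimum CUDA toolkit release whose nvcc accepts each compute capability
# listed in the -gencode flags of the log above.
MIN_CUDA_FOR_ARCH = {
    60: (8, 0),
    70: (9, 0),
    75: (10, 0),
    80: (11, 0),
    86: (11, 1),
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def unsupported_archs(cuda_version, archs):
    """Return the architectures this CUDA release cannot compile for."""
    return [a for a in archs if MIN_CUDA_FOR_ARCH[a] > cuda_version]


if __name__ == "__main__":
    # The build in the log invoked this binary; fall back to PATH otherwise.
    nvcc = "/usr/local/cuda/bin/nvcc"
    if not os.path.exists(nvcc):
        nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found")
    else:
        out = subprocess.run([nvcc, "--version"],
                             capture_output=True, text=True).stdout
        version = parse_nvcc_release(out)
        print("nvcc release:", version)
        print("unsupported archs:",
              unsupported_archs(version, [60, 70, 75, 80, 86]))
```

If the script lists 86 as unsupported, a likely remedy (not confirmed in this thread) is to install a CUDA toolkit of version 11.1 or later, ideally 11.6 to match PyTorch, and point `CUDA_HOME` at it before rerunning the install.
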
MikeDean2367 commented 1 year ago

How did you solve this?

GeraldWu23 commented 1 year ago

I'm running into a similar issue when installing apex. How was this solved?

TheLolita commented 1 year ago

I hope you can share your solution.

catqaq commented 1 year ago

any idea?