2024-06-12T22:49:04.3043574Z building 'deepspeed.ops.comm.deepspeed_ccl_comm_op' extension
2024-06-12T22:49:04.3043994Z creating build/temp.linux-x86_64-cpython-39
2024-06-12T22:49:04.3053293Z creating build/temp.linux-x86_64-cpython-39/csrc
2024-06-12T22:49:04.3054113Z creating build/temp.linux-x86_64-cpython-39/csrc/cpu
2024-06-12T22:49:04.3054729Z creating build/temp.linux-x86_64-cpython-39/csrc/cpu/comm
2024-06-12T22:49:04.3071806Z /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work=/usr/local/src/conda/deepspeed-0.14.3 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -isystem /usr/local/cuda/include -fPIC 
-I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/csrc/cpu/includes -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/TH -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/THC -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/python3.9 -c csrc/cpu/comm/ccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/cpu/comm/ccl.o -O2 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm_op -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2024-06-12T22:49:08.2062484Z csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
2024-06-12T22:49:08.2067800Z 8 | #include <oneapi/ccl.hpp>
2024-06-12T22:49:08.2068222Z | ^~~~~~~~~~~~~~~~
2024-06-12T22:49:08.2068507Z compilation terminated.
2024-06-12T22:49:08.2182741Z error: command '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
2024-06-12T22:49:08.6174937Z error: subprocess-exited-with-error
2024-06-12T22:49:08.6176666Z
2024-06-12T22:49:08.6188012Z × python setup.py bdist_wheel did not run successfully.
2024-06-12T22:49:08.6227487Z │ exit code: 1
2024-06-12T22:49:08.6240717Z ╰─> See above for output.
2024-06-12T22:49:08.6252920Z
2024-06-12T22:49:08.6264017Z note: This error originates from a subprocess, and is likely not a problem with pip.
2024-06-12T22:49:08.6271330Z full command: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/bin/python -u -c '
2024-06-12T22:49:08.6272043Z exec(compile('"'"''"'"''"'"'
2024-06-12T22:49:08.6277838Z # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
2024-06-12T22:49:08.6283726Z #
2024-06-12T22:49:08.6284428Z # - It imports setuptools before invoking setup.py, to enable projects that directly
2024-06-12T22:49:08.6289287Z # import from `distutils.core` to work with newer packaging standards.
2024-06-12T22:49:08.6289949Z # - It provides a clear error message when setuptools is not installed.
2024-06-12T22:49:08.6295383Z # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
2024-06-12T22:49:08.6295837Z # setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
2024-06-12T22:49:08.6301077Z # manifest_maker: standard file '"'"'-c'"'"' not found".
2024-06-12T22:49:08.6307069Z # - It generates a shim setup.py, for handling setup.cfg-only projects.
2024-06-12T22:49:08.6307810Z import os, sys, tokenize
2024-06-12T22:49:08.6314125Z
2024-06-12T22:49:08.6314907Z try:
2024-06-12T22:49:08.6320956Z import setuptools
2024-06-12T22:49:08.6321316Z except ImportError as error:
2024-06-12T22:49:08.6325049Z print(
2024-06-12T22:49:08.6326023Z "ERROR: Can not execute `setup.py` since setuptools is not available in "
2024-06-12T22:49:08.6335095Z "the build environment.",
2024-06-12T22:49:08.6335348Z file=sys.stderr,
2024-06-12T22:49:08.6338543Z )
2024-06-12T22:49:08.6338832Z sys.exit(1)
2024-06-12T22:49:08.6339045Z
2024-06-12T22:49:08.6339554Z __file__ = %r
2024-06-12T22:49:08.6340070Z sys.argv[0] = __file__
2024-06-12T22:49:08.6340336Z
2024-06-12T22:49:08.6340562Z if os.path.exists(__file__):
2024-06-12T22:49:08.6340835Z filename = __file__
2024-06-12T22:49:08.6341059Z with tokenize.open(__file__) as f:
2024-06-12T22:49:08.6341411Z setup_py_code = f.read()
2024-06-12T22:49:08.6341621Z else:
2024-06-12T22:49:08.6341993Z filename = "<auto-generated setuptools caller>"
2024-06-12T22:49:08.6342280Z setup_py_code = "from setuptools import setup; setup()"
2024-06-12T22:49:08.6342576Z
2024-06-12T22:49:08.6342895Z exec(compile(setup_py_code, filename, "exec"))
2024-06-12T22:49:08.6343569Z '"'"''"'"''"'"' % ('"'"'/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-v4pibtb1
2024-06-12T22:49:08.6344021Z cwd: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/
2024-06-12T22:49:08.6344416Z Building wheel for deepspeed (setup.py): finished with status 'error'
2024-06-12T22:49:08.6348857Z ERROR: Failed building wheel for deepspeed
2024-06-12T22:49:08.6349126Z Running setup.py clean for deepspeed
2024-06-12T22:49:08.6349635Z Running command python setup.py clean
2024-06-12T22:49:10.9424015Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2024-06-12T22:49:10.9498275Z [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
2024-06-12T22:49:10.9509931Z DS_BUILD_OPS=1
2024-06-12T22:49:17.9894660Z Install Ops={'deepspeed_not_implemented': 1, 'deepspeed_ccl_comm': 1, 'deepspeed_shm_comm': 1, 'cpu_adam': 1, 'fused_adam': 1}
2024-06-12T22:49:18.0269777Z version=0.14.3, git_hash=f492cfc, git_branch=HEAD
2024-06-12T22:49:18.0270870Z install_requires=['hjson', 'ninja', 'numpy', 'nvidia-ml-py', 'packaging>=20.0', 'psutil', 'py-cpuinfo', 'pydantic', 'torch', 'tqdm']
2024-06-12T22:49:18.0278897Z ext_modules=[<setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_not_implemented_op') at 0x7ff248dbe460>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_ccl_comm_op') at 0x7ff248dbe4c0>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_shm_comm_op') at 0x7ff248dbe520>, <setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7ff16dd51b80>, <setuptools.extension.Extension('deepspeed.ops.adam.fused_adam_op') at 0x7ff16dd51d90>]
2024-06-12T22:49:18.0651351Z running clean
2024-06-12T22:49:18.0714575Z removing 'build/temp.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.0715596Z removing 'build/lib.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.1133919Z 'build/bdist.linux-x86_64' does not exist -- can't clean it
2024-06-12T22:49:18.1143315Z 'build/scripts-3.9' does not exist -- can't clean it
2024-06-12T22:49:18.1151899Z removing 'build'
2024-06-12T22:49:18.1171105Z deepspeed build time = 0.08735942840576172 secs
2024-06-12T22:49:18.5956605Z Failed to build deepspeed
2024-06-12T22:49:18.5973299Z ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.5973660Z Exception information:
2024-06-12T22:49:18.5986009Z Traceback (most recent call last):
2024-06-12T22:49:18.5993394Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
2024-06-12T22:49:18.5993851Z status = run_func(*args)
2024-06-12T22:49:18.5994316Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
2024-06-12T22:49:18.5995399Z return func(self, options, args)
2024-06-12T22:49:18.5999462Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 429, in run
2024-06-12T22:49:18.5999708Z raise InstallationError(
2024-06-12T22:49:18.6006202Z pip._internal.exceptions.InstallationError: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.6010801Z Removed build tracker: '/tmp/pip-build-tracker-gvbib0oo'
2024-06-12T22:49:20.4656669Z Traceback (most recent call last):
2024-06-12T22:49:20.4664375Z File "/opt/conda/bin/conda-build", line 11, in <module>
2024-06-12T22:49:20.4669893Z sys.exit(execute())
2024-06-12T22:49:20.4670437Z File "/opt/conda/lib/python3.10/site-packages/conda_build/cli/main_build.py", line 590, in execute
2024-06-12T22:49:20.4677725Z api.build(
2024-06-12T22:49:20.4678886Z File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
2024-06-12T22:49:20.4685860Z return build_tree(
2024-06-12T22:49:20.4691479Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3638, in build_tree
2024-06-12T22:49:20.4708481Z packages_from_this = build(
2024-06-12T22:49:20.4713969Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2506, in build
2024-06-12T22:49:20.4714313Z utils.check_call_env(
2024-06-12T22:49:20.4724506Z File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 405, in check_call_env
2024-06-12T22:49:20.4729616Z return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
2024-06-12T22:49:20.4730205Z File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 381, in _func_defaulting_env_to_os_environ
2024-06-12T22:49:20.4735612Z raise subprocess.CalledProcessError(proc.returncode, _args)
2024-06-12T22:49:20.4736400Z subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/conda_build.sh']' returned non-zero exit status 1.
2024-06-12T22:49:30.4784588Z
2024-06-12T22:49:30.5793301Z ##[error]Bash exited with code '1'.
2024-06-12T22:49:30.5974127Z ##[section]Finishing: Run docker build
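In a log this long the root cause is easy to miss among the pip and conda-build tracebacks. A minimal sketch of a filter that pulls out only the compiler's `fatal error` diagnostics follows; the helper name `fatal_errors` and the regular expression are illustrative assumptions, not part of DeepSpeed or conda-build.

```python
import re

# Matches GCC/Clang-style diagnostics such as
#   csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
# The pattern is an assumption based on the format seen in the log above.
FATAL_RE = re.compile(
    r"(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): fatal error: (?P<msg>.*)"
)


def fatal_errors(log_text):
    """Return (file, line, column, message) tuples for each fatal compiler error."""
    results = []
    for raw in log_text.splitlines():
        match = FATAL_RE.search(raw)
        if match:
            results.append(
                (
                    match["file"],
                    int(match["line"]),
                    int(match["col"]),
                    match["msg"].strip(),
                )
            )
    return results
```

Run against the log above, this should report a single entry pointing at `csrc/cpu/comm/ccl.cpp`, line 8, which is the `#include <oneapi/ccl.hpp>` line.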
System info (please complete the following information):
OS: Ubuntu 22.04.4
GPU count and types [e.g. two machines with x8 A100s each]: 1 NVIDIA GPU
Interconnects (if applicable) [e.g., two machines connected with 100 Gbps IB]: N/A
Python version: 3.9
Any other relevant info about your setup: None
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else? No
Docker context
Are you using a specific docker image that you can share?
quay.io/condaforge/linux-anvil-cuda:11.8
Additional context
Add any other context about the problem here.
The builds have been failing in these PRs as well:
Describe the bug
The builds on conda-forge have been failing since deepspeed=0.14.1 for CUDA 11.8 and 12.0 with an error like
fatal error: oneapi/ccl.hpp: No such file or directory
Originally reported at https://github.com/conda-forge/deepspeed-feedstock/pull/56#issuecomment-2062611899.
To Reproduce
Steps to reproduce the behavior:
Run python build_locally.py locally and select the option with CUDA 11.8 and Python 3.9.
Expected behavior
A clear and concise description of what you expected to happen.
CUDA builds work as expected.
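The compiler error means `oneapi/ccl.hpp` is not on any include path passed to the compiler. A quick way to confirm this before starting a full build is to look for the header directly. The sketch below is a hypothetical helper, not part of DeepSpeed or the feedstock; it assumes the relevant include directory is `$PREFIX/include` (set by conda-build) or `$CONDA_PREFIX/include` (set in an activated environment).

```python
import os
from pathlib import Path


def find_header(header, include_dirs):
    """Return the first directory in include_dirs that contains header, else None."""
    for directory in include_dirs:
        if (Path(directory) / header).is_file():
            return str(directory)
    return None


def default_include_dirs(env=None):
    """Candidate include dirs derived from PREFIX / CONDA_PREFIX (an assumption)."""
    env = os.environ if env is None else env
    dirs = []
    for var in ("PREFIX", "CONDA_PREFIX"):
        root = env.get(var)
        if root:
            dirs.append(os.path.join(root, "include"))
    return dirs


# Example use inside the build environment:
#   location = find_header("oneapi/ccl.hpp", default_include_dirs())
#   location is None when the oneCCL headers are not installed there.
```

If the lookup returns `None` in the host environment, the header is simply not installed there, which matches the failure in the log.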
ds_report output
Please run ds_report to give us details about your setup.
Note: this isn't the exact report for the conda-forge CI device; I copied it from the CPU build logs.
Screenshots
Truncated traceback from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=953875&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1&l=3486: