microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
35.49k stars 4.12k forks

[BUG] ImportError with git build #2697

Closed brucethemoose closed 1 year ago

brucethemoose commented 1 year ago

The latest git version of DeepSpeed (aef8a85) builds and imports just fine, but trying to use it in pretty much anything results in an ImportError. For instance:


/tmp/DeepSpeed master 10s
❯ accelerate config
Traceback (most recent call last):
  File "/home/alpha/.local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 27, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 122, in <module>
    from .other import (
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/utils/other.py", line 27, in <module>
    from deepspeed import DeepSpeedEngine
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 14, in <module>
    from . import ops
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/__init__.py", line 1, in <module>
    from . import adam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/__init__.py", line 2, in <module>
    from .fused_adam import FusedAdam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 13, in <module>
    from deepspeed.ops.op_builder.builder_names import FusedAdamBuilder
ImportError: cannot import name 'FusedAdamBuilder' from 'deepspeed.ops.op_builder.builder_names' (/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder_names.py)

/tmp/DeepSpeed master

This seems to go back a few commits.
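Importing any deepspeed submodule first runs deepspeed/__init__.py, which is exactly where this traceback starts, so a plain import cannot be used to look inside builder_names.py. A minimal static sketch (an illustration, not part of DeepSpeed) locates the installed package with importlib.util.find_spec, which does not execute a top-level package, and lists the names bound at module level in ops/op_builder/builder_names.py with ast. The helper names top_level_names and builder_names_file are hypothetical, and the sketch assumes the file layout shown in the traceback.

```python
import ast
import importlib.util
from pathlib import Path


def top_level_names(source: str) -> set:
    """Names bound at module level in Python source, found without executing it.

    Only plain def/class/import/assignment statements are recognised; names
    created dynamically (e.g. through globals()) are missed.
    """
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "from x import y as z" binds "z".
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
    return names


def builder_names_file(package: str = "deepspeed") -> Path:
    """Locate ops/op_builder/builder_names.py without running the package's __init__.py."""
    spec = importlib.util.find_spec(package)  # a top-level lookup does not import it
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(package)
    root = Path(list(spec.submodule_search_locations)[0])
    return root / "ops" / "op_builder" / "builder_names.py"


if __name__ == "__main__":
    try:
        path = builder_names_file()
        source = path.read_text(encoding="utf-8")
    except (ModuleNotFoundError, OSError) as exc:
        print("could not read builder_names.py:", exc)
    else:
        print(path, "defines FusedAdamBuilder:", "FusedAdamBuilder" in top_level_names(source))
```

If the installed builder_names.py does not bind FusedAdamBuilder, the problem lies in the installed files rather than in accelerate.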

Why build from source? I am trying to figure out why accelerate's DeepSpeed stage 2 config does not work with the official (0.7.7) release in this repo: https://github.com/kohya-ss/sd-scripts/issues/63

I was hoping the latest commit might fix it.

jeffra commented 1 year ago

What accelerate version are you using here? I just tried to repro this with accelerate==0.15.0 + aef8a85 and it doesn't seem to trigger an error.
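ds_report has to import deepspeed in order to run, so it can fail with the same ImportError it is meant to help diagnose. A minimal sketch of an alternative, assuming only the standard library: importlib.metadata reads installed package metadata without importing the packages themselves. The package list comes from this thread, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent.

    Reads package metadata only, so a package whose import is broken still reports.
    """
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Packages involved in the failing import chain in this thread.
    for name, ver in installed_versions(["accelerate", "deepspeed", "transformers", "torch"]).items():
        print(f"{name:<14} {ver or 'not installed'}")
```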

ds_report output:

DeepSpeed general environment info:
torch install path ............... ['/home/jerasley/base/lib/python3.8/site-packages/torch']
torch version .................... 1.13.1+cu116
deepspeed install path ........... ['/home/jerasley/base/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.8.0, aef8a856, master
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6

pip show accelerate transformers

Name: accelerate
Version: 0.15.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: sylvain@huggingface.co
License: Apache
Location: /home/jerasley/base/lib/python3.8/site-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by:
---
Name: transformers
Version: 4.25.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache
Location: /home/jerasley/base/lib/python3.8/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by: deepspeed-mii, mii

Here's my output of accelerate config:

[screenshot not reproduced]
brucethemoose commented 1 year ago

@jeffra

tmp/DeepSpeed master
❯ ds_report
Traceback (most recent call last):
  File "/home/alpha/.local/bin/ds_report", line 3, in <module>
    from deepspeed.env_report import cli_main
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 14, in <module>
    from . import ops
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/__init__.py", line 1, in <module>
    from . import adam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/__init__.py", line 2, in <module>
    from .fused_adam import FusedAdam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 13, in <module>
    from deepspeed.ops.op_builder.builder_names import FusedAdamBuilder
ImportError: cannot import name 'FusedAdamBuilder' from 'deepspeed.ops.op_builder.builder_names' (/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder_names.py)

/tmp/DeepSpeed master
❯ pip show accelerate transformers
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
Name: accelerate
Version: 0.15.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: sylvain@huggingface.co
License: Apache
Location: /home/alpha/.local/lib/python3.10/site-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by: k-diffusion
---
Name: transformers
Version: 4.25.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache
Location: /home/alpha/.local/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by:

/tmp/DeepSpeed master
❯ accelerate config
2023-01-12 19:16:24.568398: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/alpha/.local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 27, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 122, in <module>
    from .other import (
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/utils/other.py", line 27, in <module>
    from deepspeed import DeepSpeedEngine
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 14, in <module>
    from . import ops
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/__init__.py", line 1, in <module>
    from . import adam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/__init__.py", line 2, in <module>
    from .fused_adam import FusedAdam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 13, in <module>
    from deepspeed.ops.op_builder.builder_names import FusedAdamBuilder
ImportError: cannot import name 'FusedAdamBuilder' from 'deepspeed.ops.op_builder.builder_names' (/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder_names.py)

Running Python 3.10.9; here are my packages:

pip freeze absl-py==1.3.0 accelerate==0.15.0 addict==2.4.0 aenum==3.1.11 aiohttp==3.8.3 aiosignal==1.3.1 alabaster==0.7.12 albumentations==1.3.0 altair==4.2.0 antlr4-python3-runtime==4.9.3 anyio==3.6.2 appdirs==1.4.4 astor==0.8.1 astroid==2.13.2 astunparse==1.6.3 async-timeout==4.0.2 attrs==22.2.0 autocommand==2.2.2 Babel==2.11.0 basicsr==1.4.2 bcrypt==4.0.1 beautifulsoup4==4.11.1 bidict==0.22.0 bitsandbytes==0.36.0.post2 black==22.12.0 bleach==5.0.1 blendmodes==2022 blinker==1.5 boltons==21.0.0 Brotli==1.0.9 btrfsutil==6.1.2 build==0.10.0 cachetools==5.2.0 certifi==2022.12.7 cffi==1.15.1 cfgv==3.3.1 chardet==4.0.0 charset-normalizer==2.1.1 clean-fid==0.1.35 click==8.1.3 clip @ git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1 clipseg @ https://github.com/invoke-ai/clipseg/archive/relaxed-python-requirement.zip cmake==3.25.0 colorama==0.4.6 coloredlogs==15.0.1 colossalai==0.2.0 commonmark==0.9.1 contexttimer==0.3.3 contourpy==1.0.6 cryptography==38.0.4 cuda-python==11.8.0 cycler==0.11.0 Cython==0.29.33 debugpy==1.6.5 decorator==4.4.2 deepspeed @ file:///tmp/DeepSpeed defusedxml==0.7.1 deprecation==2.1.0 diffusers==0.10.2 dill==0.3.6 discord-webhook==1.0.0 distlib==0.3.6 dnspython==2.2.1 docker-pycreds==0.4.0 docutils==0.19 einops==0.4.1 entrypoints==0.4 eventlet==0.33.2 exceptiongroup==1.1.0 fabric==2.7.1 facexlib==0.2.5 fairscale==0.4.4 fastapi==0.87.0 ffmpy==0.3.0 filelock==3.9.0 filterpy==1.4.5 Flask==2.1.3 Flask-Cors==3.0.10 Flask-SocketIO==5.3.0 flaskwebgui==1.0.3 flatbuffers==22.11.23 font-roboto==0.0.1 fonts==0.0.3 fonttools==4.38.0 frozenlist==1.3.3 fsspec==2022.11.0 ftfy==6.1.1 future==0.18.2 gast==0.4.0 gdown==4.5.4 getpass-asterisk==1.0.1 gfpgan==1.3.8 gitdb==4.0.10 GitPython==3.1.27 Glances==3.3.0 gomp==1.1.0 google-auth==2.15.0 google-auth-oauthlib==0.4.6 google-pasta==0.2.0 gradio==3.15.0 greenlet==2.0.1 grpcio==1.51.1 h11==0.12.0 h5py==3.7.0 hjson==3.1.0 html5lib==1.1 httpcore==0.15.0 httpx==0.23.1 
huggingface-hub==0.11.1 humanfriendly==10.0 identify==2.5.12 idna==3.4 imageio==2.23.0 imageio-ffmpeg==0.4.7 imagesize==1.4.1 importlib-metadata==5.1.0 inflect==6.0.2 inflection==0.5.1 iniconfig==1.1.1 installer==0.6.0 invisible-watermark==0.1.5 invoke==1.7.3 isort==5.11.4 itsdangerous==2.1.2 jaraco.context==4.2.0 jaraco.functools==3.5.2 jaraco.text==3.11.0 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 jsonmerge==1.8.0 jsonschema==4.17.3 k-diffusion @ https://github.com/Birch-san/k-diffusion/archive/refs/heads/mps.zip keras==2.11.0 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 kiwisolver==1.4.4 kornia==0.6.7 lark==1.1.2 lazy-object-proxy==1.9.0 lensfun==0.3.3 libclang==15.0.6.1 lightning-utilities==0.5.0 linkify-it-py==1.0.3 lit==15.0.7.dev0 llvmlite==0.39.1 lmdb==1.3.0 lpips==0.1.4 Mako==1.2.4 Markdown==3.4.1 markdown-it-py==2.1.0 MarkupSafe==2.1.1 matplotlib==3.6.2 mccabe==0.7.0 mdit-py-plugins==0.3.1 mdurl==0.1.2 modelcards==0.1.6 more-itertools==9.0.0 moviepy==1.0.3 mpmath==1.2.1 multidict==6.0.4 mutagen==1.46.0 mypy-extensions==0.4.3 networkx==3.0rc1 nftables==0.1 ninja==1.11.1 nodeenv==1.7.0 numba==0.56.4 numpy==1.23.5 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 oauthlib==3.2.2 omegaconf==2.2.3 onnx==1.13.0 onnxruntime==1.13.1 open-clip-torch==2.9.1 opencv-contrib-python==4.7.0.68 opencv-python==4.7.0.68 opencv-python-headless==4.7.0.68 opt-einsum==3.3.0 ordered-set==4.1.0 orjson==3.8.3 packaging==22.0 pandas==1.5.2 paramiko==2.12.0 parso==0.8.3 path==16.6.0 pathlib2==2.3.7.post1 pathspec==0.10.3 pathtools==0.1.2 pedalboard==0.6.7 pep517==0.13.0 picklescan==0.0.7 piexif==1.1.3 Pillow==9.3.0 pip-api==0.0.30 pip-run==9.2.1 pip-shims==0.7.3 platformdirs==2.6.2 pluggy==1.0.0 ply==3.11 pooch==1.6.0 pproxy==2.7.8 pre-commit==2.21.0 proglog==0.1.10 promise==2.3 protobuf==3.20.3 psutil==5.9.4 pudb==2022.1.3 py-cpuinfo==9.0.0 pyarrow==10.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 
pybind11==2.10.3 pycairo==1.23.0 pyclibrary==0.2.1 pycparser==2.21 pycryptodome==3.16.0 pycryptodomex==3.12.0 pycuda==2022.1 pydantic==1.10.4 pydeck==0.8.0 pyDeprecate==0.3.2 pydot==1.4.2 pydub==0.25.1 pyelftools==0.29 Pygments==2.14.0 PyGObject==3.42.2 pylint==2.15.10 Pympler==1.0.1 PyNaCl==1.5.0 pyparsing==3.0.9 pypatchmatch @ https://github.com/invoke-ai/PyPatchMatch/archive/refs/tags/0.1.5.zip pyperf==2.4.1 pyperformance==1.0.4 pyproject_hooks==1.0.0 PyQt5==5.15.7 PyQt5-sip==12.11.0 pyre-extensions==0.0.23 pyreadline3==3.4.1 pyrsistent==0.19.3 PySimpleGUI==4.60.4 PySocks==1.7.1 pytest==7.2.0 python-dateutil==2.8.2 python-engineio==4.3.4 python-multipart==0.0.5 python-socketio==5.7.2 pytools==2022.1.14 pytorch-lightning==1.7.7 pytorch-triton==2.0.0+0d7e753227 pytz==2022.6 pytz-deprecation-shim==0.1.0.post0 PyWavelets==1.4.1 PyYAML==6.0 QTermWidget==1.2.0 qudida==0.0.4 rangehttpserver==1.2.0 realesrgan==0.3.0 Reflector==2021.11.20.2.41.3 regex==2022.10.31 requests==2.28.1 requests-oauthlib==1.3.1 resize-right==0.0.2 rfc3986==1.5.0 rich==13.0.0 rsa==4.9 safetensors==0.2.7 scikit-image==0.19.2 scikit-learn==1.2.0 scipy==1.10.0 semver==2.13.0 Send2Trash==1.8.1b0 sentencepiece==0.1.97 sentry-sdk==1.12.1 setproctitle==1.3.2 shortuuid==1.0.11 six==1.16.0 smmap==5.0.0 sniffio==1.3.0 snowballstemmer==2.2.0 soupsieve==2.3.2.post1 Sphinx==5.3.0 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 sphinxcontrib.applehelp==1.0.3 starlette==0.21.0 streamlit==1.16.0 sympy==1.11.1 taming-transformers-rom1504==0.0.6 tb-nightly==2.12.0a20230112 TBB==0.2 team==1.0 tensorboard==2.11.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.1 tensorflow-estimator==2.11.0 tensorflow-gpu==2.11.0 tensorflow-io-gcs-filesystem==0.29.0 termcolor==2.2.0 test-tube==0.7.5 threadpoolctl==3.1.0 tifffile==2022.10.10 timm==0.4.12 tokenizers==0.12.1 toml==0.10.2 tomli==2.0.1 tomlkit==0.11.6 
toolz==0.12.0 torch==2.0.0.dev20230112+cu118 torch-fidelity==0.3.0 torchaudio==2.0.0.dev20230112+cu118 torchdiffeq==0.2.3 torchmetrics==0.11.0 torchsde==0.2.5 torchvision==0.15.0.dev20230112+cu118 tornado==6.2 tqdm==4.64.1 trampoline==0.1.2 transformers==4.25.1 triton==2.0.0.dev20221202 trove-classifiers==2022.12.22 typing-inspect==0.8.0 typing_extensions==4.4.0 tzdata==2022.7 tzlocal==4.2 uc-micro-py==1.0.1 urllib3==1.26.13 urwid==2.1.2 urwid-readline==0.13 uvicorn==0.20.0 validate-pyproject==0.10.1 validators==0.18.2 virtualenv==20.17.1 wandb==0.13.7 watchdog==2.2.0 wcwidth==0.2.5 webencodings==0.5.1 websockets==10.4 Werkzeug==2.2.2 wrapt==1.14.1 xformers @ file:///home/alpha/xformers-0.0.16%2B2166360.d20230112-cp310-cp310-linux_x86_64.whl yapf==0.32.0 yarl==1.8.2 yt-dlp==2023.1.6 zipp==3.11.0
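The freeze above is flattened into one long run, while actual pip freeze output has one requirement per line. A small sketch, assuming that line-per-requirement format, picks out the packages in the failing import chain (torch, deepspeed, accelerate, transformers) and keeps direct references such as deepspeed @ file:///tmp/DeepSpeed. parse_freeze and relevant are illustrative names, not part of pip.

```python
def parse_freeze(text: str) -> dict:
    """Parse pip freeze output (one requirement per line) into {name: pin}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if " @ " in line:          # direct reference, e.g. "pkg @ file:///path"
            name, pin = line.split(" @ ", 1)
        elif "==" in line:         # ordinary pinned release
            name, pin = line.split("==", 1)
        else:                      # editable installs or anything unexpected
            name, pin = line, ""
        pins[name.strip().lower()] = pin.strip()
    return pins


def relevant(pins: dict, names=("torch", "deepspeed", "accelerate", "transformers")) -> dict:
    """Keep only the packages that matter for this kind of bug report."""
    return {n: pins[n] for n in names if n in pins}
```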

What I am testing on locally:

alpha@Asus-GA401IV
------------------
OS: CachyOS Linux x86_64
Host: ROG Zephyrus G14 GA401IV_GA401IV (1.0)
Kernel: 6.1.4-1-cachyos-lto
Uptime: 5 hours, 21 mins
Packages: 1250 (pacman)
Shell: fish 3.5.1
Resolution: 3840x2160 @ 60Hz
DE: KDE Plasma 5.26.5
WM: KWin (Wayland)
WM Theme: Breeze
Theme: Lightly (CachyOSNord) [QT], cachyos-nor]
Icons: breeze-dark [QT], breeze-dark [GTK2/3/4]
Font: Noto Sans (10pt) [QT], Noto Sans (10pt) ]
Cursor: capitaine (24px)
Terminal: alacritty
Terminal Font: monospace (12pt)
CPU: AMD Ryzen 9 4900HS (16) @ 3 GHz
GPU: AMD Renoir
GPU: NVIDIA GeForce RTX 2060 Max-Q
Memory: 8.22 GiB / 15.05 GiB (54%)
Disk (/): 110 GiB / 139 GiB (78%)
Disk (/home/alpha/Storage): 310 GiB / 344 GiB )
Disk (/run/media/alpha/External): 140 GiB / 93]
Disk (/windows): 296 GiB / 434 GiB (68%) [Remo]
Battery: 100% [Not charging]
Locale: en_US.UTF-8

(The TensorFlow warning is just a quirk of the native Arch Linux package.)

brucethemoose commented 1 year ago

And here is the output of a fresh installation attempt:

tmp
❯ git clone https://github.com/microsoft/DeepSpeed
Cloning into 'DeepSpeed'...
remote: Enumerating objects: 28589, done.
remote: Counting objects: 100% (421/421), done.
remote: Compressing objects: 100% (256/256), done.
remote: Total 28589 (delta 248), reused 295 (delta 165), pack-reused 28168
Receiving objects: 100% (28589/28589), 33.57 MiB | 11.31 MiB/s, done.
Resolving deltas: 100% (20499/20499), done.

/tmp
❯ cd DeepSpeed/

/tmp/DeepSpeed master
❯ pip install .
Defaulting to user installation because normal site-packages is not writeable
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
Processing /tmp/DeepSpeed
  Preparing metadata (setup.py) ... done
Requirement already satisfied: hjson in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (3.1.0)
Requirement already satisfied: ninja in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (1.11.1)
Requirement already satisfied: numpy in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (1.23.5)
Requirement already satisfied: packaging in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (22.0)
Requirement already satisfied: psutil in /usr/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (5.9.4)
Requirement already satisfied: py-cpuinfo in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (9.0.0)
Requirement already satisfied: pydantic in /usr/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (1.10.4)
Requirement already satisfied: torch in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (2.0.0.dev20230112+cu118)
Requirement already satisfied: tqdm in /home/alpha/.local/lib/python3.10/site-packages (from deepspeed==0.8.0+aef8a856) (4.64.1)
Requirement already satisfied: typing-extensions>=4.2.0 in /home/alpha/.local/lib/python3.10/site-packages (from pydantic->deepspeed==0.8.0+aef8a856) (4.4.0)
Requirement already satisfied: networkx in /home/alpha/.local/lib/python3.10/site-packages (from torch->deepspeed==0.8.0+aef8a856) (3.0rc1)
Requirement already satisfied: pytorch-triton==2.0.0+0d7e753227 in /home/alpha/.local/lib/python3.10/site-packages (from torch->deepspeed==0.8.0+aef8a856) (2.0.0+0d7e753227)
Requirement already satisfied: sympy in /home/alpha/.local/lib/python3.10/site-packages (from torch->deepspeed==0.8.0+aef8a856) (1.11.1)
Requirement already satisfied: filelock in /home/alpha/.local/lib/python3.10/site-packages (from pytorch-triton==2.0.0+0d7e753227->torch->deepspeed==0.8.0+aef8a856) (3.9.0)
Requirement already satisfied: cmake in /home/alpha/.local/lib/python3.10/site-packages (from pytorch-triton==2.0.0+0d7e753227->torch->deepspeed==0.8.0+aef8a856) (3.25.0)
Requirement already satisfied: mpmath>=0.19 in /home/alpha/.local/lib/python3.10/site-packages (from sympy->torch->deepspeed==0.8.0+aef8a856) (1.2.1)
Building wheels for collected packages: deepspeed
  Building wheel for deepspeed (setup.py) ... done
  Created wheel for deepspeed: filename=deepspeed-0.8.0+aef8a856-py3-none-any.whl size=760411 sha256=716e9a79dd19196bd60eb5395673584ce1e8d363ad330e9d69a06f6ef65e6f89
  Stored in directory: /tmp/pip-ephem-wheel-cache-oakoedrq/wheels/a2/ea/d8/a0a5ae4cd2516d6554e52b680c03214c5c4359a78b8309f8f1
Successfully built deepspeed
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
Installing collected packages: deepspeed
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
Successfully installed deepspeed-0.8.0+aef8a856
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)
WARNING: Ignoring invalid distribution -orch (/home/alpha/.local/lib/python3.10/site-packages)

/tmp/DeepSpeed master 7s
❯ ds_report
Traceback (most recent call last):
  File "/home/alpha/.local/bin/ds_report", line 3, in <module>
    from deepspeed.env_report import cli_main
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 14, in <module>
    from . import ops
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/__init__.py", line 1, in <module>
    from . import adam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/__init__.py", line 2, in <module>
    from .fused_adam import FusedAdam
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 13, in <module>
    from deepspeed.ops.op_builder.builder_names import FusedAdamBuilder
ImportError: cannot import name 'FusedAdamBuilder' from 'deepspeed.ops.op_builder.builder_names' (/home/alpha/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder_names.py)
brucethemoose commented 1 year ago

This does appear to be a regression, since the release build of DeepSpeed initializes without any errors.

I am also trying to test this on Windows on the same machine, but I am having trouble with the dependencies (ninja, I assume?):

C:\Users\Alpha\scratch\DeepSpeed>pip install .
Processing c:\users\alpha\scratch\deepspeed
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Alpha\scratch\DeepSpeed\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\Alpha\scratch\DeepSpeed\setup.py", line 48, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      DS_BUILD_OPS=1
      [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
      [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
      [WARNING] One can disable async_io with DS_BUILD_AIO=0
      [ERROR] Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
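The warning text itself suggests disabling async_io with DS_BUILD_AIO=0. A sketch of passing that variable to a from-source install from Python, assuming pip is invoked through the current interpreter. It only skips pre-compiling async_io and says nothing about whether the remaining ops build on this platform. Both helper names are hypothetical.

```python
import os
import sys


def deepspeed_build_env(base=None, disable_aio=True):
    """Return a copy of the environment with DS_BUILD_AIO=0, as the setup.py warning suggests."""
    env = dict(os.environ if base is None else base)  # copy; never mutate the caller's mapping
    if disable_aio:
        env["DS_BUILD_AIO"] = "0"
    return env


def pip_install_command(source_dir="."):
    """Command that installs a DeepSpeed checkout with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", source_dir]


# Usage (performs a real install, so it is left as a comment here):
#   import subprocess
#   subprocess.run(pip_install_command("."), env=deepspeed_build_env(), check=True)
```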
Data-drone commented 1 year ago

Hmm, I just hit this as well with pip install deepspeed, but pip install deepspeed==0.7.7 worked.
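In this thread the git build installs as 0.8.0+aef8a856 (see the wheel name in the install log above), while the release that works is 0.7.7. When comparing environments, the part after "+" (the PEP 440 local version label) is a quick way to tell them apart. A sketch assuming that convention; other packages use the label for other purposes, such as torch==2.0.0.dev20230112+cu118 in the freeze above, so the heuristic is only meaningful for DeepSpeed here. The function names are hypothetical.

```python
def split_local_version(ver: str):
    """Split a version such as '0.8.0+aef8a856' into ('0.8.0', 'aef8a856').

    Versions without a local label (e.g. '0.7.7') give (ver, None).
    """
    public, sep, local = ver.partition("+")
    return public, (local if sep else None)


def is_source_build(ver: str) -> bool:
    """Heuristic for DeepSpeed: a local version label indicates a build from a git checkout."""
    return split_local_version(ver)[1] is not None
```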

HeyangQin commented 1 year ago

Hello @brucethemoose and @Data-drone. Thank you for reporting the error. I tried both aef8a85 and the current PyPI version of DeepSpeed, but I cannot reproduce the error. Could you cd to a directory other than the DeepSpeed source directory and check whether the error is still there?
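A sketch of that check, assuming only the standard library: start a fresh interpreter in a throwaway temporary directory, so that a deepspeed/ source tree in the working directory cannot shadow the installed package (python -c puts the working directory first on sys.path), then report either the file that was imported or the final line of the error. import_check is a hypothetical helper.

```python
import subprocess
import sys
import tempfile


def import_check(module: str, cwd=None):
    """Import `module` in a fresh interpreter started in `cwd`.

    Defaults to a temporary directory so a local source checkout cannot shadow
    the installed package. Returns (ok, detail): detail is the imported file on
    success, or the last stderr line (normally the exception) on failure.
    """
    code = f"import {module} as m; print(getattr(m, '__file__', None))"
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=cwd or tmp,
            capture_output=True,
            text=True,
        )
    if result.returncode == 0:
        return True, result.stdout.strip()
    lines = result.stderr.strip().splitlines()
    return False, lines[-1] if lines else ""
```

Comparing import_check("deepspeed") with import_check("deepspeed", cwd="/tmp/DeepSpeed") shows whether the working directory changes which copy gets imported.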

@delock Do you have any idea what might be going wrong here?

@brucethemoose The error you see when installing on Windows is expected, as we don't support Windows for now.

HeyangQin commented 1 year ago

To follow up, this should have been fixed by https://github.com/microsoft/DeepSpeed/pull/2677/commits/b587c7e85470329ac25df7c7c2521ff9b2833db7. If you still see this issue with the latest version of DeepSpeed, please feel free to reopen it.