microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
35.47k stars · 4.12k forks

DeepSpeed just doesn't install properly on Databricks #5479

Closed · vikram71198 closed this 6 months ago

vikram71198 commented 6 months ago

I've been trying for a while to set up DeepSpeed correctly on my Databricks cluster, but have been largely unsuccessful.

Platform Specifications absl-py==1.0.0 accelerate==0.29.3 aiohttp==3.9.1 aiosignal==1.3.1 anyio==3.5.0 appdirs==1.4.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 astor==0.8.1 asttokens==2.0.5 astunparse==1.6.3 async-timeout==4.0.3 attrs==22.1.0 audioread==3.0.1 azure-core==1.29.1 azure-cosmos==4.3.1 azure-storage-blob==12.19.0 azure-storage-file-datalake==12.14.0 backcall==0.2.0 bcrypt==3.2.0 beautifulsoup4==4.11.1 black==22.6.0 bleach==4.1.0 blinker==1.4 blis==0.7.11 boto3==1.24.28 botocore==1.27.96 cachetools==5.3.2 catalogue==2.0.10 category-encoders==2.6.3 certifi==2022.12.7 cffi==1.15.1 chardet==4.0.0 charset-normalizer==2.0.4 click==8.0.4 cloudpathlib==0.16.0 cloudpickle==2.0.0 cmake==3.28.1 cmdstanpy==1.2.0 comm==0.1.2 confection==0.1.4 configparser==5.2.0 contourpy==1.0.5 cryptography==39.0.1 cycler==0.11.0 cymem==2.0.8 Cython==0.29.32 dacite==1.8.1 databricks-automl-runtime==0.2.20 databricks-cli==0.18.0 databricks-feature-engineering==0.2.1 databricks-sdk==0.1.6 dataclasses-json==0.6.3 datasets==2.15.0 dbl-tempo==0.1.26 dbus-python==1.2.18 debugpy==1.6.7 decorator==5.1.1 deepspeed==0.14.2 defusedxml==0.7.1 dill==0.3.6 diskcache==5.6.3 distlib==0.3.7 distro==1.7.0 distro-info==1.1+ubuntu0.2 docstring-to-markdown==0.11 docstring_parser==0.16 einops==0.7.0 entrypoints==0.4 evaluate==0.4.1 executing==0.8.3 facets-overview==1.1.1 fastjsonschema==2.19.1 fasttext==0.9.2 filelock==3.9.0 flash-attn==2.5.7 Flask==2.2.5 flatbuffers==23.5.26 fonttools==4.25.0 frozenlist==1.4.1 fsspec==2023.6.0 future==0.18.3 gast==0.4.0 gensim==4.3.2 gitdb==4.0.11 GitPython==3.1.27 google-api-core==2.15.0 google-auth==2.21.0 google-auth-oauthlib==1.0.0 google-cloud-core==2.4.1 google-cloud-storage==2.11.0 google-crc32c==1.5.0 google-pasta==0.2.0 google-resumable-media==2.7.0 googleapis-common-protos==1.62.0 greenlet==2.0.1 grpcio==1.48.2 grpcio-status==1.48.1 gunicorn==20.1.0 gviz-api==1.10.0 h5py==3.7.0 hf_transfer==0.1.6 hjson==3.1.0 holidays==0.38 horovod==0.28.1 
htmlmin==0.1.12 httplib2==0.20.2 huggingface-hub==0.21.3 idna==3.4 ImageHash==4.3.1 imbalanced-learn==0.11.0 importlib-metadata==4.11.3 importlib-resources==6.1.1 ipykernel==6.25.0 ipython==8.14.0 ipython-genutils==0.2.0 ipywidgets==7.7.2 isodate==0.6.1 itsdangerous==2.0.1 jedi==0.18.1 jeepney==0.7.1 Jinja2==3.1.2 jmespath==0.10.0 joblib==1.2.0 joblibspark==0.5.1 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.17.3 jupyter-client==7.3.4 jupyter-server==1.23.4 jupyter_core==5.2.0 jupyterlab-pygments==0.1.2 jupyterlab-widgets==1.0.0 keras==2.14.0 keyring==23.5.0 kiwisolver==1.4.4 langchain==0.0.348 langchain-core==0.0.13 langcodes==3.3.0 langsmith==0.0.79 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lazy_loader==0.3 libclang==15.0.6.1 librosa==0.10.1 lightgbm==4.1.0 lit==17.0.6 llvmlite==0.39.1 lxml==4.9.1 Mako==1.2.0 Markdown==3.4.1 markdown-it-py==3.0.0 MarkupSafe==2.1.1 marshmallow==3.20.2 matplotlib==3.7.0 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 mistune==0.8.4 ml-dtypes==0.2.0 mlflow-skinny==2.9.2 more-itertools==8.10.0 mpmath==1.2.1 msgpack==1.0.7 multidict==6.0.4 multimethod==1.10 multiprocess==0.70.14 murmurhash==1.0.10 mypy-extensions==0.4.3 nbclassic==0.5.2 nbclient==0.5.13 nbconvert==6.5.4 nbformat==5.7.0 nest-asyncio==1.5.6 networkx==2.8.4 ninja==1.11.1.1 nltk==3.7 nodeenv==1.8.0 notebook==6.5.2 notebook_shim==0.2.2 numba==0.56.4 numpy==1.23.5 nvidia-cublas-cu11==11.11.3.6 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu11==11.8.89 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu11==8.7.0.84 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu11==10.9.0.58 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu11==10.3.0.86 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu11==11.7.5.86 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu11==2.19.3 
nvidia-nccl-cu12==2.19.3 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu11==11.8.86 nvidia-nvtx-cu12==12.1.105 oauthlib==3.2.0 openai==0.28.1 opt-einsum==3.3.0 packaging==23.2 pandas==1.5.3 pandocfilters==1.5.0 paramiko==2.9.2 parso==0.8.3 pathspec==0.10.3 patsy==0.5.3 peft==0.10.0 petastorm==0.12.1 pexpect==4.8.0 phik==0.12.4 pickleshare==0.7.5 Pillow==9.4.0 platformdirs==2.5.2 plotly==5.9.0 pluggy==1.0.0 pmdarima==2.0.4 pooch==1.4.0 preshed==3.0.9 prompt-toolkit==3.0.36 prophet==1.1.5 protobuf==4.24.0 psutil==5.9.0 psycopg2==2.9.3 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==8.0.0 pyarrow-hotfix==0.5 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.11.1 pycparser==2.21 pydantic==1.10.6 pyflakes==3.1.0 Pygments==2.17.2 PyGObject==3.42.1 PyJWT==2.3.0 PyNaCl==1.5.0 pynvml==11.5.0 pyodbc==4.0.32 pyparsing==3.0.9 pyright==1.1.294 pyrsistent==0.18.0 pytesseract==0.3.10 python-apt==2.4.0+ubuntu3 python-dateutil==2.8.2 python-editor==1.0.4 python-lsp-jsonrpc==1.1.1 python-lsp-server==1.8.0 pytoolconfig==1.2.5 pytz==2022.7 PyWavelets==1.4.1 PyYAML==6.0 pyzmq==23.2.0 regex==2022.7.9 requests==2.28.1 requests-oauthlib==1.3.1 responses==0.18.0 rich==13.7.1 rope==1.7.0 rsa==4.9 s3transfer==0.6.2 safetensors==0.4.1 scikit-learn==1.1.1 scipy==1.10.0 seaborn==0.12.2 SecretStorage==3.3.1 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.99 shap==0.44.0 shtab==1.7.1 simplejson==3.17.6 six==1.16.0 slicer==0.0.7 smart-open==5.2.1 smmap==5.0.0 sniffio==1.2.0 soundfile==0.12.1 soupsieve==2.3.2.post1 soxr==0.3.7 spacy==3.7.2 spacy-legacy==3.0.12 spacy-loggers==1.0.5 spark-tensorflow-distributor==1.0.0 SQLAlchemy==1.4.39 sqlparse==0.4.2 srsly==2.4.8 ssh-import-id==5.11 stack-data==0.2.0 stanio==0.3.0 statsmodels==0.13.5 sympy==1.11.1 tabulate==0.8.10 tangled-up-in-unicode==0.2.0 tenacity==8.1.0 tensorboard==2.14.1 tensorboard-data-server==0.7.2 tensorboard-plugin-profile==2.14.0 tensorflow==2.14.1 tensorflow-estimator==2.14.0 
tensorflow-io-gcs-filesystem==0.35.0 termcolor==2.4.0 terminado==0.17.1 thinc==8.2.2 threadpoolctl==2.2.0 tiktoken==0.5.2 tinycss2==1.2.1 tokenize-rt==4.2.1 tokenizers==0.19.1 tomli==2.0.1 torch==2.2.1+cu118 torchaudio==2.2.1+cu118 torchvision==0.17.1+cu118 tornado==6.1 tqdm==4.64.1 traitlets==5.7.1 transformers==4.40.1 triton==2.2.0 trl==0.8.6 typeguard==2.13.3 typer==0.9.0 typing-inspect==0.9.0 typing_extensions==4.11.0 tyro==0.8.3 ujson==5.4.0 unattended-upgrades==0.1 urllib3==1.26.14 virtualenv==20.16.7 visions==0.7.5 wadllib==1.3.6 wasabi==1.1.2 wcwidth==0.2.5 weasel==0.3.4 webencodings==0.5.1 websocket-client==0.58.0 Werkzeug==2.2.2 whatthepatch==1.0.2 widgetsnbextension==3.6.1 wordcloud==1.9.3 wrapt==1.14.1 xgboost==1.7.6 xxhash==3.4.1 yapf==0.33.0 yarl==1.9.4 ydata-profiling==4.2.0 zipp==3.11.0

I'm also using Databricks Runtime Version == 14.3 LTS ML with CUDA == 11.8.

This is the output of ds_report:

```
ds_report
[2024-04-29 17:15:09,427] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```

As you can see, most of the stuff here has NOT been installed.

I've seen a lot of environment variables that I'm apparently supposed to install deepspeed with, like DS_BUILD_OPS=1, DS_BUILD_SPARSE_ATTN=0, DS_BUILD_AIO=1, etc., but I'm really not sure which ones to use.

I'm entirely new to DeepSpeed, so I'd really appreciate your help, thanks!

loadams commented 6 months ago

Hi @vikram71198 - it looks like DeepSpeed is installed; what you are seeing is that you have not pre-compiled any ops. That's fine: you don't need to, since the ops can be JIT compiled at runtime just fine. You probably don't need to pre-compile, but you can read more about that here and decide if you need to. If you do, determine which ops you will need and pre-compile those. Some ops have other dependencies (async_io, cutlass kernels, etc.); that's why you see some environment variables with those builds disabled.
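As a rough sketch of what such a pre-compiled install could look like (using the DS_BUILD_* flags mentioned above; which ops are worth building depends on the workload, and this assumes the flags keep DeepSpeed's usual DS_BUILD_&lt;OPNAME&gt; naming):

```shell
# Sketch only: pre-compile DeepSpeed ops at install time instead of
# relying on JIT compilation. DS_BUILD_OPS=1 tries to build every op;
# the =0 flags opt out of ops whose system dependencies are missing
# on this cluster (libaio for async_io, an incompatible torch/triton
# for sparse_attn), per the ds_report warnings above.
DS_BUILD_OPS=1 DS_BUILD_AIO=0 DS_BUILD_SPARSE_ATTN=0 \
  pip install deepspeed --no-cache-dir

# Or pre-compile just one op that is known to be needed, e.g. fused_adam:
DS_BUILD_FUSED_ADAM=1 pip install deepspeed --no-cache-dir
```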

vikram71198 commented 6 months ago

Gotcha. I explicitly pip install torch==2.2.1+cu118 (torch==2.2.2+cu121 is the default torch, which I attempt to override), so another part of the ds_report output that I find confounding is this:

```
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```

Why do torch version, torch cuda version & deepspeed wheel compiled w. all indicate torch==2.2.2+cu121 and not 2.2.1+cu118?

The Databricks Cluster Runtime I'm currently using has CUDA == 11.8.

And yes, I run the torch installation before the DeepSpeed installation.
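[Editorial sketch, not the fix the reporter actually used: ds_report reports whatever torch it imports at runtime, so an override like this only shows up if the new wheel actually replaces the preinstalled one on sys.path. One way to force that, assuming the standard PyTorch cu118 wheel index:]

```shell
# Sketch only: force-replace the preinstalled cu121 torch with the
# cu118 build, then reinstall DeepSpeed so it resolves against the
# torch that will actually be imported at runtime.
pip install --force-reinstall "torch==2.2.1+cu118" \
  --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y deepspeed
pip install deepspeed

# Check which torch will actually be imported (and hence reported):
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```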

vikram71198 commented 6 months ago

Okay, I fixed this myself. Nvm.

loadams commented 6 months ago

So we can close this issue?

loadams commented 6 months ago

Hi @vikram71198 - I assume we can close this issue. If not, please comment and we can re-open.