Closed: vikram71198 closed this issue 6 months ago.
Hi @vikram71198 - it looks like DeepSpeed is installed; what you are seeing is simply that you have not pre-compiled any ops. That's fine: you don't need to, since the ops can be JIT-compiled at runtime. If you think you need pre-compiled ops, you can read more about that here and decide; if so, determine which ops you will need and pre-compile just those. Some ops have extra dependencies (async_io, CUTLASS kernels, etc.), which is why you see some environments installed with those ops disabled.
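For example, a minimal sketch of that pre-compile route (which ops to pre-build depends on your workload; the `DS_BUILD_*` names are DeepSpeed's standard build switches, but the particular selection below is only an illustration):

```bash
# Default: no flags at all. Ops are JIT-compiled the first time they are used.
pip install deepspeed

# Pre-compile only the ops you actually need, e.g. the fused/CPU Adam optimizers.
DS_BUILD_FUSED_ADAM=1 DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir

# DS_BUILD_OPS=1 tries to pre-build everything; pair it with explicit opt-outs
# for ops whose extra dependencies (libaio, CUTLASS, ...) are not on the machine.
DS_BUILD_OPS=1 DS_BUILD_AIO=0 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_EVOFORMER_ATTN=0 pip install deepspeed --no-cache-dir
```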
Gotcha. I explicitly `pip install torch==2.2.1+cu118` (`torch==2.2.2+cu121` is the default torch, which I attempt to override), so another part of `ds_report` that I find confounding is this:
```
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```
Why do `torch version`, `torch cuda version` & `deepspeed wheel compiled w.` all indicate torch == 2.2.2+cu121 and not 2.2.1+cu118?
The Databricks Cluster Runtime I'm currently using has CUDA == 11.8.
And yes, I run the torch installation before the DeepSpeed installation.
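For reference, `torch version` and `torch cuda version` in `ds_report` come from whatever torch the interpreter actually imports, and `deepspeed wheel compiled w.` reflects the torch/CUDA the installed DeepSpeed wheel was built against, so output like this usually means the runtime's pre-installed torch is still the one being picked up. A quick way to check, as a sketch (the cu118 index URL below is the standard PyTorch wheel index):

```bash
# See which torch build and install location Python actually resolves.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.__file__)"

# If it still reports 2.2.2+cu121, force-reinstall the cu118 builds into the
# same environment that DeepSpeed runs in.
pip install --force-reinstall torch==2.2.1+cu118 torchvision==0.17.1+cu118 torchaudio==2.2.1+cu118 \
    --index-url https://download.pytorch.org/whl/cu118
```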
Okay, I fixed this myself. Nvm.
So we can close this issue?
Hi @vikram71198 - I assume we can close this issue. If not, please comment and we can re-open.
I've been trying for a while to set up DeepSpeed correctly on my Databricks cluster, but have been largely unsuccessful in doing so.
Platform Specifications
absl-py==1.0.0 accelerate==0.29.3 aiohttp==3.9.1 aiosignal==1.3.1 anyio==3.5.0 appdirs==1.4.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 astor==0.8.1 asttokens==2.0.5 astunparse==1.6.3 async-timeout==4.0.3 attrs==22.1.0 audioread==3.0.1 azure-core==1.29.1 azure-cosmos==4.3.1 azure-storage-blob==12.19.0 azure-storage-file-datalake==12.14.0 backcall==0.2.0 bcrypt==3.2.0 beautifulsoup4==4.11.1 black==22.6.0 bleach==4.1.0 blinker==1.4 blis==0.7.11 boto3==1.24.28 botocore==1.27.96 cachetools==5.3.2 catalogue==2.0.10 category-encoders==2.6.3 certifi==2022.12.7 cffi==1.15.1 chardet==4.0.0 charset-normalizer==2.0.4 click==8.0.4 cloudpathlib==0.16.0 cloudpickle==2.0.0 cmake==3.28.1 cmdstanpy==1.2.0 comm==0.1.2 confection==0.1.4 configparser==5.2.0 contourpy==1.0.5 cryptography==39.0.1 cycler==0.11.0 cymem==2.0.8 Cython==0.29.32 dacite==1.8.1 databricks-automl-runtime==0.2.20 databricks-cli==0.18.0 databricks-feature-engineering==0.2.1 databricks-sdk==0.1.6 dataclasses-json==0.6.3 datasets==2.15.0 dbl-tempo==0.1.26 dbus-python==1.2.18 debugpy==1.6.7 decorator==5.1.1 deepspeed==0.14.2 defusedxml==0.7.1 dill==0.3.6 diskcache==5.6.3 distlib==0.3.7 distro==1.7.0 distro-info==1.1+ubuntu0.2 docstring-to-markdown==0.11 docstring_parser==0.16 einops==0.7.0 entrypoints==0.4 evaluate==0.4.1 executing==0.8.3 facets-overview==1.1.1 fastjsonschema==2.19.1 fasttext==0.9.2 filelock==3.9.0 flash-attn==2.5.7 Flask==2.2.5 flatbuffers==23.5.26 fonttools==4.25.0 frozenlist==1.4.1 fsspec==2023.6.0 future==0.18.3 gast==0.4.0 gensim==4.3.2 gitdb==4.0.11 GitPython==3.1.27 google-api-core==2.15.0 google-auth==2.21.0 google-auth-oauthlib==1.0.0 google-cloud-core==2.4.1 google-cloud-storage==2.11.0 google-crc32c==1.5.0 google-pasta==0.2.0 google-resumable-media==2.7.0 googleapis-common-protos==1.62.0 greenlet==2.0.1 grpcio==1.48.2 grpcio-status==1.48.1 gunicorn==20.1.0 gviz-api==1.10.0 h5py==3.7.0 hf_transfer==0.1.6 hjson==3.1.0 holidays==0.38 horovod==0.28.1 htmlmin==0.1.12 httplib2==0.20.2 huggingface-hub==0.21.3 idna==3.4 ImageHash==4.3.1 imbalanced-learn==0.11.0 importlib-metadata==4.11.3 importlib-resources==6.1.1 ipykernel==6.25.0 ipython==8.14.0 ipython-genutils==0.2.0 ipywidgets==7.7.2 isodate==0.6.1 itsdangerous==2.0.1 jedi==0.18.1 jeepney==0.7.1 Jinja2==3.1.2 jmespath==0.10.0 joblib==1.2.0 joblibspark==0.5.1 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.17.3 jupyter-client==7.3.4 jupyter-server==1.23.4 jupyter_core==5.2.0 jupyterlab-pygments==0.1.2 jupyterlab-widgets==1.0.0 keras==2.14.0 keyring==23.5.0 kiwisolver==1.4.4 langchain==0.0.348 langchain-core==0.0.13 langcodes==3.3.0 langsmith==0.0.79 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lazy_loader==0.3 libclang==15.0.6.1 librosa==0.10.1 lightgbm==4.1.0 lit==17.0.6 llvmlite==0.39.1 lxml==4.9.1 Mako==1.2.0 Markdown==3.4.1 markdown-it-py==3.0.0 MarkupSafe==2.1.1 marshmallow==3.20.2 matplotlib==3.7.0 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 mistune==0.8.4 ml-dtypes==0.2.0 mlflow-skinny==2.9.2 more-itertools==8.10.0 mpmath==1.2.1 msgpack==1.0.7 multidict==6.0.4 multimethod==1.10 multiprocess==0.70.14 murmurhash==1.0.10 mypy-extensions==0.4.3 nbclassic==0.5.2 nbclient==0.5.13 nbconvert==6.5.4 nbformat==5.7.0 nest-asyncio==1.5.6 networkx==2.8.4 ninja==1.11.1.1 nltk==3.7 nodeenv==1.8.0 notebook==6.5.2 notebook_shim==0.2.2 numba==0.56.4 numpy==1.23.5 nvidia-cublas-cu11==11.11.3.6 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu11==11.8.89 
nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu11==8.7.0.84 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu11==10.9.0.58 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu11==10.3.0.86 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu11==11.7.5.86 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu11==2.19.3 nvidia-nccl-cu12==2.19.3 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu11==11.8.86 nvidia-nvtx-cu12==12.1.105 oauthlib==3.2.0 openai==0.28.1 opt-einsum==3.3.0 packaging==23.2 pandas==1.5.3 pandocfilters==1.5.0 paramiko==2.9.2 parso==0.8.3 pathspec==0.10.3 patsy==0.5.3 peft==0.10.0 petastorm==0.12.1 pexpect==4.8.0 phik==0.12.4 pickleshare==0.7.5 Pillow==9.4.0 platformdirs==2.5.2 plotly==5.9.0 pluggy==1.0.0 pmdarima==2.0.4 pooch==1.4.0 preshed==3.0.9 prompt-toolkit==3.0.36 prophet==1.1.5 protobuf==4.24.0 psutil==5.9.0 psycopg2==2.9.3 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==8.0.0 pyarrow-hotfix==0.5 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.11.1 pycparser==2.21 pydantic==1.10.6 pyflakes==3.1.0 Pygments==2.17.2 PyGObject==3.42.1 PyJWT==2.3.0 PyNaCl==1.5.0 pynvml==11.5.0 pyodbc==4.0.32 pyparsing==3.0.9 pyright==1.1.294 pyrsistent==0.18.0 pytesseract==0.3.10 python-apt==2.4.0+ubuntu3 python-dateutil==2.8.2 python-editor==1.0.4 python-lsp-jsonrpc==1.1.1 python-lsp-server==1.8.0 pytoolconfig==1.2.5 pytz==2022.7 PyWavelets==1.4.1 PyYAML==6.0 pyzmq==23.2.0 regex==2022.7.9 requests==2.28.1 requests-oauthlib==1.3.1 responses==0.18.0 rich==13.7.1 rope==1.7.0 rsa==4.9 s3transfer==0.6.2 safetensors==0.4.1 scikit-learn==1.1.1 scipy==1.10.0 seaborn==0.12.2 SecretStorage==3.3.1 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.99 shap==0.44.0 shtab==1.7.1 simplejson==3.17.6 six==1.16.0 slicer==0.0.7 smart-open==5.2.1 smmap==5.0.0 sniffio==1.2.0 soundfile==0.12.1 soupsieve==2.3.2.post1 soxr==0.3.7 spacy==3.7.2 spacy-legacy==3.0.12 spacy-loggers==1.0.5 spark-tensorflow-distributor==1.0.0 SQLAlchemy==1.4.39 sqlparse==0.4.2 srsly==2.4.8 ssh-import-id==5.11 stack-data==0.2.0 stanio==0.3.0 statsmodels==0.13.5 sympy==1.11.1 tabulate==0.8.10 tangled-up-in-unicode==0.2.0 tenacity==8.1.0 tensorboard==2.14.1 tensorboard-data-server==0.7.2 tensorboard-plugin-profile==2.14.0 tensorflow==2.14.1 tensorflow-estimator==2.14.0 tensorflow-io-gcs-filesystem==0.35.0 termcolor==2.4.0 terminado==0.17.1 thinc==8.2.2 threadpoolctl==2.2.0 tiktoken==0.5.2 tinycss2==1.2.1 tokenize-rt==4.2.1 tokenizers==0.19.1 tomli==2.0.1 torch==2.2.1+cu118 torchaudio==2.2.1+cu118 torchvision==0.17.1+cu118 tornado==6.1 tqdm==4.64.1 traitlets==5.7.1 transformers==4.40.1 triton==2.2.0 trl==0.8.6 typeguard==2.13.3 typer==0.9.0 typing-inspect==0.9.0 typing_extensions==4.11.0 tyro==0.8.3 ujson==5.4.0 unattended-upgrades==0.1 urllib3==1.26.14 virtualenv==20.16.7 visions==0.7.5 wadllib==1.3.6 wasabi==1.1.2 wcwidth==0.2.5 weasel==0.3.4 webencodings==0.5.1 websocket-client==0.58.0 Werkzeug==2.2.2 whatthepatch==1.0.2 widgetsnbextension==3.6.1 wordcloud==1.9.3 wrapt==1.14.1 xgboost==1.7.6 xxhash==3.4.1 yapf==0.33.0 yarl==1.9.4 ydata-profiling==4.2.0 zipp==3.11.0

I'm also using Databricks Runtime Version == 14.3 LTS ML with CUDA == 11.8.

This is the output of `ds_report`:
```
[2024-04-29 17:15:09,427] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```

As you can see, most of the stuff here has NOT been installed.
I've seen a lot of environment variables with which I'm supposed to install DeepSpeed, like `DS_BUILD_OPS=1`, `DS_BUILD_SPARSE_ATTN=0`, `DS_BUILD_AIO=1`, etc., but I'm really not sure which ones to use.

I'm entirely new to DeepSpeed, so I'd really appreciate your help, thanks!
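For later readers: none of those flags are strictly required, since ops marked compatible are JIT-compiled on first use even when they show `[NO]` under "installed". Below is a hedged sketch of confirming that the JIT path works on the cluster; the `deepspeed.ops.op_builder` import and `FusedAdamBuilder().load()` call are based on recent DeepSpeed releases and should be treated as an assumption:

```bash
# Trigger a JIT build of one op (fused_adam here) to confirm that the CUDA and
# ninja toolchain on the cluster can compile DeepSpeed extensions; the build
# result is cached under ~/.cache/torch_extensions for reuse.
python -c "from deepspeed.ops.op_builder import FusedAdamBuilder; FusedAdamBuilder().load(); print('fused_adam built OK')"
```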