mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Installation fails with ERROR: Invalid requirement: 'xentropy-cuda-lib@git...' #327

Closed · olaf-beh closed this issue 1 year ago

olaf-beh commented 1 year ago

TL;DR

Following the installation instructions and the Hardware and Software Requirements, pip install -e ".[gpu]" fails with

ERROR: Invalid requirement: 'xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy; extra == "gpu"'

See below for details.

Environment

$ python collect_env.py 
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: 11.0.1-2
CMake version: version 3.25.0
Libc version: glibc-2.31

Python version: 3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.16.0-0.bpo.4-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A100-SXM4-80GB

Nvidia driver version: 520.61.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.0.1+cu118
[pip3] torch-optimizer==0.3.0
[pip3] torchdata==0.6.1
[pip3] torchmetrics==0.11.3
[pip3] torchtext==0.15.2
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[pip3] triton-pre-mlir==2.0.0
[conda] Could not collect

To reproduce

Steps to reproduce the behavior:

$ git clone https://github.com/mosaicml/llm-foundry.git
$ cd llm-foundry
$ python -m venv llmfoundry-venv
$ source llmfoundry-venv/bin/activate
(llmfoundry-venv) $ pip install -e ".[gpu]"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///home/team/olaf/llm-foundry
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Collecting triton-pre-mlir@ git+https://github.com/vchiley/triton.git@triton_pre_mlir_sm90#subdirectory=python
  Cloning https://github.com/vchiley/triton.git (to revision triton_pre_mlir_sm90) to /tmp/pip-install-5yyrzy6b/triton-pre-mlir_1e502e78dff64454a657c63602a95d86
  Running command git clone -q https://github.com/vchiley/triton.git /tmp/pip-install-5yyrzy6b/triton-pre-mlir_1e502e78dff64454a657c63602a95d86
  Running command git checkout -b triton_pre_mlir_sm90 --track origin/triton_pre_mlir_sm90
  Switched to a new branch 'triton_pre_mlir_sm90'
  Branch 'triton_pre_mlir_sm90' set up to track remote branch 'triton_pre_mlir_sm90' from 'origin' by rebasing.
  Running command git submodule update --init --recursive -q
ERROR: Invalid requirement: 'xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy; extra == "gpu"'

Expected behavior

The environment seems to match the Hardware and Software Requirements (status: supported).

I expect no error ;) when running pip install -e ".[gpu]" in a supported environment.

Additional context

None

mvpatel2000 commented 1 year ago

hm... can you try running pip install 'xentropy-cuda-lib@git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy'?

Can you also provide your pip version?

I just tested it on a fresh install with this docker image: mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 and it seems to work fine.
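If you want to rule out the host environment, something along these lines should reproduce that setup (flags and paths here are only illustrative):

$ docker run --gpus all -it --rm mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 bash
# inside the container
$ git clone https://github.com/mosaicml/llm-foundry.git
$ cd llm-foundry
$ pip install -e ".[gpu]"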

olaf-beh commented 1 year ago

Thanks for the fast reply :)

$ pip install packaging
$ pip install torch --index-url https://download.pytorch.org/whl/cu118
$ pip install 'xentropy-cuda-lib@git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy'
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy
  Cloning https://github.com/HazyResearch/flash-attention.git (to revision v1.0.3) to /tmp/pip-install-atemhwoo/xentropy-cuda-lib_284f6b2f333840f68d8b4c98c8a5e15c
  Running command git clone -q https://github.com/HazyResearch/flash-attention.git /tmp/pip-install-atemhwoo/xentropy-cuda-lib_284f6b2f333840f68d8b4c98c8a5e15c
  Running command git checkout -q 67ef5d28df71d395bc16787b31e08ea1afbe4178
  Running command git submodule update --init --recursive -q
Using legacy 'setup.py install' for xentropy-cuda-lib, since package 'wheel' is not installed.
Installing collected packages: xentropy-cuda-lib
    Running setup.py install for xentropy-cuda-lib ... done
Successfully installed xentropy-cuda-lib-0.1
$ pip install -e ".[gpu]"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///home/team/olaf/llm-foundry
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Collecting triton-pre-mlir@ git+https://github.com/vchiley/triton.git@triton_pre_mlir_sm90#subdirectory=python
  Cloning https://github.com/vchiley/triton.git (to revision triton_pre_mlir_sm90) to /tmp/pip-install-881us9so/triton-pre-mlir_fd86e4e4127649169169a9f1fc522785
  Running command git clone -q https://github.com/vchiley/triton.git /tmp/pip-install-881us9so/triton-pre-mlir_fd86e4e4127649169169a9f1fc522785
  Running command git checkout -b triton_pre_mlir_sm90 --track origin/triton_pre_mlir_sm90
  Switched to a new branch 'triton_pre_mlir_sm90'
  Branch 'triton_pre_mlir_sm90' set up to track remote branch 'triton_pre_mlir_sm90' from 'origin' by rebasing.
  Running command git submodule update --init --recursive -q
ERROR: Invalid requirement: 'xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy; extra == "gpu"'

Hope that helps.

mvpatel2000 commented 1 year ago

Hm... could you please provide your pip and setuptools versions?

If a normal pip install is working, I suspect this is some incompatibility with the packaging libraries. We don't do anything special in setup.py; it should just go through the install_requires we specify and install them...
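For context, the gpu extra is essentially a PEP 508 direct-URL requirement. Here is a simplified, illustrative sketch (not the actual llm-foundry setup.py): when pip resolves the extra it appends the '; extra == "gpu"' marker you see in the error, and my guess is that older pip/setuptools releases cannot parse that combination.

from setuptools import setup

setup(
    name='example-package',  # placeholder name for illustration
    version='0.0.1',
    extras_require={
        # direct-URL requirement; pip adds '; extra == "gpu"' when resolving the extra
        'gpu': [
            'xentropy-cuda-lib@git+https://github.com/HazyResearch/flash-attention.git@v1.0.3#subdirectory=csrc/xentropy',
        ],
    },
)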

olaf-beh commented 1 year ago

$ pip --version
pip 20.3.4 from /home/team/olaf/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/pip (python 3.9)
>>> import setuptools
>>> print(setuptools.__version__)
44.1.1

mvpatel2000 commented 1 year ago

Can you please try pip version 22.3.1, setuptools 59.5.0?

To be honest, I'm not super sure what the issue is. It seems to work on our docker images, so I am guessing it's related to some kind of environment setup... but I don't have any concrete leads.
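Concretely, something like this inside the venv should do it (the pins are just the versions suggested above):

(llmfoundry-venv) $ python -m pip install 'pip==22.3.1' 'setuptools==59.5.0'
(llmfoundry-venv) $ pip install -e ".[gpu]"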

olaf-beh commented 1 year ago

Thanks for your effort! I am offline now, but I'll try your suggestions tomorrow.

olaf-beh commented 1 year ago

Upgraded pip from version 20.3.4 to 22.3.1 and setuptools from version 44.1.1 to 59.5.0:

$ python -c 'import pip; print(pip.__version__)'
22.3.1
$ python -c 'import setuptools; print(setuptools.__version__)'
59.5.0

Starting pip install -e ".[gpu]" again:

$ pip install -e ".[gpu]"
...
Successfully installed Brotli-1.0.9 Click-8.1.3 GitPython-3.1.31 accelerate-0.19.0 aiohttp-3.8.4 aiosignal-1.3.1 antlr4-python3-runtime-4.9.3 apache-libcloud-3.7.0 appdirs-1.4.4 argcomplete-3.1.1 arrow-1.2.3 async-timeout-4.0.2 attrs-23.1.0 backoff-2.2.1 bcrypt-4.0.1 boto3-1.26.153 botocore-1.29.153 certifi-2023.5.7 cffi-1.15.1 charset-normalizer-3.1.0 circuitbreaker-1.4.0 coloredlogs-15.0.1 composer-0.14.1 contourpy-1.1.0 coolname-2.2.0 cryptography-39.0.2 cycler-0.11.0 datasets-2.10.1 decorator-5.1.1 dill-0.3.6 docker-6.1.3 docker-pycreds-0.4.0 einops-0.5.0 flash-attn-1.0.3.post0 flatbuffers-23.5.26 fonttools-4.40.0 frozenlist-1.3.3 fsspec-2023.6.0 gitdb-4.0.10 gql-3.4.1 graphql-core-3.2.3 huggingface-hub-0.15.1 humanfriendly-10.0 idna-3.4 importlib-metadata-6.6.0 importlib-resources-5.12.0 jmespath-1.0.1 kiwisolver-1.4.4 llm-foundry-0.1.0 markdown-it-py-3.0.0 matplotlib-3.7.1 mdurl-0.1.2 mosaicml-cli-0.4.8 mosaicml-streaming-0.4.1 multidict-6.0.4 multiprocess-0.70.14 numpy-1.24.3 oci-2.104.2 omegaconf-2.3.0 onnx-1.13.1 onnxruntime-1.14.1 packaging-22.0 pandas-2.0.2 paramiko-3.2.0 pathtools-0.1.2 pillow-9.5.0 prompt-toolkit-3.0.38 protobuf-3.20.3 psutil-5.9.5 py-cpuinfo-9.0.0 pyOpenSSL-23.2.0 pyarrow-12.0.1 pycparser-2.21 pygments-2.15.1 pynacl-1.5.0 pyparsing-3.0.9 python-dateutil-2.8.2 python-snappy-0.6.1 pytorch-ranger-0.1.1 pytz-2023.3 pyyaml-6.0 questionary-1.10.0 regex-2023.6.3 requests-2.31.0 responses-0.18.0 rich-13.4.2 ruamel.yaml-0.17.31 ruamel.yaml.clib-0.2.7 s3transfer-0.6.1 sentencepiece-0.1.97 sentry-sdk-1.25.1 setproctitle-1.3.2 six-1.16.0 slack-sdk-3.21.3 smmap-5.0.0 tabulate-0.9.0 tokenizers-0.13.3 torch-optimizer-0.3.0 torchdata-0.6.1 torchmetrics-0.11.3 torchtext-0.15.2 torchvision-0.15.2 tqdm-4.65.0 transformers-4.28.1 triton-pre-mlir-2.0.0 tzdata-2023.3 urllib3-1.26.16 validators-0.20.0 wandb-0.15.4 wcwidth-0.2.6 websocket-client-1.5.3 websockets-10.4 xxhash-3.2.0 yarl-1.9.2 zipp-3.15.0 zstd-1.5.5.1

I get some warnings of the type DEPRECATION: flash-attn is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml ..., but apart from that, the installation runs through smoothly.
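A side note: the earlier log said xentropy-cuda-lib was built with the legacy method "since package 'wheel' is not installed", so I assume installing wheel up front would avoid these warnings, though I have not verified that:

$ pip install wheel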

Thanks a lot. Upgrading pip and setuptools seems to resolve the Invalid requirement installation issue.