mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

ERROR: Could not build wheels for flash-attn, xentropy-cuda-lib, which is required to install pyproject.toml-based projects on A6000 machine #189

Closed · NarenZen closed this issue 1 year ago

NarenZen commented 1 year ago

python version: 3.10.11
cuda version: 11.7
torch version: 1.13.1
ubuntu version: 20

Error:

ERROR: Could not build wheels for flash-attn, xentropy-cuda-lib, which is required to install pyproject.toml-based projects

Traceback:

Building wheels for collected packages: flash-attn, llm-foundry, xentropy-cuda-lib
  Building wheel for flash-attn (setup.py): started
  Building wheel for flash-attn (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [86 lines of output]

      torch.__version__  = 1.13.1+cu117

      fatal: not a git repository (or any of the parent directories): .git
      running bdist_wheel
      /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn
      creating build/lib.linux-x86_64-cpython-310/flash_attn/layers
      copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
      copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
      copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
      creating build/lib.linux-x86_64-cpython-310/flash_attn/losses
      copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
      copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
      copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
      copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
      creating build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
      creating build/lib.linux-x86_64-cpython-310/flash_attn/modules
      copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
      copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
      copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
      copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
      copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
      creating build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
      creating build/lib.linux-x86_64-cpython-310/flash_attn/triton
      copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
      copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
      creating build/lib.linux-x86_64-cpython-310/flash_attn/utils
      copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
      copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
      copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
      copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
      copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
      running build_ext
      building 'flash_attn_cuda' extension
      creating build/temp.linux-x86_64-cpython-310
      creating build/temp.linux-x86_64-cpython-310/csrc
      creating build/temp.linux-x86_64-cpython-310/csrc/flash_attn
      creating build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-install-3jsn5u61/flash-attn_e4b72fbc886948718634aecd0f2df907/csrc/flash_attn -I/tmp/pip-install-3jsn5u61/flash-attn_e4b72fbc886948718634aecd0f2df907/csrc/flash_attn/src -I/tmp/pip-install-3jsn5u61/flash-attn_e4b72fbc886948718634aecd0f2df907/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-cpython-310/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
      In file included from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/Device.h:4,
                       from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                       from /usr/local/lib/python3.10/dist-packages/torch/include/torch/extension.h:6,
                       from csrc/flash_attn/fmha_api.cpp:29:
      /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
         12 | #include <Python.h>
            |          ^~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
  Building editable for llm-foundry (pyproject.toml): started
  Building editable for llm-foundry (pyproject.toml): finished with status 'done'
  Created wheel for llm-foundry: filename=llm_foundry-0.1.0-0.editable-py3-none-any.whl size=10112 sha256=ab92a7d8672895fe97597071da90ae25f46dc84a84f9f1980968cd8db7210283
  Stored in directory: /tmp/pip-ephem-wheel-cache-c45qch1r/wheels/a3/12/c1/ee72aea08eca9e1d05a31b3e52fe292b1da91c5e96aefa4463
  Building wheel for xentropy-cuda-lib (setup.py): started
  Building wheel for xentropy-cuda-lib (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]

      torch.__version__  = 1.13.1+cu117

      running bdist_wheel
      /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_ext
      building 'xentropy_cuda_lib' extension
      creating build
      creating build/temp.linux-x86_64-cpython-310
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-install-3jsn5u61/xentropy-cuda-lib_ade08bb92b2e45eaaaf653920905388b/csrc/xentropy -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c interface.cpp -o build/temp.linux-x86_64-cpython-310/interface.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=xentropy_cuda_lib -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      In file included from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/Device.h:4,
                       from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                       from /usr/local/lib/python3.10/dist-packages/torch/include/torch/extension.h:6,
                       from interface.cpp:1:
      /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
         12 | #include <Python.h>
            |          ^~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for xentropy-cuda-lib
  Running setup.py clean for xentropy-cuda-lib
Successfully built llm-foundry
Failed to build flash-attn xentropy-cuda-lib
ERROR: Could not build wheels for flash-attn, xentropy-cuda-lib, which is required to install pyproject.toml-based projects

GPU device: A6000

hanlint commented 1 year ago

Hi @NarenZen, the error in question:

fatal error: Python.h: No such file or directory

This usually means the Python development headers (python-dev / python3-dev) are not installed. See https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory for more details.
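For example, on Ubuntu the headers matching this Python build can usually be installed with the commands below (the exact package name depends on your Python version and how it was installed, so treat this as a sketch):

sudo apt-get update
sudo apt-get install python3.10-dev

After that, re-running the pip install should get past the Python.h error.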

vchiley commented 1 year ago

If you still have issues, I'd recommend trying the Docker image from the requirements/install instructions.
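Roughly, that flow looks like this — a minimal sketch, assuming the image tag from the install instructions, the NVIDIA container toolkit for --gpus all, and a fresh clone of the repo. Every install step runs inside the container, so the bundled CUDA toolchain and Python headers are picked up:

docker pull mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04
docker run -it --gpus all mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04 bash

# inside the container:
git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry
pip install -e ".[gpu]"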

ryurobin1990 commented 1 year ago

@vchiley I'm having the same issue when using the Docker image. Could you point out where I'm going wrong?

docker pull mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04

python -m venv llmfoundry-venv
source llmfoundry-venv/bin/activate

docker run mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04

pip install -U pip

pip install torch==1.13.1
pip install packaging

pip install -e ".[gpu]" 

My system: Python 3.10, CUDA 11.7, torch 1.13.1, Ubuntu 20.04, GPU: H100

NarenZen commented 1 year ago

> fatal error: Python.h: No such file or directory
>
> This usually means the Python development headers (python-dev / python3-dev) are not installed.

That worked for me. Thanks.