hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: docker build cuda extension error #5732

Open apachemycat opened 4 months ago

apachemycat commented 4 months ago

Is there an existing issue for this bug?

🐛 Describe the bug

When running the following command during docker build:

RUN BUILD_EXT=1 pip install colossalai-nightly

it fails with:

RuntimeError: [extension] Could not find any kernel compatible with the current environment.

However, if I run the same command inside a running container (started with the GPU flag so it can use the GPU cards), it succeeds.

Base image: FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04

Environment

No response

Issues-translate-bot commented 4 months ago

Bot detected the issue body's language is not English, translate it automatically.


Title: [BUG]: docker build cuda extension error

apachemycat commented 4 months ago

RUN BUILD_EXT=1 pip install colossalai-nightly:

0 7.666 Collecting colossalai-nightly
0 12.99 Downloading colossalai-nightly-2024.5.18.tar.gz (1.2 MB)
0 13.69 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 1.7 MB/s eta 0:00:00
0 14.21 Preparing metadata (setup.py): started
0 17.77 Preparing metadata (setup.py): finished with status 'error'
0 17.78 error: subprocess-exited-with-error
0 17.78
0 17.78 × python setup.py egg_info did not run successfully.
0 17.78 │ exit code: 1
0 17.78 ╰─> [7 lines of output]
0 17.78 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
0 17.78 Traceback (most recent call last):
0 17.78 File "", line 2, in
0 17.78 File "", line 34, in
0 17.78 File "/tmp/pip-install-4b2qtsp_/colossalai-nightly_41088cca51d34a4e95a34fd3ef65987c/setup.py", line 90, in
0 17.78 raise RuntimeError("[extension] Could not find any kernel compatible with the current environment.")
0 17.78 RuntimeError: [extension] Could not find any kernel compatible with the current environment.
0 17.78 [end of output]
0 17.78

apachemycat commented 4 months ago

done
Created wheel for colossalai-nightly: filename=colossalai_nightly-2024.5.18-cp310-cp310-linux_x86_64.whl size=23673844 sha256=0a0bb55154c1ce9758ff8f9dd4b38e4b462647ad38e9714d9a4b2de6153b163e
Stored in directory: /root/.cache/pip/wheels/ef/39/0e/39263ec364cb9d67240001279c9bcb1808b102252ea4ecaf33
Building wheel for contexttimer (setup.py) ... done
Created wheel for contexttimer: filename=contexttimer-0.3.3-py3-none-any.whl size=5804 sha256=877270da42acb2811b2b5fbb097ce315895a4f6ed3b4da34aa5318a60c758006
Stored in directory: /root/.cache/pip/wheels/72/1c/da/cfd97201d88ccce214427fa84a5caeb91fef7c5a1b4c4312b4
Successfully built colossalai-nightly contexttimer
Installing collected packages: ninja, distlib, contexttimer, wrapt, virtualenv, pydantic-core, nodeenv, msgpack, invoke, identify, cfgv, bcrypt, annotated-types, pynacl, pydantic, pre-commit, google, deprecated, cryptography, tokenizers, paramiko, transformers, ray, fabric, galore_torch, colossalai-nightly
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.19.1
Uninstalling tokenizers-0.19.1:
Successfully uninstalled tokenizers-0.19.1
Attempting uninstall: transformers
Found existing installation: transformers 4.42.0.dev0
Uninstalling transformers-4.42.0.dev0:
Successfully uninstalled transformers-4.42.0.dev0
Successfully installed annotated-types-0.6.0 bcrypt-4.1.3 cfgv-3.4.0 colossalai-nightly-2024.5.18 contexttimer-0.3.3 cryptography-42.0.7 deprecated-1.2.14 distlib-0.3.8 fabric-3.2.2 galore_torch-1.0 google-3.0.0 identify-2.5.36 invoke-2.2.0 msgpack-1.0.8 ninja-1.11.1.1 nodeenv-1.8.0 paramiko-3.4.0 pre-commit-3.7.1 pydantic-2.7.1 pydantic-core-2.18.2 pynacl-1.5.0 ray-2.22.0 tokenizers-0.15.2 transformers-4.36.2 virtualenv-20.26.2 wrapt-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@71c5383a668b:/app#
root@71c5383a668b:/app#

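The log above is from running the same command inside a running container started with the GPU device parameter, where the install succeeds.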

GrannyProgramming commented 3 months ago

workaround

Install ColossalAI from a specific commit

ARG VERSION=main
RUN git clone -b ${VERSION} https://github.com/hpcaitech/ColossalAI.git && \
    cd ColossalAI && \
    git checkout 3e05c07bb8921f2a8f9736b6f6673d4e9f1697d0 && \
    BUILD_EXT=1 pip install -v --no-cache-dir . && \
    cd .. && \
    rm -rf ColossalA

apachemycat commented 3 months ago

thanks

ver217 commented 3 months ago

This is because Docker BuildKit is not compatible with the current CUDA extension build. You can set export FORCE_CUDA=1 before installing colossalai in Docker, or you can disable Docker BuildKit by setting export DOCKER_BUILDKIT=0.
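
For anyone landing here, the two options above might look roughly like this in practice. This is only a sketch: the base image and the install command are taken from this thread, while the steps that install Python, pip and PyTorch are assumed and not shown, and I have not verified that FORCE_CUDA=1 is sufficient for every version.

```dockerfile
# Sketch only. Base image and install command come from this issue;
# everything else is an assumption.
FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04

# (assumed) install Python, pip and a CUDA-enabled PyTorch here

# Option 1: set FORCE_CUDA=1 for the install step so the extension build
# does not depend on a GPU being visible during `docker build`.
RUN FORCE_CUDA=1 BUILD_EXT=1 pip install colossalai-nightly

# Option 2 (instead of option 1): keep the original RUN line unchanged and
# disable BuildKit when invoking the build from the host shell:
#   DOCKER_BUILDKIT=0 docker build .
```

Setting the variable inline on the RUN line limits it to that one command; using ENV FORCE_CUDA=1 instead would keep it set for later build steps and in the final image.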