Open apachemycat opened 4 months ago
Bot detected the issue body's language is not English, translate it automatically.
Title: [BUG]: docker build cuda extension error
RUN BUILD_EXT=1 pip install colossalai-nightly:
one
Created wheel for colossalai-nightly: filename=colossalai_nightly-2024.5.18-cp310-cp310-linux_x86_64.whl size=23673844 sha256=0a0bb55154c1ce9758ff8f9dd4b38e4b462647ad38e9714d9a4b2de6153b163e
Stored in directory: /root/.cache/pip/wheels/ef/39/0e/39263ec364cb9d67240001279c9bcb1808b102252ea4ecaf33
Building wheel for contexttimer (setup.py) ... done
Created wheel for contexttimer: filename=contexttimer-0.3.3-py3-none-any.whl size=5804 sha256=877270da42acb2811b2b5fbb097ce315895a4f6ed3b4da34aa5318a60c758006
Stored in directory: /root/.cache/pip/wheels/72/1c/da/cfd97201d88ccce214427fa84a5caeb91fef7c5a1b4c4312b4
Successfully built colossalai-nightly contexttimer
Installing collected packages: ninja, distlib, contexttimer, wrapt, virtualenv, pydantic-core, nodeenv, msgpack, invoke, identify, cfgv, bcrypt, annotated-types, pynacl, pydantic, pre-commit, google, deprecated, cryptography, tokenizers, paramiko, transformers, ray, fabric, galore_torch, colossalai-nightly
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.19.1
Uninstalling tokenizers-0.19.1:
Successfully uninstalled tokenizers-0.19.1
Attempting uninstall: transformers
Found existing installation: transformers 4.42.0.dev0
Uninstalling transformers-4.42.0.dev0:
Successfully uninstalled transformers-4.42.0.dev0
Successfully installed annotated-types-0.6.0 bcrypt-4.1.3 cfgv-3.4.0 colossalai-nightly-2024.5.18 contexttimer-0.3.3 cryptography-42.0.7 deprecated-1.2.14 distlib-0.3.8 fabric-3.2.2 galore_torch-1.0 google-3.0.0 identify-2.5.36 invoke-2.2.0 msgpack-1.0.8 ninja-1.11.1.1 nodeenv-1.8.0 paramiko-3.4.0 pre-commit-3.7.1 pydantic-2.7.1 pydantic-core-2.18.2 pynacl-1.5.0 ray-2.22.0 tokenizers-0.15.2 transformers-4.36.2 virtualenv-20.26.2 wrapt-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@71c5383a668b:/app#
root@71c5383a668b:/app#
The above was run in a container with the GPU device parameter.
Workaround:
ARG VERSION=main
RUN git clone -b ${VERSION} https://github.com/hpcaitech/ColossalAI.git && \
    cd ColossalAI && \
    git checkout 3e05c07bb8921f2a8f9736b6f6673d4e9f1697d0 && \
    BUILD_EXT=1 pip install -v --no-cache-dir . && \
    cd .. && \
    rm -rf ColossalA
thanks
This is because Docker BuildKit is not compatible with the current CUDA extension build. You can set export FORCE_CUDA=1 before installing ColossalAI in Docker, or you can disable Docker BuildKit by setting export DOCKER_BUILDKIT=0.
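For illustration, the first suggestion might look like this in a Dockerfile. This is a minimal sketch, not a verified fix: the base image is the one from the issue report, and it assumes that Python, pip, and a CUDA-enabled PyTorch are already installed by earlier steps that the issue does not show.

```dockerfile
# Base image taken from the issue report.
FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04

# Assumption: Python, pip, and a CUDA-enabled PyTorch are installed by
# earlier build steps that are not shown in the issue.

# FORCE_CUDA=1 is set only for this install step, as suggested in the
# thread, so the variable does not persist into the final image.
RUN FORCE_CUDA=1 BUILD_EXT=1 pip install colossalai-nightly
```

The second suggestion applies on the host rather than in the Dockerfile: run export DOCKER_BUILDKIT=0 in the shell before invoking docker build.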
Is there an existing issue for this bug?
Describe the bug
When docker build runs the following command:
RUN BUILD_EXT=1 pip install colossalai-nightly
it fails with:
RuntimeError: [extension] Could not find any kernel compatible with the current environment.
But if I run this command in a container (with the GPU flag to use GPU cards), it succeeds.
Base image: FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04
Environment
No response