Open RomanKoshkin opened 8 months ago
I followed the instructions to build and run from Docker, but the build fails with the following error:
#0 38.37 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 138.6 MB/s eta 0:00:00
#0 38.41 Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
#0 38.42 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 kB 86.0 MB/s eta 0:00:00
#0 38.45 Downloading coverage-7.4.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (233 kB)
#0 38.46 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 233.2/233.2 kB 136.0 MB/s eta 0:00:00
#0 38.63 Building wheels for collected packages: rouge_score
#0 38.63 Building wheel for rouge_score (setup.py): started
#0 38.97 Building wheel for rouge_score (setup.py): finished with status 'done'
#0 38.97 Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24952 sha256=46020c2db4d82ec583ef1db13c75303e7066235749b3d38b6f6cf6cb1b35a60d
#0 38.97 Stored in directory: /tmp/pip-ephem-wheel-cache-im8l787c/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
#0 38.98 Successfully built rouge_score
#0 41.11 Installing collected packages: tokenizers, sentencepiece, distlib, xxhash, virtualenv, typing-extensions, safetensors, pyproject_hooks, pynvml, pybind11-stubgen, pyarrow-hotfix, py, parameterized, nodeenv, nltk, mypy-extensions, lark, identify, humanfriendly, graphviz, dill, cuda-python, coverage, colored, cfgv, rouge_score, responses, pytest-forked, pre-commit, mypy, multiprocess, huggingface-hub, coloredlogs, build, transformers, pytest-cov, diffusers, accelerate, datasets, optimum, evaluate
#0 41.48 Attempting uninstall: typing-extensions
#0 41.49 Found existing installation: typing_extensions 4.7.1
#0 41.49 Uninstalling typing_extensions-4.7.1:
#0 41.88 Successfully uninstalled typing_extensions-4.7.1
#0 41.97 Attempting uninstall: pynvml
#0 41.97 Found existing installation: pynvml 11.4.1
#0 41.98 Uninstalling pynvml-11.4.1:
#0 41.98 Successfully uninstalled pynvml-11.4.1
#0 43.11 Attempting uninstall: cuda-python
#0 43.11 Found existing installation: cuda-python 12.2.0rc5+5.g84845d1
#0 43.13 Uninstalling cuda-python-12.2.0rc5+5.g84845d1:
#0 43.18 Successfully uninstalled cuda-python-12.2.0rc5+5.g84845d1
#0 49.50 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
#0 49.50 dask-cuda 23.8.0 requires pynvml<11.5,>=11.0.0, but you have pynvml 11.5.0 which is incompatible.
#0 49.50 torch-tensorrt 0.0.0 requires tensorrt<8.7,>=8.6, but you have tensorrt 9.2.0.post12.dev5 which is incompatible.
#0 49.50 Successfully installed accelerate-0.20.3 build-1.0.3 cfgv-3.4.0 colored-2.2.4 coloredlogs-15.0.1 coverage-7.4.0 cuda-python-12.2.0 datasets-2.16.1 diffusers-0.15.0 dill-0.3.7 distlib-0.3.8 evaluate-0.4.1 graphviz-0.20.1 huggingface-hub-0.20.1 humanfriendly-10.0 identify-2.5.33 lark-1.1.8 multiprocess-0.70.15 mypy-1.8.0 mypy-extensions-1.0.0 nltk-3.8.1 nodeenv-1.8.0 optimum-1.16.1 parameterized-0.9.0 pre-commit-3.6.0 py-1.11.0 pyarrow-hotfix-0.6 pybind11-stubgen-2.4.2 pynvml-11.5.0 pyproject_hooks-1.0.0 pytest-cov-4.1.0 pytest-forked-1.6.0 responses-0.18.0 rouge_score-0.1.2 safetensors-0.4.1 sentencepiece-0.1.99 tokenizers-0.13.3 transformers-4.33.1 typing-extensions-4.8.0 virtualenv-20.25.0 xxhash-3.4.1
#0 49.50 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#0 50.26
#0 50.26 [notice] A new release of pip is available: 23.2.1 -> 23.3.2
#0 50.26 [notice] To update, run: python3 -m pip install --upgrade pip
#0 51.28 -- The CXX compiler identification is GNU 11.4.0
#0 51.32 -- Detecting CXX compiler ABI info
#0 51.49 -- Detecting CXX compiler ABI info - done
#0 51.52 -- Check for working CXX compiler: /usr/bin/c++ - skipped
#0 51.52 -- Detecting CXX compile features
#0 51.52 -- Detecting CXX compile features - done
#0 51.52 -- NVTX is disabled
#0 51.52 -- Importing batch manager
#0 51.52 -- Building PyTorch
#0 51.52 -- Building Google tests
#0 51.52 -- Building benchmarks
#0 51.52 -- Looking for a CUDA compiler
#0 53.89 -- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
#0 53.89 -- CUDA compiler: /usr/local/cuda/bin/nvcc
#0 53.93 -- GPU architectures: 90-real,89-real
#0 53.96 CMake Error at /usr/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:279 (message):
#0 53.96   CMAKE_CUDA_ARCHITECTURES:
#0 53.96
#0 53.96     90-real,89-real
#0 53.96
#0 53.96   is not one of the following:
#0 53.96
#0 53.96     * a semicolon-separated list of integers, each optionally
#0 53.96       followed by '-real' or '-virtual'
#0 53.96     * a special value: all, all-major, native
#0 53.96
#0 53.96 Call Stack (most recent call first):
#0 53.96   CMakeLists.txt:140 (enable_language)
#0 53.96
#0 53.96
#0 53.96 -- Configuring incomplete, errors occurred!
#0 53.98 Traceback (most recent call last):
#0 53.98   File "/src/tensorrt_llm/scripts/build_wheel.py", line 307, in <module>
#0 53.98     main(**vars(args))
#0 53.98   File "/src/tensorrt_llm/scripts/build_wheel.py", line 160, in main
#0 53.98     build_run(
#0 53.98   File "/usr/lib/python3.10/subprocess.py", line 526, in run
#0 53.98     raise CalledProcessError(retcode, process.args,
#0 53.98 subprocess.CalledProcessError: Command 'cmake -DCMAKE_BUILD_TYPE="Release" -DBUILD_PYT="ON" -DBUILD_PYBIND="OFF" "-DCMAKE_CUDA_ARCHITECTURES=90-real,89-real" -DTRT_LIB_DIR=/usr/local/tensorrt/targets/x86_64-linux-gnu/lib -DTRT_INCLUDE_DIR=/usr/local/tensorrt/include -S "/src/tensorrt_llm/cpp"' returned non-zero exit status 1.
------
Dockerfile.multi:59
--------------------
  57 |
  58 |     ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt"
  59 | >>> RUN python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  60 |
  61 |     FROM devel as release
--------------------
ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
Makefile:51: recipe for target 'release_build' failed
make: *** [release_build] Error 1
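The CMake message in the log rejects `90-real,89-real` because `CMAKE_CUDA_ARCHITECTURES` has to be a semicolon-separated list (or one of `all`, `all-major`, `native`), while the value passed here is comma-separated. As a rough illustration of the format CMake expects, here is a minimal Python sketch that rewrites a comma-separated list into the semicolon-separated form. The helper name `normalize_cuda_archs` is hypothetical and is not part of `build_wheel.py`; this only shows the string format, not how the project itself should be patched.

```python
import re

# Special values CMake accepts verbatim, as listed in the error message above.
SPECIAL_VALUES = {"all", "all-major", "native"}


def normalize_cuda_archs(value: str) -> str:
    """Hypothetical helper: turn a comma/space/semicolon-separated
    architecture list into the semicolon-separated form CMake expects."""
    value = value.strip()
    if value in SPECIAL_VALUES:
        return value

    # Split on commas, semicolons, or whitespace and drop empty pieces.
    parts = [p for p in re.split(r"[,;\s]+", value) if p]
    if not parts:
        raise ValueError("no CUDA architectures given")

    # Each entry must be an integer, optionally followed by -real or -virtual.
    for part in parts:
        if not re.fullmatch(r"\d+(-real|-virtual)?", part):
            raise ValueError(f"invalid CUDA architecture entry: {part!r}")

    return ";".join(parts)


if __name__ == "__main__":
    print(normalize_cuda_archs("90-real,89-real"))
```

With the value from the failing command, the helper yields `90-real;89-real`, which matches the first accepted form in the CMake message. Note that a semicolon-separated value needs to be quoted when it is passed on a shell command line.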