mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0
1.21k stars 528 forks

Docker build failed #1815

Closed. zhu-lingjie closed this issue 2 months ago.

zhu-lingjie commented 2 months ago

https://github.com/mlcommons/inference/issues/1740

I encountered the same issue as the post above. I had to manually change `--user` to `-U`, otherwise `cm` would not be available globally. I suspect adding the user install path to `PATH` would be a more elegant fix.

#CM/repos/mlcommons@cm4mlops/script/build-dockerfile
183   f.write('RUN {} -m pip install -U '.format(python) + " ".join(get_value(env, config, 'python-packages')) + ' ' + pip_extra_flags + ' ' + EOL)
184   #f.write('RUN {} -m pip install --user '.format(python) + " ".join(get_value(env, config, 'python-packages')) + ' ' + pip_extra_flags + ' ' + EOL)
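
A sketch of the PATH-based alternative (hypothetical Dockerfile lines, not the actual generated output; the `.local/bin` location is pip's default per-user script directory):

```dockerfile
# Hypothetical sketch: keep the --user install but extend PATH so the
# 'cm' entry point is found in later RUN steps. For root, pip --user
# installs console scripts to /root/.local/bin; for a normal user,
# to $HOME/.local/bin.
RUN python3 -m pip install --user cmind requests giturlparse tabulate
ENV PATH="/root/.local/bin:${PATH}"
RUN cm pull repo gateoverflow@cm4mlops
```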
arjunsuresh commented 2 months ago

Hi @zhu-lingjie, can you please share the commands to reproduce this issue?

zhu-lingjie commented 2 months ago

Hi Arjun, thanks for the reply.

I was following this guide: https://docs.mlcommons.org/inference/benchmarks/image_classification/resnet50/

The command I ran was just:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
   --model=resnet50 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet \
   --test_query_count=1000
arjunsuresh commented 2 months ago

Thank you @zhu-lingjie for your reply. We actually do update the PATH variable here, and we are not seeing this issue on our systems running docker. We are trying to figure out what exactly in the docker environment is causing it. @anandhu-eng

zhu-lingjie commented 2 months ago
(mlperf) root@server1:~# cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 \
>    --model=resnet50 \
>    --implementation=nvidia \
>    --framework=tensorrt \
>    --category=datacenter \
>    --scenario=Offline \
>    --execution_mode=test \
>    --device=cuda  \
>    --docker --quiet \
>    --test_query_count=1000
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /root/CM/repos/local/cache/b3545b27e1f441c9/cm-cached-state.json
INFO:root:  * cm run script "get sut description"
INFO:root:    * cm run script "detect os"
INFO:root:           ! cd /root
INFO:root:           ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root:    * cm run script "detect cpu"
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /root
INFO:root:             ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root:           ! cd /root
INFO:root:           ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root:    * cm run script "get python3"
INFO:root:         ! load /root/CM/repos/local/cache/3f0502fd21aa47f2/cm-cached-state.json
INFO:root:Path to Python: /root/mlperf/bin/python3
INFO:root:Python version: 3.9.6
INFO:root:    * cm run script "get compiler"
INFO:root:         ! load /root/CM/repos/local/cache/2ae6e0743fce43c4/cm-cached-state.json
INFO:root:    * cm run script "get cuda-devices"
INFO:root:      * cm run script "get cuda _toolkit"
INFO:root:           ! load /root/CM/repos/local/cache/872dcc5140ec4c27/cm-cached-state.json
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
INFO:root:ENV[CM_CUDA_VERSION]: 11.8
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu118
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /usr/local/cuda-11.8/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /usr/local/cuda-11.8
INFO:root:           ! cd /root
INFO:root:           ! call /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/run.sh from tmp-run.sh
rm: cannot remove 'a.out': No such file or directory

Checking compiler version ...

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Compiling program ...

Running program ...

/root
INFO:root:           ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/customize.py
GPU Device ID: 0
GPU Name: NVIDIA A10
GPU compute capability: 8.6
CUDA driver version: 11.4
CUDA runtime version: 11.8
Global memory: 23836098560
Max clock rate: 1695.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

GPU Device ID: 1
GPU Name: NVIDIA A10
GPU compute capability: 8.6
CUDA driver version: 11.4
CUDA runtime version: 11.8
Global memory: 23836098560
Max clock rate: 1695.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

GPU Device ID: 2
GPU Name: NVIDIA A10
GPU compute capability: 8.6
CUDA driver version: 11.4
CUDA runtime version: 11.8
Global memory: 23836098560
Max clock rate: 1695.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

GPU Device ID: 3
GPU Name: NVIDIA A10
GPU compute capability: 8.6
CUDA driver version: 11.4
CUDA runtime version: 11.8
Global memory: 23836098560
Max clock rate: 1695.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

INFO:root:    * cm run script "get generic-python-lib _package.dmiparser"
INFO:root:         ! load /root/CM/repos/local/cache/28eda87d13e64b05/cm-cached-state.json
INFO:root:    * cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
INFO:root:         ! load /root/CM/repos/local/cache/d347750851b34d88/cm-cached-state.json
Generating SUT description file for n123_099_212-tensorrt
HW description file for n123_099_212 not found. Copying from default!!!
INFO:root:         ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-sut-description/customize.py
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /root/CM/repos/local/cache/46208aeb8a654ea4/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /root/CM/repos/local/cache/b3545b27e1f441c9/cm-cached-state.json
INFO:root:         ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /root/CM/repos/local/cache/247752b63d6b444d/inference

Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"

Dockerfile generated at /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile
INFO:root:* cm run script "get docker"
INFO:root:     ! load /root/CM/repos/local/cache/8145f6be5fe54dcb/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference results dir local"
INFO:root:     ! load /root/CM/repos/local/cache/3250a9148c944d0d/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference submission dir local"
INFO:root:     ! load /root/CM/repos/local/cache/816069e5752a4ec4/cm-cached-state.json
INFO:root:* cm run script "get dataset imagenet validation original _full"
INFO:root:     ! load /root/CM/repos/local/cache/bab4775a7d84497c/cm-cached-state.json
INFO:root:* cm run script "get nvidia-docker"
INFO:root:     ! load /root/CM/repos/local/cache/5ae4498f2e554b9e/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference nvidia scratch space"
INFO:root:     ! load /root/CM/repos/local/cache/54e9b1bd57724b71/cm-cached-state.json
INFO:root:* cm run script "get nvidia-docker"
INFO:root:     ! load /root/CM/repos/local/cache/5ae4498f2e554b9e/cm-cached-state.json

CM command line regenerated to be used inside Docker:

cm run script --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=resnet50 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=1000 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_SUT_DESC_CACHE=no --env.CM_TMP_CURRENT_PATH=/root --env.CM_TMP_PIP_VERSION_STRING= --env.CM_SUT_META_EXISTS=yes --env.CM_MODEL=resnet50 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True  --env.CM_DATASET_IMAGENET_PATH=/home/cmuser/CM/repos/local/cache/bab4775a7d84497c/imagenet-2012-val  --env.CM_MLPERF_INFERENCE_RESULTS_DIR=/home/cmuser/CM/repos/local/cache/3250a9148c944d0d  
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/816069e5752a4ec4/mlperf-inference-submission  --env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/54e9b1bd57724b71  --docker_run_deps

INFO:root:* cm run script "run docker container"

Checking Docker images:

  docker images -q cknowledge/cm-script-app-mlperf-inference:ubuntu-20.04-latest 2> /dev/null

INFO:root:  * cm run script "build docker image"
================================================
CM generated the following Docker build command:

docker build  --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile" -t "cknowledge/cm-script-app-mlperf-inference:ubuntu-20.04-latest" .

INFO:root:         ! cd /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /root/CM/repos/mlcommons@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 128.8s (13/15)
 => [internal] load build definition from mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile                                         0.0s
 => => transferring dockerfile: 3.00kB                                                                                                                                0.0s
 => [internal] load metadata for nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public                                       1.2s
 => [internal] load .dockerignore                                                                                                                                     0.0s
 => => transferring context: 45B                                                                                                                                      0.0s
 => [ 1/12] FROM nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public@sha256:816bd0cc61b061c9b9e2ebecabd67f710b05740882f  110.8s
 => => resolve nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public@sha256:816bd0cc61b061c9b9e2ebecabd67f710b05740882f04a6  0.0s
 => => sha256:816bd0cc61b061c9b9e2ebecabd67f710b05740882f04a6a47ac5a5875635759 4.94kB / 4.94kB                                                                        0.0s
 => => sha256:34b056f25fae6827d26ca996fd5d92f40879487b1b82851d8e0ec55619e9b30f 23.03kB / 23.03kB                                                                      0.0s
 => => sha256:db26cf78ae4f895b1162fb506e79b7257fb2e39538a586d6634fe20f48cc60a5 7.94MB / 7.94MB                                                                        0.4s
 => => sha256:5adc7ab504d3aa2d75a0a9c265b66b194ddd891b0b311637307d7810a986c580 56.08MB / 56.08MB                                                                      1.6s
 => => sha256:7a2c559011895d255fce249c00396abff5ae7e0c0a92931d0ed493e71de78e3a 28.58MB / 28.58MB                                                                      0.5s
 => => sha256:e4f230263527ce207b7455b9476309d18a9f77f74e1f4b1fccda5852f531cd33 183B / 183B                                                                            0.5s
 => => extracting sha256:7a2c559011895d255fce249c00396abff5ae7e0c0a92931d0ed493e71de78e3a                                                                             0.4s
 => => sha256:95e3f492d47e010cc39b4aed8cd21d90bf77d820b0ab8f9785ca3e45d96fc074 6.88kB / 6.88kB                                                                        0.6s
 => => sha256:35dd1979297e8aea372ebc3f342857aea7be0b5a01595d2dab7c0c2165ae30c8 1.27GB / 1.27GB                                                                       19.3s
 => => sha256:39a2c88664b34d9fdc8d242048da75c8f639ef082c09606d67821e6ed34a5c4d 62.44kB / 62.44kB                                                                      0.8s
 => => sha256:d8f6b6cd09da3d00868412345089a2c2d5052ac966e64be8f2f9bf2725d202f8 1.68kB / 1.68kB                                                                        0.9s
 => => sha256:fe19bbed4a4aba883058d2b9f0d88541f6df0e621595a8acfe57d3d5768bb535 1.52kB / 1.52kB                                                                        1.0s
 => => extracting sha256:db26cf78ae4f895b1162fb506e79b7257fb2e39538a586d6634fe20f48cc60a5                                                                             0.1s
 => => sha256:469ef7e9efe03620c9534b342f7b67b56c68060526e8fc231a3e35eac45d8003 2.51GB / 2.51GB                                                                       38.1s
 => => extracting sha256:5adc7ab504d3aa2d75a0a9c265b66b194ddd891b0b311637307d7810a986c580                                                                             0.6s
 => => sha256:e30c6425f419a61244285b50bfff109c7253181c56c7241a645c0d688713721b 86.31kB / 86.31kB                                                                      1.8s
 => => sha256:386c2b257378b0e7bc5504f5df37e507e928970e02a8cbe7f639e1a047e5168b 816.70kB / 816.70kB                                                                    1.9s
 => => sha256:f84b877c1ffdb7bbd025c593c2157d44a37a5eb10efc43a59c6974ff421aa685 23.88MB / 23.88MB                                                                      2.3s
 => => extracting sha256:e4f230263527ce207b7455b9476309d18a9f77f74e1f4b1fccda5852f531cd33                                                                             0.0s
 => => extracting sha256:95e3f492d47e010cc39b4aed8cd21d90bf77d820b0ab8f9785ca3e45d96fc074                                                                             0.0s
 => => sha256:0d5b33ca9223071dd74902c245d511a1c339be04e993e44adb9cb24e4519696c 1.44GB / 1.44GB                                                                       38.4s
 => => extracting sha256:35dd1979297e8aea372ebc3f342857aea7be0b5a01595d2dab7c0c2165ae30c8                                                                            10.3s
 => => sha256:3385b0335001d0697203561db621220221f889d6cdeb9e4a47d9c39e362e9a0d 2.26GB / 2.26GB                                                                       89.8s
 => => extracting sha256:39a2c88664b34d9fdc8d242048da75c8f639ef082c09606d67821e6ed34a5c4d                                                                             0.0s
 => => extracting sha256:d8f6b6cd09da3d00868412345089a2c2d5052ac966e64be8f2f9bf2725d202f8                                                                             0.0s
 => => extracting sha256:fe19bbed4a4aba883058d2b9f0d88541f6df0e621595a8acfe57d3d5768bb535                                                                             0.0s
 => => extracting sha256:469ef7e9efe03620c9534b342f7b67b56c68060526e8fc231a3e35eac45d8003                                                                            19.3s
 => => sha256:0657b1e06d2b5f60304e74dbef7f1a83fff25bc11c8afccdf5b5e14418f52fc6 114B / 114B                                                                           38.2s
 => => sha256:aff060c995cc8e2c40f11b8f067a5348705067e07b10a2167540e2d652e2d72e 104.62kB / 104.62kB                                                                   38.3s
 => => sha256:985611f871749ba3dc321b265dec7061cc3f8a7a7f5a34ab6ed6660b31f9185c 1.24GB / 1.24GB                                                                       53.5s
 => => sha256:77ff06705ecb4b0000d113371e0dd574e52ed2df5b8260dcb7b5a90e907c4d08 85.25kB / 85.25kB                                                                     38.5s
 => => sha256:e20e10a5b338390da0c79618af5d930e9ad44d18f9573ac927b92de61d9c76db 6.54kB / 6.54kB                                                                       38.6s
 => => sha256:8f3828976b5b4cacf92310e5ff5758bc40508894f7351ed2cc50d0fd91c4c71f 284B / 284B                                                                           38.7s
 => => sha256:bc6d4c834fd8812a50e97409c1b4ceab7600789f7bf2ab6b1e914e355bc4895f 6.01kB / 6.01kB                                                                       38.8s
 => => extracting sha256:e30c6425f419a61244285b50bfff109c7253181c56c7241a645c0d688713721b                                                                             0.0s
 => => extracting sha256:386c2b257378b0e7bc5504f5df37e507e928970e02a8cbe7f639e1a047e5168b                                                                             0.0s
 => => extracting sha256:f84b877c1ffdb7bbd025c593c2157d44a37a5eb10efc43a59c6974ff421aa685                                                                             0.4s
 => => extracting sha256:0d5b33ca9223071dd74902c245d511a1c339be04e993e44adb9cb24e4519696c                                                                            11.5s
 => => extracting sha256:3385b0335001d0697203561db621220221f889d6cdeb9e4a47d9c39e362e9a0d                                                                            18.3s
 => => extracting sha256:0657b1e06d2b5f60304e74dbef7f1a83fff25bc11c8afccdf5b5e14418f52fc6                                                                             0.0s
 => => extracting sha256:aff060c995cc8e2c40f11b8f067a5348705067e07b10a2167540e2d652e2d72e                                                                             0.0s
 => => extracting sha256:985611f871749ba3dc321b265dec7061cc3f8a7a7f5a34ab6ed6660b31f9185c                                                                             2.7s
 => => extracting sha256:77ff06705ecb4b0000d113371e0dd574e52ed2df5b8260dcb7b5a90e907c4d08                                                                             0.0s
 => => extracting sha256:e20e10a5b338390da0c79618af5d930e9ad44d18f9573ac927b92de61d9c76db                                                                             0.0s
 => => extracting sha256:8f3828976b5b4cacf92310e5ff5758bc40508894f7351ed2cc50d0fd91c4c71f                                                                             0.0s
 => => extracting sha256:bc6d4c834fd8812a50e97409c1b4ceab7600789f7bf2ab6b1e914e355bc4895f                                                                             0.0s
 => [ 2/12] RUN apt-get update -y                                                                                                                                     4.6s
 => [ 3/12] RUN apt-get install -y python3 python3-pip git sudo wget python3-venv                                                                                     5.3s
 => [ 4/12] RUN ln -snf /usr/share/zoneinfo/US/Pacific /etc/localtime && echo US/Pacific >/etc/timezone                                                               0.7s
 => [ 5/12] RUN groupadd -g  0  -o cm                                                                                                                                 0.7s
 => [ 6/12] RUN useradd -m -u  0  -g  0  -o --create-home --shell /bin/bash cmuser                                                                                    0.8s
 => [ 7/12] RUN echo "cmuser ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers                                                                                                 0.6s
 => [ 8/12] WORKDIR /home/cmuser                                                                                                                                      0.1s
 => [ 9/12] RUN python3 -m pip install --user cmind requests giturlparse tabulate                                                                                     3.4s
 => ERROR [10/12] RUN cm pull repo gateoverflow@cm4mlops                                                                                                              0.7s
------
 > [10/12] RUN cm pull repo gateoverflow@cm4mlops:
#14 0.541 /bin/bash: cm: command not found
------
process "/bin/bash -c cm pull repo gateoverflow@cm4mlops" did not complete successfully: exit code: 127

CM error: Portable CM script failed (name = build-docker-image, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!

Attaching the full log from a fresh machine for your reference. As mentioned, it can be worked around, but hopefully it can be fixed properly.

arjunsuresh commented 2 months ago

Thank you @zhu-lingjie. I could reproduce the issue while running cm (and hence docker) as the root user. Though we don't recommend running CM as root, we'll fix this issue soon.

zhu-lingjie commented 2 months ago

I think it would be really beneficial for general users if a more detailed guide and explicit system requirements were provided. For example:

python3 actually requires >= 3.8.
The git version must support `git switch`, so git > 2.27.0 is required.

IPv6-only environments are not supported.
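
A quick pre-flight check for these requirements might look like the sketch below (hypothetical helper names; version thresholds are the ones noted above, not official minimums):

```python
# Sketch of a pre-flight check for the requirements noted above
# (thresholds taken from this comment: python >= 3.8, git > 2.27).
import re
import shutil
import subprocess
import sys


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def git_ok(minimum=(2, 27)):
    """Return True if an installed git is newer than the minimum version."""
    git = shutil.which("git")
    if git is None:
        return False
    out = subprocess.run([git, "--version"], capture_output=True, text=True).stdout
    m = re.search(r"(\d+)\.(\d+)", out)
    return m is not None and (int(m.group(1)), int(m.group(2))) > minimum


if __name__ == "__main__":
    print("python3 >= 3.8:", python_ok())
    print("git > 2.27:", git_ok())
```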

I will try running as a non-root user in future tests and see if the issue persists. Thank you.

arjunsuresh commented 2 months ago

Thank you @zhu-lingjie for your feedback. docker build should now work even as the root user. Please do `cm pull repo` to get the fix.

"python3 requires 3.8" - Most of the cm scripts do need python3.7+. Is there any CM script which didnt work with python3.7? We will add this to the docs.

git version - yes, we'll add this to the documentation.

IPv6 - cm does not enforce IPv4. Please do raise an issue if anything didn't work with IPv6.

zhu-lingjie commented 2 months ago
  1. Python3.7 error

    # pip install cm4mlops
    Collecting cm4mlops
    Cache entry deserialization failed, entry ignored
    Using cached https://files.pythonhosted.org/packages/ff/16/46b9f311bb802787dee1891a63612a5e7e2b0202983d4cc658164cca3f33/cm4mlops-0.2.tar.gz
    Installing build dependencies ... done
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-r532ou1j/cm4mlops/setup.py", line 40
        elif (spec := importlib.util.find_spec(name)) is not None:  --> syntax not valid for python3.7 
                   ^
    SyntaxError: invalid syntax
  2. IPv6 --> look for `wget -4` in the scripts; the `-4` flag is not actually necessary, but forcing IPv4 will cause problems in IPv6-only environments.
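
The failing `setup.py` line uses the assignment expression ("walrus") operator `:=`, which was only added in Python 3.8 (PEP 572), so it is a SyntaxError on 3.7. A minimal reproduction of the pattern (hypothetical helper name, not the actual setup.py code):

```python
# The walrus-operator pattern from cm4mlops' setup.py: bind and test
# a module spec in one expression. Valid on Python 3.8+, SyntaxError on 3.7.
import importlib.util


def has_module(name):
    # On 3.8+: assign spec and check it in a single expression.
    if (spec := importlib.util.find_spec(name)) is not None:
        return spec.name
    return None


print(has_module("json"))                     # stdlib module, always present
print(has_module("definitely_not_a_module"))  # no such module -> None
```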

zhu-lingjie commented 2 months ago

I have tried running docker as a non-root user, and indeed it does not run into the docker build issue. Thanks for the fix!