Bob123Yang opened 2 months ago
Unable to establish SSL connection.
Are you behind some proxy?
I didn't use any proxy myself, but I'm on my company's internal network, which I think is quite common, right?
Could you give some suggestions for this situation? Thanks.
Are you able to do
wget https://www.dropbox.com/s/92n2fyej3lzy3s3/caffe_ilsvrc12.tar.gz
on your shell?
I'm not next to the machine right now; I'll try it later.
(cm) tomcat@tomcat-Dove-Product:~$ wget https://www.dropbox.com/s/92n2fyej3lzy3s3/caffe_ilsvrc12.tar.gz
--2024-09-11 08:35:34--  https://www.dropbox.com/s/92n2fyej3lzy3s3/caffe_ilsvrc12.tar.gz
Resolving www.dropbox.com (www.dropbox.com)... 31.13.94.37, 2a03:2880:f11f:83:face:b00c:0:25de
Connecting to www.dropbox.com (www.dropbox.com)|31.13.94.37|:443... failed: Connection timed out.
Connecting to www.dropbox.com (www.dropbox.com)|2a03:2880:f11f:83:face:b00c:0:25de|:443... failed: Network is unreachable.
(cm) tomcat@tomcat-Lenovo-Product:~$
@arjunsuresh Is there any other method to prepare the package “caffe_ilsvrc12.tar.gz” for docker instead of downloading it from www.dropbox.com?
@Bob123Yang yes, we can find a way. But since this is not the only download in the workflow, it would be good to know what is happening. Are dropbox URLs blocked in your network? Are all other URLs expected to work?
Yeah, it looks like dropbox URLs are blocked here and the others seem fine.
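For anyone hitting the same situation, a quick way to confirm which hosts are reachable from the shell is something like the following; the proxy-variable check is only an assumption about typical corporate setups, not something from this thread:

env | grep -i proxy
wget --spider --timeout=15 https://www.dropbox.com
wget --spider --timeout=15 https://github.com
wget --spider --timeout=15 https://zenodo.org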
So what can I do to get around this problem? I really don't want to be stopped by a download...
That's great. We have now added backup URL support in CM. Can you please do cm pull repo and retry? For the docker run, please add the --docker_cache=no option to pull the latest changes. A minimal sketch of the full sequence is below.
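For reference, a sketch of the sequence (the run command is the same one used later in this thread; --docker_cache=no is the only addition):

cm pull repo
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
    --model=resnet50 --implementation=nvidia --framework=tensorrt \
    --category=edge --scenario=Offline --execution_mode=test \
    --device=cuda --docker --quiet --test_query_count=1000 \
    --docker_cache=no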
@arjunsuresh I tried several times following your guide; please help review the log below. There seems to be no download problem anymore, but it fails at the clone every time at step 14/14.
Alias: mlcommons@cm4mlops
Local path: /home/tomcat/CM/repos/mlcommons@cm4mlops
git pull
remote: Enumerating objects: 161, done.
remote: Counting objects: 100% (161/161), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 161 (delta 107), reused 142 (delta 94), pack-reused 0 (from 0)
Receiving objects: 100% (161/161), 66.41 KiB | 800.00 KiB/s, done.
Resolving deltas: 100% (107/107), completed with 14 local objects.
From https://github.com/mlcommons/cm4mlops
   be6b63f57..6ce857cab  mlperf-inference -> origin/mlperf-inference
CM alias for this repository: mlcommons@cm4mlops
Reindexing all CM artifacts. Can take some time ... Took 0.6 sec.
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1-dev"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/c0c2d4df519a416f/cm-cached-state.json
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/2a4f3deecef34560/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/tomcat/CM/repos/local/cache/c0c2d4df519a416f/cm-cached-state.json
INFO:root:       ! call "postprocess" from /home/tomcat/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/tomcat/CM/repos/local/cache/91cad0cc764a49d3/inference
Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"
Dockerfile generated at /home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile
INFO:root:  * cm run script "get docker"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/1c757c4f3d3e4a06/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference results dir local"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/966e187bf39a46c8/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference submission dir local"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/e880b27a4cf14bc8/cm-cached-state.json
INFO:root:  * cm run script "get dataset imagenet validation original _full"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/87a60fb1d8344aeb/cm-cached-state.json
INFO:root:  * cm run script "get nvidia-docker"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/f925db34327f4882/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference nvidia scratch space"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/0cf60773c3484f98/cm-cached-state.json
CM command line regenerated to be used inside Docker:
cm run script --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_test,_r4.1-dev_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=resnet50 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=edge --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=1000 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1-dev --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_TMP_CURRENT_PATH=/home/tomcat --env.CM_TMP_PIP_VERSION_STRING= --env.CM_MODEL=resnet50 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --env.CM_DATASET_IMAGENET_PATH=/home/cmuser/CM/repos/local/cache/87a60fb1d8344aeb/imagenet-2012-val --env.CM_MLPERF_INFERENCE_RESULTS_DIR=/home/cmuser/CM/repos/local/cache/966e187bf39a46c8 --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/e880b27a4cf14bc8/mlperf-inference-submission --env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/0cf60773c3484f98 --docker_run_deps
INFO:root:* cm run script "run docker container"
Checking Docker images:
docker images -q local/cm-script-app-mlperf-inference:ubuntu-20.04-latest 2> /dev/null
INFO:root: * cm run script "build docker image"
CM generated the following Docker build command:
docker build --no-cache --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile" -t "local/cm-script-app-mlperf-inference:ubuntu-20.04-latest" .
INFO:root:         ! cd /home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /home/tomcat/CM/repos/mlcommons@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 28772.6s (17/17) FINISHED                                                              docker:rootless
 => [internal] load build definition from mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfil  0.0s
 => => transferring dockerfile: 3.03kB  0.0s
 => WARN: SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ARG "CM_GH_TOKEN") (line 14)  0.0s
 => [internal] load metadata for nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-pub  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 45B  0.0s
 => CACHED [ 1/14] FROM nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public  0.0s
 => [ 2/14] RUN apt-get update -y  39.5s
 => [ 3/14] RUN apt-get install -y python3 python3-pip git sudo wget python3-venv  79.4s
 => [ 4/14] RUN ln -snf /usr/share/zoneinfo/US/Pacific /etc/localtime && echo US/Pacific >/etc/timezone  0.2s
 => [ 5/14] RUN groupadd -g 1001 -o cm  0.3s
 => [ 6/14] RUN useradd -m -u 1001 -g 1001 -o --create-home --shell /bin/bash cmuser  0.3s
 => [ 7/14] RUN echo "cmuser ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers  0.3s
 => [ 8/14] WORKDIR /home/cmuser  0.0s
 => [ 9/14] RUN python3 -m venv cm-venv  1.5s
 => [10/14] RUN . cm-venv/bin/activate  0.2s
 => [11/14] RUN python3 -m pip install --user cmind requests giturlparse tabulate  25.6s
 => [12/14] RUN cm pull repo mlcommons@cm4mlops --branch=mlperf-inference  29.4s
 => [13/14] RUN cm run script --tags=get,sys-utils-cm --quiet  524.9s
 => CANCELED [14/14] RUN cm run script --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_test,_r4.1  28071.0s
1 warning found (use docker --debug to expand):
CM error: Portable CM script failed (name = build-docker-image, return code = 2)
^C
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script wrapped and unified by this CM script (automation recipe). Please re-run this script with --repro flag and report this issue with the original command line, cm-repro directory and full log here:
https://github.com/mlcommons/cm4mlops/issues
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native scripts more portable, interoperable and deterministic. Thank you!
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1-dev"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/c0c2d4df519a416f/cm-cached-state.json
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/2a4f3deecef34560/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/tomcat/CM/repos/local/cache/c0c2d4df519a416f/cm-cached-state.json
INFO:root:       ! call "postprocess" from /home/tomcat/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/tomcat/CM/repos/local/cache/91cad0cc764a49d3/inference
Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"
Dockerfile generated at /home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile
INFO:root:  * cm run script "get docker"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/1c757c4f3d3e4a06/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference results dir local"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/966e187bf39a46c8/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference submission dir local"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/e880b27a4cf14bc8/cm-cached-state.json
INFO:root:  * cm run script "get dataset imagenet validation original _full"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/87a60fb1d8344aeb/cm-cached-state.json
INFO:root:  * cm run script "get nvidia-docker"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/f925db34327f4882/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference nvidia scratch space"
INFO:root:       ! load /home/tomcat/CM/repos/local/cache/0cf60773c3484f98/cm-cached-state.json
CM command line regenerated to be used inside Docker:
cm run script --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_test,_r4.1-dev_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=resnet50 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=edge --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=1000 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1-dev --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_TMP_CURRENT_PATH=/home/tomcat --env.CM_TMP_PIP_VERSION_STRING= --env.CM_MODEL=resnet50 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --env.CM_DATASET_IMAGENET_PATH=/home/cmuser/CM/repos/local/cache/87a60fb1d8344aeb/imagenet-2012-val --env.CM_MLPERF_INFERENCE_RESULTS_DIR=/home/cmuser/CM/repos/local/cache/966e187bf39a46c8 --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/e880b27a4cf14bc8/mlperf-inference-submission --env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/0cf60773c3484f98 --docker_run_deps
INFO:root:* cm run script "run docker container"
Checking Docker images:
docker images -q local/cm-script-app-mlperf-inference:ubuntu-20.04-latest 2> /dev/null
CM generated the following Docker build command:
docker build --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile" -t "local/cm-script-app-mlperf-inference:ubuntu-20.04-latest" .
INFO:root:         ! cd /home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /home/tomcat/CM/repos/mlcommons@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 79632.0s (17/17) FINISHED                                                              docker:rootless
 => [internal] load build definition from mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfil  0.0s
 => => transferring dockerfile: 3.03kB  0.0s
 => WARN: SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ARG "CM_GH_TOKEN") (line 14)  0.0s
 => [internal] load metadata for nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-pub  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 45B  0.0s
 => [ 1/14] FROM nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public  0.0s
 => CACHED [ 2/14] RUN apt-get update -y  0.0s
 => CACHED [ 3/14] RUN apt-get install -y python3 python3-pip git sudo wget python3-venv  0.0s
 => CACHED [ 4/14] RUN ln -snf /usr/share/zoneinfo/US/Pacific /etc/localtime && echo US/Pacific >/etc/timezone  0.0s
 => CACHED [ 5/14] RUN groupadd -g 1001 -o cm  0.0s
 => CACHED [ 6/14] RUN useradd -m -u 1001 -g 1001 -o --create-home --shell /bin/bash cmuser  0.0s
 => CACHED [ 7/14] RUN echo "cmuser ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers  0.0s
 => CACHED [ 8/14] WORKDIR /home/cmuser  0.0s
 => CACHED [ 9/14] RUN python3 -m venv cm-venv  0.0s
 => CACHED [10/14] RUN . cm-venv/bin/activate  0.0s
 => CACHED [11/14] RUN python3 -m pip install --user cmind requests giturlparse tabulate  0.0s
 => CACHED [12/14] RUN cm pull repo mlcommons@cm4mlops --branch=mlperf-inference  0.0s
 => CACHED [13/14] RUN cm run script --tags=get,sys-utils-cm --quiet  0.0s
 => CANCELED [14/14] RUN cm run script --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_test,_r4.1  79632.0s
1 warning found (use docker --debug to expand):
CM error: Portable CM script failed (name = build-docker-image, return code = 2)
^C
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script wrapped and unified by this CM script (automation recipe). Please re-run this script with --repro flag and report this issue with the original command line, cm-repro directory and full log here:
https://github.com/mlcommons/cm4mlops/issues
The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native scripts more portable, interoperable and deterministic. Thank you!
(cm) tomcat@tomcat-Dove-Product:~$
Sorry, I should clarify: the docker build gets stuck on the git clone at step 14/14 for a long time, over 12 hours, so I have to stop the command by pressing Ctrl+C.
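(If it gets stuck again, one way to see what step 14/14 is actually doing is to re-run the generated build command with BuildKit's plain progress output; --progress=plain is a generic docker option, not something CM-specific, and the Dockerfile path is the one CM printed above:)

docker build --progress=plain --no-cache -f "/home/tomcat/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/dockerfiles/mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile" -t "local/cm-script-app-mlperf-inference:ubuntu-20.04-latest" .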
> will stop at cloning the git
Sorry, I'm unable to see this part in the shared output. Can you please share the number of cores and the RAM of the system?
Nvidia 4.0 code needs PyTorch built from source, and that typically takes around 2 hours on a 24-core, 64 GB system. If this is a problem, the best option is to use the Nvidia 4.1 code, which we are currently working on. We hope to make this available within a week.
tomcat@tomcat-Dove-Product:~$ lscpu | grep "socket\|Socket"
Core(s) per socket:  56
Socket(s):           2
tomcat@tomcat-Dove-Product:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.8Gi       119Gi        41Mi       1.4Gi       119Gi
Swap:           49Gi          0B        49Gi
tomcat@tomcat-Dove-Product:~$
112 physical cores in total and 2 × 64 GB of memory.
@arjunsuresh Please refer to the run log below (I tried again today): the docker build stopped at 21% while downloading resnet50_v1.onnx and sat for 21336.8s with no download progress.
(cm) tomcat@tomcat-Dove-Product:~$ cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
    --model=resnet50 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker --quiet \
    --test_query_count=1000
I believe it could be a network issue; it is best to restart the command if a download hangs like this. The zenodo download is slow but it works 99% of the time, as we have this resnet50 download in most of our GitHub Actions. Ideally this download should finish within a couple of minutes.
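(As a side note, if a download stalls partway, wget can usually resume the partial file rather than starting over; the URL below is a placeholder for whichever zenodo link CM prints in the log:)

wget -c --tries=10 --timeout=60 <zenodo-url-printed-by-cm>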
Yes, after several retries the resnet50 and other downloads passed, but the build still stopped at "Cloning into 'repo'..." as before (refer to the 1st picture).
I tried the command "git clone https://github.com/GATEOverflow/inference_results_v4.0.git --depth 5 repo" outside of docker: downloading was normal at first, lasted to about 20% of the progress, and then the error appeared (refer to the 2nd log).
The 1st picture:
The 2nd log:
tomcat@tomcat-Dove-Product:~/bobtry$ git clone https://github.com/GATEOverflow/inference_results_v4.0.git --depth 5 repo
Cloning into 'repo'...
remote: Enumerating objects: 71874, done.
remote: Counting objects: 100% (71874/71874), done.
remote: Compressing objects: 100% (33638/33638), done.
error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
error: 1705 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
tomcat@tomcat-Dove-Product:~/bobtry$
I think we should fix the download issue before proceeding with the MLPerf runs, as many more downloads are needed. Since the clone is failing from github, maybe the best option is to contact your system admin? In the meantime, a few generic git-side workarounds are sketched below.
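These are workarounds often suggested for the curl 92 / early EOF failure shown above, not something verified on this particular network:

# force HTTP/1.1 to avoid the HTTP/2 stream reset (curl 92) seen in the log
git config --global http.version HTTP/1.1
# clone as shallowly as possible, then deepen the history incrementally
git clone --depth 1 https://github.com/GATEOverflow/inference_results_v4.0.git repo
cd repo && git fetch --deepen=4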
@Bob123Yang while testing across multiple systems we have pinpointed this error to cases where the available network bandwidth is very low. One such case we have seen is during an rclone download, which chokes the network bandwidth and affects git clones of large repositories for any system on the same network.
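If an rclone transfer is saturating the link, rclone can cap its own bandwidth directly (the remote name and paths below are placeholders):

rclone copy remote:path /local/dest --bwlimit 10M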
I ran the cm commands below several times, and they always failed at the same place: