mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Improve handling for Coco2014 dataset download #162

Closed anandhu-eng closed 14 hours ago

anandhu-eng commented 1 month ago

If a COCO 2014 dataset download is interrupted, an incomplete annotations_trainval2014.zip can remain in the cache folder. When the user downloads the dataset again, the new file is saved as annotations_trainval2014.zip.1 because the original file already exists. When this line of the inference code then tries to extract annotations_trainval2014.zip, it opens the incomplete file and the following error occurs:


INFO:root:  * cm run script "app mlperf reference inference _cuda _sdxl _offline _pytorch _float16"
INFO:root:    * cm run script "detect os"
INFO:root:           ! cd /home/anandhu/CM/repos/local/cache/60b12703a24c431d
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:    * cm run script "detect cpu"
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /home/anandhu/CM/repos/local/cache/60b12703a24c431d
INFO:root:             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:           ! cd /home/anandhu/CM/repos/local/cache/60b12703a24c431d
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/customize.py
INFO:root:    * cm run script "get sys-utils-cm"
WARNING:root:=================================================
WARNING:root:WARNINGS:
WARNING:root:  This CM script will install extra OS system utils required for CM automation workflows!
WARNING:root:=================================================
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /home/anandhu/CM/repos/local/cache/60b12703a24c431d
INFO:root:             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:    * cm run script "get python"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/6b57feaa93e3487b/cm-cached-state.json
INFO:root:Path to Python: /home/anandhu/CM/repos/local/cache/f157ec25f16b493b/install/bin/python3
INFO:root:Python version: 3.10.13
INFO:root:    * cm run script "get cuda _cudnn"
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /home/anandhu/CM/repos/local/cache/acade10c4e084e24
INFO:root:             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:        # Requested paths: /home/anandhu/test/bin:/home/anandhu/.local/bin:/home/anandhu/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin:/usr/cuda/bin:/usr/local/cuda-11/bin:/usr/cuda-11/bin:/usr/local/cuda-12/bin:/usr/cuda-12/bin:/usr/local/packages/cuda
INFO:root:      - Searching for versions:  == 12.4.1
INFO:root:        * /usr/bin/nvcc
INFO:root:               ! cd /home/anandhu/CM/repos/local/cache/acade10c4e084e24
INFO:root:               ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/run.sh from tmp-run.sh
INFO:root:               ! call "detect_version" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/customize.py
        Detected version: 12.0
INFO:root:        SKIPPED due to version constraints ...
INFO:root:        * /usr/local/cuda/bin/nvcc
INFO:root:               ! cd /home/anandhu/CM/repos/local/cache/acade10c4e084e24
INFO:root:               ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/run.sh from tmp-run.sh
INFO:root:               ! call "detect_version" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/customize.py
        Detected version: 12.2
INFO:root:        SKIPPED due to version constraints ...
INFO:root:      * cm run script "install cuda prebuilt"
INFO:root:           ! load /home/anandhu/CM/repos/local/cache/378abecbd7864b87/cm-cached-state.json
INFO:root:           ! cd /home/anandhu/CM/repos/local/cache/acade10c4e084e24
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cuda/customize.py
        Detected version: 12.4
INFO:root:    * cm run script "get nvidia cudnn"
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /home/anandhu/CM/repos/local/cache/4a74be41b5b0498e
INFO:root:             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:        # Requested paths: /home/anandhu/test/bin:/home/anandhu/.local/bin:/home/anandhu/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin:/usr/cuda/bin:/usr/local/cuda-11/bin:/usr/cuda-11/bin:/usr/local/cuda-12/bin:/usr/cuda-12/bin:/usr/local/packages/cuda:/usr/local/cuda/lib64:/usr/cuda/lib64:/usr/local/cuda/lib:/usr/cuda/lib:/usr/local/cuda-11/lib64:/usr/cuda-11/lib:/usr/local/cuda-12/lib:/usr/cuda-12/lib:/usr/local/packages/cuda/lib:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/lib64:/usr/lib64:/usr/local/lib:/lib:/usr/lib
INFO:root:        # Found artifact in /lib/x86_64-linux-gnu/libcudnn.so
INFO:root:           ! cd /home/anandhu/CM/repos/local/cache/4a74be41b5b0498e
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cudnn/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-cudnn/customize.py
INFO:root:ENV[CM_CUDA_PATH_INCLUDE_CUDNN]: /usr/include
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN]: /lib/x86_64-linux-gnu
INFO:root:ENV[CM_CUDNN_VERSION]: 9.1.1
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: yes
INFO:root:ENV[CM_CUDA_VERSION]: 12.4
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu124
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /home/anandhu/CM/repos/local/cache/378abecbd7864b87/install/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /home/anandhu/CM/repos/local/cache/378abecbd7864b87/install
INFO:root:    * cm run script "get generic-python-lib _torch_cuda"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/2b362e3e9d334120/cm-cached-state.json
INFO:root:    * cm run script "get generic-python-lib _torchvision_cuda"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/b668bf168e564a56/cm-cached-state.json
INFO:root:    * cm run script "get ml-model stable-diffusion text-to-image sdxl raw _pytorch _fp32"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/a9fc76dc76a34676/cm-cached-state.json
INFO:root:Stable diffusion checkpoint path: /home/anandhu/CM/repos/local/cache/0850cdc6e5454f4b/stable_diffusion_fp32
INFO:root:    * cm run script "get dataset coco2014 _validation _full"
INFO:root:      * cm run script "get python3"
INFO:root:           ! load /home/anandhu/CM/repos/local/cache/6b57feaa93e3487b/cm-cached-state.json
INFO:root:Path to Python: /home/anandhu/CM/repos/local/cache/f157ec25f16b493b/install/bin/python3
INFO:root:Python version: 3.10.13
INFO:root:      * cm run script "get generic-python-lib _package.tqdm"
INFO:root:           ! load /home/anandhu/CM/repos/local/cache/2c9602556f684c08/cm-cached-state.json
INFO:root:      * cm run script "get generic-python-lib _package.pandas"
INFO:root:           ! load /home/anandhu/CM/repos/local/cache/56c14a32cc5348ba/cm-cached-state.json
INFO:root:      * cm run script "mlperf inference source"
INFO:root:        * cm run script "detect os"
INFO:root:               ! cd /home/anandhu/CM/repos/local/cache/ed7521b324984067
INFO:root:               ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:               ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:        * cm run script "get python3"
INFO:root:             ! load /home/anandhu/CM/repos/local/cache/6b57feaa93e3487b/cm-cached-state.json
INFO:root:Path to Python: /home/anandhu/CM/repos/local/cache/f157ec25f16b493b/install/bin/python3
INFO:root:Python version: 3.10.13
INFO:root:        * cm run script "get git repo _branch.master _repo.https://github.com/mlcommons/inference"
INFO:root:             ! load /home/anandhu/CM/repos/local/cache/c451e090cdb24951/cm-cached-state.json
INFO:root:CM cache path to the Git repo: /home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-src/customize.py
Using MLCommons Inference source from '/home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference'
INFO:root:           ! cd /home/anandhu/CM/repos/local/cache/7ba1c96646a8438f
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-dataset-coco2014/run.sh from tmp-run.sh
./download-coco-2014.sh -d /home/anandhu/CM/repos/local/cache/7ba1c96646a8438f/install
--2024-08-16 11:41:14--  http://images.cocodataset.org/annotations/annotations_trainval2014.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 3.5.13.29, 54.231.167.73, 52.216.61.81, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|3.5.13.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252872794 (241M) [application/zip]
Saving to: ‘annotations_trainval2014.zip.2’

annotations_trainva 100%[===================>] 241.16M  4.20MB/s    in 71s     

2024-08-16 11:42:25 (3.41 MB/s) - ‘annotations_trainval2014.zip.2’ saved [252872794/252872794]

Traceback (most recent call last):
  File "/home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference/text_to_image/tools/coco.py", line 94, in <module>
    with zipfile.ZipFile(
  File "/home/anandhu/CM/repos/local/cache/f157ec25f16b493b/install/lib/python3.10/zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "/home/anandhu/CM/repos/local/cache/f157ec25f16b493b/install/lib/python3.10/zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

CM error: Portable CM script failed (name = get-dataset-coco2014, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script 
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts 
to make existing tools and native scripts more portable, interoperable 
and deterministic. Thank you!
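
The `BadZipFile` in the traceback above is what Python's `zipfile` module raises when a file lacks a valid end-of-central-directory record, which is typically the case for a truncated download. As a minimal sketch (the helper name `is_complete_zip` is hypothetical and not part of cm4mlops or the inference repo), a leftover archive could be checked before extraction like this:

```python
import zipfile
from pathlib import Path


def is_complete_zip(path):
    """Return True if `path` exists and looks like a readable zip archive.

    zipfile.is_zipfile() searches for the end-of-central-directory record,
    which sits at the end of the file, so a download that was cut short
    normally fails this check.
    """
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(p)
```

This only confirms that the archive structure is present; it does not verify the contents of every member.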

Potential fix:

We could add a check here that deletes annotations_trainval2014.zip, if it exists, before the download starts.
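
A rough sketch of such a check, assuming it runs on the download directory before the download script is called. The function name `remove_stale_archives` and its parameters are hypothetical, and this is not necessarily how the merged fix is implemented. It also removes the numbered copies (`.zip.1`, `.zip.2`) seen in the log above:

```python
import re
from pathlib import Path


def remove_stale_archives(download_dir, archive_name="annotations_trainval2014.zip"):
    """Delete `archive_name` and its numbered copies (archive_name.1, ...).

    Returns the names of the files that were removed.
    """
    # Match the exact archive name, optionally followed by ".<digits>".
    pattern = re.compile(re.escape(archive_name) + r"(\.\d+)?")
    removed = []
    directory = Path(download_dir)
    if not directory.is_dir():
        return removed
    for path in sorted(directory.iterdir()):
        if path.is_file() and pattern.fullmatch(path.name):
            path.unlink()
            removed.append(path.name)
    return removed
```

Deleting unconditionally forces a fresh download of roughly 241 MB on every run; a variant could first test whether the existing file is a valid zip and keep it if so.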

arjunsuresh commented 4 days ago

Yes @anandhu-eng, the potential fix is good. Since the download happens inside the inference Python code, we don't have a better way here.

anandhu-eng commented 14 hours ago

Closing the issue as https://github.com/GATEOverflow/cm4mlops/pull/116 is merged.