iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Migrate GCS files to new (ideally public) locations #18518

Open ScottTodd opened 1 month ago

ScottTodd commented 1 month ago

We depend on a few files hosted in a GCP project using various buckets.

Most uses can be discovered in this repo with a regex search of https://storage\.googleapis\.com.*iree:

21 results - 13 files

.github\workflows\ci.yml:
  47    # attempt the setup step last ran in.
  48:   GCS_URL: https://storage.googleapis.com/iree-github-actions-${{ github.event_name == 'pull_request' && 'presubmit' || 'postsubmit' }}-artifacts/${{ github.run_id }}/${{ github.run_attempt }}
  49  

.github\workflows\pkgci_test_riscv64.yml:
  71          env:
  72:           IREE_ARTIFACT_URL: "https://storage.googleapis.com/iree-shared-files"
  73            RISCV_CLANG_TOOLCHAIN_FILE_NAME: "toolchain_iree_manylinux_2_28_20231012.tar.gz"

build_tools\docker\dockerfiles\base-arm64.Dockerfile:
  78  
  79: RUN wget --no-verbose "https://storage.googleapis.com/iree-shared-files/qemu-aarch64"
  80  RUN chmod +x ./qemu-aarch64 && cp ./qemu-aarch64 /usr/bin/qemu-aarch64 && rm -rf /install-qemu

build_tools\riscv\riscv_bootstrap.sh:
  14  PREBUILT_DIR="${HOME}/riscv"
  15: IREE_ARTIFACT_URL="https://storage.googleapis.com/iree-shared-files"
  16  

docs\website\docs\guides\ml-frameworks\tflite.md:
   98  WORKDIR="/tmp/workdir"
   99: TFLITE_URL="https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/posenet_i8.tflite"
  100  TFLITE_PATH=${WORKDIR}/model.tflite

  152  ``` python
  153: tfliteUrl = "https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/posenet_i8.tflite"
  154: jpgUrl = "https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/posenet_i8_input.jpg"
  155  

experimental\web\generate_web_metrics.sh:
  75  
  76: wget -nc https://storage.googleapis.com/iree-model-artifacts/mobile_ssd_v2_float_coco.tflite
  77: wget -nc https://storage.googleapis.com/iree-model-artifacts/deeplabv3.tflite
  78: wget -nc https://storage.googleapis.com/iree-model-artifacts/posenet.tflite
  79: wget -nc https://storage.googleapis.com/iree-model-artifacts/mobilebert-baseline-tf2-float.tflite
  80: wget -nc https://storage.googleapis.com/iree-model-artifacts/mobilenet_v2_1.0_224.tflite
  81: wget -nc https://storage.googleapis.com/iree-model-artifacts/MobileNetV3SmallStaticBatch.tflite
  82  

integrations\tensorflow\test\python\iree_tfl_tests\imagenet_test_data.py:
   9      # We use an image of apples since this is an easy example.
  10:     img_path = "https://storage.googleapis.com/iree-model-artifacts/ILSVRC2012_val_00000023.JPEG"
  11      local_path = "/".join([workdir, "ILSVRC2012_val_00000023.JPEG"])

integrations\tensorflow\test\python\iree_tfl_tests\mobilebert_tf2_quant_test.py:
  8  # Source https://tfhub.dev/iree/lite-model/mobilebert/int8/1
  9: model_path = "https://storage.googleapis.com/iree-model-artifacts/mobilebert-baseline-tf2-quant.tflite"
  10  

integrations\tensorflow\test\python\iree_tfl_tests\mobilenet_v1_test.py:
  10  
  11: model_path = "https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/mobilenet_v1.tflite"
  12  

integrations\tensorflow\test\python\iree_tfl_tests\mobilenet_v3-large_uint8_test.py:
  8  # Source https://tfhub.dev/iree/lite-model/mobilenet_v3_large_100_224/uint8/1
  9: model_path = "https://storage.googleapis.com/iree-model-artifacts/mobilenet_v3-large_224_1.0_uint8.tflite"
  10  

integrations\tensorflow\test\python\iree_tfl_tests\posenet_i8_test.py:
  13  
  14: model_path = "https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/posenet_i8.tflite"
  15: model_input = "https://storage.googleapis.com/iree-model-artifacts/tflite-integration-tests/posenet_i8_input.jpg"
  16  

tests\e2e\stablehlo_models\mnist_train_test\mnist_train_test.py:
  21  
  22: MODEL_ARTIFACTS_URL = "https://storage.googleapis.com/iree-model-artifacts/mnist_train.2bec0cb356ae7c059e04624a627eb3b15b0a556cbd781bbed9f8d32e80a4311d.tar"
  23  

tests\e2e\stablehlo_models\mnist_train_test\README.md:
  22  sed -i \
  23:   "s|MODEL_ARTIFACTS_URL =.*|MODEL_ARTIFACTS_URL = \"https://storage.googleapis.com/iree-model-artifacts/mnist_train.${DIGEST}.tar\"|" \
  24    mnist_train_test.py

As far as I can tell, those files are only read from; they aren't written to outside of very rare maintenance (none in the last year, IIRC). One bucket is read-write and is used for ccache: http://storage.googleapis.com/iree-sccache/ccache. We are in the process of migrating off of that bucket in https://github.com/iree-org/iree/issues/18238.
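
For anyone who wants to reproduce the search above outside of an editor, here is a minimal Python sketch. It walks a checkout and reports every line matching the same regex. The function name `find_gcs_references` is made up for illustration, and the sketch assumes it is pointed at the repo root. Note that the pattern only matches https:// URLs, so the http:// sccache link would not show up.

```python
import re
from pathlib import Path

# Same pattern as the regex search quoted in this comment.
GCS_PATTERN = re.compile(r"https://storage\.googleapis\.com.*iree")


def find_gcs_references(root):
    """Yield (path, line_number, line) for each matching line under root.

    Hypothetical helper for illustration; not part of the IREE repo.
    """
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the .git directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if GCS_PATTERN.search(line):
                yield path, number, line.strip()


# Example usage, run from the repo root:
#   for path, number, line in find_gcs_references("."):
#       print(f"{path}:{number}: {line}")
```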

ScottTodd commented 1 month ago

Until we find a better location, let's at least download the riscv and arm files (qemu-aarch64, toolchain_iree_manylinux_2_28_20231012.tar.gz, and any others) and upload mirrors of them to Azure.

ScottTodd commented 1 month ago

@Eliasj42 could you help mirror the https://storage.googleapis.com/iree-shared-files/qemu-aarch64 and toolchain_iree_manylinux_2_28_20231012.tar.gz files to the sharkpublic Azure storage account or some other public location? We can find a better long-term home for those files later.

I'm less concerned about the .tflite files. We can just disable any tests relying on those.

Eliasj42 commented 1 month ago

toolchain_iree_manylinux_2_28_20231012.tar.gz

Like this?
https://sharkpublic.blob.core.windows.net/sharkpublic/GCP-Migration-Files/qemu-aarch64
https://sharkpublic.blob.core.windows.net/sharkpublic/GCP-Migration-Files/toolchain_iree_manylinux_2_28_20231012.tar.gz

ScottTodd commented 1 month ago

Yep, then point the code to the new file locations.
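
As a rough sketch of what "pointing the code to the new locations" amounts to, the mapping below rewrites an old GCS URL to its Azure mirror. The two prefixes and the two file names come from this thread; the helper name `mirror_url` and the idea of keeping an explicit allow-list are assumptions for illustration only. In practice the change is probably just an edit to the base URL variable in each script.

```python
# Prefixes and file names taken from this thread.
OLD_PREFIX = "https://storage.googleapis.com/iree-shared-files"
NEW_PREFIX = "https://sharkpublic.blob.core.windows.net/sharkpublic/GCP-Migration-Files"

# Only files known to have been mirrored so far.
MIRRORED_FILES = {
    "qemu-aarch64",
    "toolchain_iree_manylinux_2_28_20231012.tar.gz",
}


def mirror_url(url):
    """Return the Azure mirror URL for a mirrored file, else the URL unchanged.

    Hypothetical helper for illustration; not part of the IREE repo.
    """
    prefix = OLD_PREFIX + "/"
    if url.startswith(prefix):
        name = url[len(prefix):]
        if name in MIRRORED_FILES:
            return f"{NEW_PREFIX}/{name}"
    return url
```

URLs for files that have not been mirrored pass through untouched, so nothing gets silently pointed at a location that does not exist.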

ScottTodd commented 1 week ago

There are still a few places to update. A user just noted that the links in https://github.com/iree-org/iree/blob/main/build_tools/riscv/riscv_bootstrap.sh are dead.
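
One way to catch dead links like these before users do is a small reachability check. This is only a sketch using the Python standard library: `is_reachable` is a hypothetical name, the `opener` parameter exists purely so the function can be exercised without network access, and it assumes the hosting server answers HEAD requests.

```python
import urllib.request


def is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to url succeeds with a 2xx status.

    Hypothetical helper for illustration; not part of the IREE repo.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib.error.URLError, HTTPError (4xx/5xx), and socket timeouts
        # are all subclasses of OSError.
        return False


# Example usage:
#   is_reachable("https://storage.googleapis.com/iree-shared-files/qemu-aarch64")
```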

We should also still switch to easier-to-manage files (Git LFS?) with reproducible steps for generating them, instead of just mirroring to a cloud bucket that only some project members have access to.