microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Build] Remove large files from repository #20526

Open mc-nv opened 5 months ago

mc-nv commented 5 months ago

Describe the issue

Cloning and checking out the repository can take a long time because example model files are stored in it. Many of these files may be outdated relative to the current branch, but they still increase checkout time for everyone.

Urgency

I would consider this urgent because it impacts many users and will also block or impact https://github.com/microsoft/onnxruntime/issues/12081.

Target platform

any

Build script

Reproduction steps

$ git clone https://github.com/microsoft/onnxruntime.git
$ cd onnxruntime
$ git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort --numeric-sort --key=2 |
    cut -c 1-12,41- |
    $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

Error / output

...
4b82f1d9cd30   19MiB onnxruntime/python/tools/quantization/E2E_example_model/object_detection/trt/yolov3/annotations/instances_val2017.json
618e8a8acc50   20MiB orttraining/orttraining/models/bert_tiny/bert-tiny_1-layer_noloss.onnx
64d138c6d30a   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
718b0d93c1a5   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
67936bb7b3d2   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
28d50361c4e2   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
1dd70726be37   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
295c165101f1   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
bb0f72efd2ea   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
150184ba7698   20MiB onnxruntime/test/testdata/transform/bert_toy_opset14.onnx
ba50963637c6   27MiB onnxruntime/test/testdata/bart_tiny.onnx
e22f27348f9a   31MiB onnx_test_runner_armv8a_flag.zip
070d5d4f066a   31MiB winml/test/scenario/models/coreml_Resnet50_ImageNet-dq.onnx
f3e82ef30be6   33MiB images/bert-excel.gif
2af37a459364   34MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublasLt64_10.dll
20bcfbfc2184   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.1.3.ort.pt
a68114bc465d   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.2.3.ort.pt
bf978748d4b5   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.0.3.ort.pt
5f13eebf892e   39MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.3.3.ort.pt
13678207f109   71MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublas64_10.dll
5afd272b5fff   74MiB onnxruntime/test/contrib_ops/qordered_python_test/my_model/const16_longformer.embeddings.word_embeddings.weight.npy
38e731a65948   87MiB test/ssd/ssd.onnx
4315eb99ea53   97MiB onnxruntime/python/tools/quantization/E2E_example_model/resnet50_v1.onnx
53ac9b3d567a   98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-9.onnx
bbed42bb5ea3   98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-13.onnx
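The listing above walks every object reachable from any ref, so it mixes files that are still on the branch with blobs that survive only in history (for example, the seven different revisions of `bert_toy_postprocessed.onnx` shown). A minimal sketch, assuming a POSIX shell with `git` and `awk` on `PATH`, is to inspect a single tree instead. The helper name `large_files_at_head` and its 10 MiB default threshold are hypothetical choices for illustration, not anything the project provides.

```shell
# Hypothetical helper: list files in one tree (default HEAD) whose blob size
# exceeds a threshold in bytes (default 10 MiB, an arbitrary assumption).
# Unlike `git rev-list --objects --all`, which walks all of history, this only
# looks at one tree, so a path that shows up in the history listing but not
# here has already been deleted from the branch and only costs clone time.
large_files_at_head() {
  local limit_bytes="${1:-10485760}"
  local rev="${2:-HEAD}"
  # `git ls-tree -r -l` prints "<mode> <type> <object> <size><TAB><path>".
  # Submodule entries report "-" as their size and are skipped.
  git ls-tree -r -l "$rev" |
    awk -F'\t' -v limit="$limit_bytes" '{
      split($1, meta, " ")   # meta[4] is the (space-padded) blob size
      if (meta[4] != "-" && meta[4] + 0 > limit) print meta[4] "\t" $2
    }' |
    sort -n -r              # largest files first
}
```

For example, running `large_files_at_head 10485760` from the repository root lists only the large files a fresh checkout of the current branch would still write to disk; anything in the earlier output that is missing here is history-only.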

Visual Studio Version

No response

GCC / Compiler Version

No response

jywu-msft commented 5 months ago

Can you do a shallow clone (i.e. --depth 1) to reduce the time? But agreed, we should do more on our side to avoid checking large objects into the repo. +@snnn @pranavsharma FYI
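The shallow-clone workaround suggested above can be sketched as a small shell function, assuming a POSIX shell with `git` on `PATH`. The function name `shallow_clone` and its optional branch argument are hypothetical; the flags (`--depth`, `--single-branch`, `--branch`) are standard `git clone` options. Only the tip commit of one branch is fetched, so older revisions of large test models are never downloaded.

```shell
# Hypothetical helper: clone only the most recent commit of a single branch.
#   $1 = repository URL, $2 = destination directory,
#   $3 = optional branch name (empty means the remote's default branch)
shallow_clone() {
  local url="$1"
  local dest="$2"
  local branch="${3:-}"
  if [ -n "$branch" ]; then
    git clone --depth 1 --single-branch --branch "$branch" "$url" "$dest"
  else
    git clone --depth 1 --single-branch "$url" "$dest"
  fi
}
```

Usage, assuming network access to GitHub: `shallow_clone https://github.com/microsoft/onnxruntime.git onnxruntime`. Note that a shallow clone still downloads every file present at the tip, so large models that remain on the branch are unaffected; it only avoids paying for history. A partial clone (`git clone --filter=blob:none`) is another option that keeps full commit history but fetches file contents on demand.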

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.