This fix removes the redundant second unpacking of the ONNX tarball. The tarball is already extracted once, right after it is downloaded, so there is no need to extract it again after all the files have been downloaded.
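The extract-once idea can be sketched as follows. This is a minimal illustration using only the standard library, not the actual sparsezoo implementation; the helper name `extract_once` and the marker-file convention are assumptions made for this example.

```python
import tarfile
from pathlib import Path


def extract_once(tarball: Path, target_dir: Path) -> bool:
    """Extract `tarball` into `target_dir` unless that was already done.

    Returns True if extraction ran, False if it was skipped.
    Hypothetical helper: the marker file is an assumption of this sketch.
    """
    marker = target_dir / f".{tarball.name}.extracted"
    if marker.exists():
        # Already unpacked right after the download; nothing to do.
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        archive.extractall(target_dir)
    # Record that extraction happened so later calls become no-ops.
    marker.touch()
    return True
```

Calling `extract_once` a second time on the same tarball and directory returns `False` and leaves the already extracted files untouched.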
Manual testing
from sparsezoo import Model
model = Model("zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none", download_path="test_path").path
Before this change, we were performing many redundant downloads because we were overwriting the locations of existing files (self._path is not None -> download):
Downloading (…)lidation-metric.yaml: 100%|██████████| 158/158 [00:00<00:00, 95.0kB/s]
Downloading (…)ng/pytorch_model.bin: 100%|██████████| 415M/415M [00:37<00:00, 11.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 134/134 [00:00<00:00, 65.8kB/s]
Downloading (…)est_predictions.json: 100%|██████████| 43.8M/43.8M [00:03<00:00, 12.3MB/s]
Downloading (…)ng/eval_results.json: 100%|██████████| 95.0/95.0 [00:00<00:00, 77.3kB/s]
Downloading (…)val_predictions.json: 100%|██████████| 582k/582k [00:00<00:00, 7.52MB/s]
Downloading (…)training/config.json: 100%|██████████| 619/619 [00:00<00:00, 244kB/s]
Downloading (…)h/training/vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 4.23MB/s]
Downloading (…)ining/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 9.08MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 133kB/s]
Downloading (…)ng/training_args.bin: 100%|██████████| 2.48k/2.48k [00:00<00:00, 681kB/s]
Downloading (…)g/train_results.json: 100%|██████████| 157/157 [00:00<00:00, 51.7kB/s]
Downloading (…)ing/all_results.json: 100%|██████████| 250/250 [00:00<00:00, 86.9kB/s]
Downloading (…)g/trainer_state.json: 100%|██████████| 3.39k/3.39k [00:00<00:00, 1.97MB/s]
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.9MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 132kB/s]
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 8.50MB/s]
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 183kB/s]
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 12.0MB/s]
Downloading (…)sample-inputs.tar.gz: 100%|██████████| 2.76k/2.76k [00:00<00:00, 935kB/s]
Downloading (…)ample-outputs.tar.gz: 100%|██████████| 52.8k/52.8k [00:00<00:00, 4.20MB/s]
Downloading (…)test_path/model.md: 100%|██████████| 1.45k/1.45k [00:00<00:00, 460kB/s]
Downloading (…)th/model.onnx.tar.gz: 100%|██████████| 385M/385M [00:32<00:00, 12.3MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:37<00:00, 11.7MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 157kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 10.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 462kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:35<00:00, 12.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:39<00:00, 10.9MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 182kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 14.0MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 249kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:40<00:00, 10.7MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.9MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 200kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 11.3MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 338kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:35<00:00, 12.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.8MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 168kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 10.8MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 193kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 12.0MB/s]
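The pattern visible in the log above, where each "Overwriting the current location" message is followed by a fresh download, can be modelled with a toy class. This is a sketch under stated assumptions, not sparsezoo code: `ToyFile`, `relocate`, `count_fetches`, and the `fetch` callback are hypothetical names, and the only behaviour taken from the description is that a file whose path is already set gets downloaded again.

```python
from typing import Callable, List, Optional


class ToyFile:
    """Hypothetical stand-in for a downloadable file (not the sparsezoo class)."""

    def __init__(self, name: str, fetch: Callable[[str], None]):
        self.name = name
        self._path: Optional[str] = None
        self._fetch = fetch

    def download(self, destination: str) -> None:
        # Simulate a network download by invoking the callback.
        self._fetch(self.name)
        self._path = destination

    def relocate(self, new_path: str, redownload: bool) -> None:
        # Modelled old behaviour (redownload=True): a path that is
        # already set triggers another download on relocation.
        if self._path is not None and redownload:
            self.download(new_path)
        else:
            self._path = new_path


def count_fetches(redownload: bool, passes: int = 4) -> int:
    """Download once, relocate `passes` times, and count the fetches."""
    fetched: List[str] = []
    toy = ToyFile("model.onnx", fetched.append)
    toy.download("test_path/deployment.tar.gz/model.onnx")
    for _ in range(passes):
        toy.relocate("test_path/model.onnx", redownload)
    return len(fetched)
```

The `redownload` flag exists only to compare the two behaviours side by side; the fix described here removes the extra unpacking step rather than adding a flag.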
Fix description
As reported by @dbarbuzzi, downloading the files for a stub triggers additional, unnecessary downloads: https://app.asana.com/0/1203126676641557/1205743889252861/f
Contents of test_path