neuralmagic / sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Apache License 2.0

[Fix] Avoid extracting tarball multiple times #377

Closed: dbogunowicz closed 1 year ago

dbogunowicz commented 1 year ago

Fix description

As reported by @dbarbuzzi, when we download stub files, we perform additional, unnecessary downloads: https://app.asana.com/0/1203126676641557/1205743889252861/f

This fix removes the additional unpacking of the ONNX tarball. The tarball is already extracted once, right after it has been downloaded, so there is no need to extract it again after all the files have been downloaded.
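The idea of extracting a tarball only once can be sketched as follows. This is a minimal illustration, not the code changed in this PR: the helper name `extract_once` and the marker-file convention are assumptions made for the example, and only the standard-library `tarfile` and `pathlib` modules are used.

```python
import tarfile
from pathlib import Path


def extract_once(tarball: Path, dest: Path) -> bool:
    """Extract `tarball` into `dest` unless it was already extracted.

    Returns True if extraction happened and False if it was skipped.
    The marker file is a hypothetical convention for this sketch only.
    """
    marker = dest / f".{tarball.name}.extracted"
    if marker.exists():
        # A previous call already unpacked this archive; do nothing.
        return False

    dest.mkdir(parents=True, exist_ok=True)
    # Only use extractall on archives from a trusted source.
    with tarfile.open(tarball) as archive:
        archive.extractall(dest)

    # Record that extraction is done so later calls can skip it.
    marker.touch()
    return True
```

Calling `extract_once` a second time with the same arguments returns `False` and leaves the already extracted files untouched, which is the behavior the description above asks for.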

Manual testing

from sparsezoo import Model

# Running this produced the download log below
model_path = Model("zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none", download_path="test_path").path
# download training
Downloading (…)lidation-metric.yaml: 100%|██████████| 158/158 [00:00<00:00, 84.0kB/s]
Downloading (…)ng/pytorch_model.bin: 100%|██████████| 415M/415M [00:37<00:00, 11.8MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 134/134 [00:00<00:00, 65.1kB/s]
Downloading (…)est_predictions.json: 100%|██████████| 43.8M/43.8M [00:03<00:00, 12.6MB/s]
Downloading (…)ng/eval_results.json: 100%|██████████| 95.0/95.0 [00:00<00:00, 56.7kB/s]
Downloading (…)val_predictions.json: 100%|██████████| 582k/582k [00:00<00:00, 7.55MB/s]
Downloading (…)training/config.json: 100%|██████████| 619/619 [00:00<00:00, 341kB/s]
Downloading (…)h/training/vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 4.70MB/s]
Downloading (…)ining/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 9.89MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 146kB/s]
Downloading (…)ng/training_args.bin: 100%|██████████| 2.48k/2.48k [00:00<00:00, 1.66MB/s]
Downloading (…)g/train_results.json: 100%|██████████| 157/157 [00:00<00:00, 78.7kB/s]
Downloading (…)ing/all_results.json: 100%|██████████| 250/250 [00:00<00:00, 93.8kB/s]
Downloading (…)g/trainer_state.json: 100%|██████████| 3.39k/3.39k [00:00<00:00, 1.69MB/s]
# download deployment
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:37<00:00, 11.7MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 174kB/s]
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 10.3MB/s]
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 258kB/s]
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.8MB/s]
# download other files
Downloading (…)sample-inputs.tar.gz: 100%|██████████| 2.76k/2.76k [00:00<00:00, 1.49MB/s]
Downloading (…)ample-outputs.tar.gz: 100%|██████████| 52.8k/52.8k [00:00<00:00, 13.6MB/s]
Downloading (…)test_path/model.md: 100%|██████████| 1.45k/1.45k [00:00<00:00, 1.27MB/s]
Downloading (…)th/model.onnx.tar.gz: 100%|██████████| 385M/385M [00:32<00:00, 12.4MB/s]
Process finished with exit code 0

Contents of test_path

test_path/
├── deployment
│   ├── config.json
│   ├── model.onnx
│   ├── tokenizer_config.json
│   └── tokenizer.json
├── model.md
├── model.onnx
├── model.onnx.tar.gz
├── sample-inputs.tar.gz
├── sample-outputs.tar.gz
└── training
    ├── all_results.json
    ├── config.json
    ├── eval_nbest_predictions.json
    ├── eval_predictions.json
    ├── eval_results.json
    ├── pytorch_model.bin
    ├── special_tokens_map.json
    ├── squad-validation-metric.yaml
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── trainer_state.json
    ├── training_args.bin
    ├── train_results.json
    └── vocab.txt

Before this change, we were doing many redundant downloads because we were overwriting the existing files (self._path is not None -> download):

Downloading (…)lidation-metric.yaml: 100%|██████████| 158/158 [00:00<00:00, 95.0kB/s]
Downloading (…)ng/pytorch_model.bin: 100%|██████████| 415M/415M [00:37<00:00, 11.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 134/134 [00:00<00:00, 65.8kB/s]
Downloading (…)est_predictions.json: 100%|██████████| 43.8M/43.8M [00:03<00:00, 12.3MB/s]
Downloading (…)ng/eval_results.json: 100%|██████████| 95.0/95.0 [00:00<00:00, 77.3kB/s]
Downloading (…)val_predictions.json: 100%|██████████| 582k/582k [00:00<00:00, 7.52MB/s]
Downloading (…)training/config.json: 100%|██████████| 619/619 [00:00<00:00, 244kB/s]
Downloading (…)h/training/vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 4.23MB/s]
Downloading (…)ining/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 9.08MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 133kB/s]
Downloading (…)ng/training_args.bin: 100%|██████████| 2.48k/2.48k [00:00<00:00, 681kB/s]
Downloading (…)g/train_results.json: 100%|██████████| 157/157 [00:00<00:00, 51.7kB/s]
Downloading (…)ing/all_results.json: 100%|██████████| 250/250 [00:00<00:00, 86.9kB/s]
Downloading (…)g/trainer_state.json: 100%|██████████| 3.39k/3.39k [00:00<00:00, 1.97MB/s]
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.9MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 132kB/s]
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 8.50MB/s]
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 183kB/s]
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 12.0MB/s]
Downloading (…)sample-inputs.tar.gz: 100%|██████████| 2.76k/2.76k [00:00<00:00, 935kB/s]
Downloading (…)ample-outputs.tar.gz: 100%|██████████| 52.8k/52.8k [00:00<00:00, 4.20MB/s]
Downloading (…)test_path/model.md: 100%|██████████| 1.45k/1.45k [00:00<00:00, 460kB/s]
Downloading (…)th/model.onnx.tar.gz: 100%|██████████| 385M/385M [00:32<00:00, 12.3MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:37<00:00, 11.7MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 157kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 10.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 462kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:35<00:00, 12.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:39<00:00, 10.9MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 182kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 14.0MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 249kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:40<00:00, 10.7MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.9MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 200kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 11.3MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 338kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:35<00:00, 12.1MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/model.onnx with the new location: test_path/deployment/model.onnx.
Downloading (…)eployment/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 11.8MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer_config.json with the new location: test_path/deployment/tokenizer_config.json.
Downloading (…)okenizer_config.json: 100%|██████████| 331/331 [00:00<00:00, 168kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/tokenizer.json with the new location: test_path/deployment/tokenizer.json.
Downloading (…)yment/tokenizer.json: 100%|██████████| 874k/874k [00:00<00:00, 10.8MB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/deployment/config.json with the new location: test_path/deployment/config.json.
Downloading (…)ployment/config.json: 100%|██████████| 619/619 [00:00<00:00, 193kB/s]
Overwriting the current location of the File: test_path/deployment.tar.gz/model.onnx with the new location: test_path/model.onnx.
Downloading (…)test_path/model.onnx: 100%|██████████| 415M/415M [00:36<00:00, 12.0MB/s]
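To compare a log like the one above against the log produced after the fix, the repeated "Downloading" lines can be counted with a short script. This is a rough sketch: the function names `count_downloads` and `repeated` are made up for the example, and it keys on the truncated labels printed by the progress bar, so two different files that share a truncated label (for example the training and deployment `tokenizer_config.json`) are counted together.

```python
import re
from collections import Counter

# Captures the (possibly truncated) file label in lines such as
# "Downloading (…)eployment/model.onnx: 100%|...".
_DOWNLOAD_LINE = re.compile(r"^Downloading (\S+): ")


def count_downloads(log_text: str) -> Counter:
    """Count how many times each file label appears in a download log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = _DOWNLOAD_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated(counts: Counter, expected: int = 1) -> dict:
    """Return the labels that were downloaded more than `expected` times."""
    return {label: n for label, n in counts.items() if n > expected}
```

Applied to the two logs in this description, the `(…)eployment/model.onnx` line appears five times before the fix and once after it.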