neuralmagic / sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Apache License 2.0

[BugFix] Path not expanded #418

Closed by rahul-tuli 10 months ago

rahul-tuli commented 10 months ago

Deployment tar not found bug

When a model is downloaded through the Python API with a download_path that contains the home-directory shorthand ~, unzipping the tar files fails with a file-not-found error.

python local/scripts/deployment_dir_bug.py --small-model 
Downloading (…)training/config.json: 100%|██████████████████████████████████████| 0.98k/0.98k [00:00<00:00, 377kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 240/240 [00:00<00:00, 95.5kB/s]
Downloading (…)/training/merges.txt: 100%|███████████████████████████████████████| 446k/446k [00:00<00:00, 8.83MB/s]
Downloading (…)g/model_nocache.onnx: 100%|███████████████████████████████████████| 496M/496M [00:43<00:00, 12.0MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████| 90.0/90.0 [00:00<00:00, 18.9kB/s]
Downloading (…)/training/vocab.json: 100%|███████████████████████████████████████| 779k/779k [00:00<00:00, 10.6MB/s]
Downloading (…)ining/tokenizer.json: 100%|█████████████████████████████████████| 2.02M/2.02M [00:00<00:00, 10.7MB/s]
Downloading (…)el/deployment.tar.gz: 100%|███████████████████████████████████████| 265M/265M [00:23<00:00, 12.0MB/s]
[Errno 2] No such file or directory: '~/test-models/small-model/deployment.tar.gz'
Traceback (most recent call last):
  File "/home/rahul/projects/sparsezoo/src/sparsezoo/objects/directory.py", line 190, in download
    target_directory.unzip()
  File "/home/rahul/projects/sparsezoo/src/sparsezoo/objects/directory.py", line 306, in unzip
    tar = tarfile.open(self._path, "r")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tarfile.py", line 1804, in open
    return func(name, "r", fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tarfile.py", line 1870, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '~/test-models/small-model/deployment.tar.gz'

Trying attempt 1 of 1.
Download retry failed...

Issue

The ~ in the specified download path is not expanded to the user's home directory, so the archive path is passed to the file APIs literally, as the traceback shows. This is a bug in the sparsezoo Python API.
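As a minimal sketch of the mechanism (not the actual patch in this PR): tilde expansion is a shell feature, so Python's file APIs treat ~ as an ordinary path component unless it is expanded explicitly, for example with os.path.expanduser. The helper name normalize_download_path below is hypothetical and is not part of sparsezoo.

```python
import os


def normalize_download_path(download_path):
    """Hypothetical helper (not sparsezoo API): expand ~ and return an absolute path."""
    if download_path is None:
        # No path given; leave the choice of a default to the caller.
        return None
    # expanduser replaces a leading ~ with the user's home directory;
    # abspath then resolves the result against the current directory if needed.
    return os.path.abspath(os.path.expanduser(download_path))


raw = "~/test-models/small-model"
expanded = normalize_download_path(raw)
print(raw, "->", expanded)
```

Normalizing the path once, before any file is written or opened, would keep the download and unzip steps pointing at the same location.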

Test Script

# deployment_dir_bug.py

import argparse
from sparsezoo import Model

def parse_args():
    parser = argparse.ArgumentParser(description='Download models.')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--big-model', action='store_true', help='Download big model')
    group.add_argument('--small-model', action='store_true', help='Download small model')
    parser.add_argument('--download-path', type=str, required=False, help='Path to download the model', default=None)
    return parser.parse_args()

def main():
    args = parse_args()
    if args.big_model:
        stub = "zoo:llama2-7b-ultrachat200k_llama2_pretrain-pruned80"
        potential_download_path = "~/test-models/big-model"
    else:
        stub = "zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized"
        potential_download_path = "~/test-models/small-model"

    download_path = args.download_path if args.download_path else potential_download_path
    sparsezoo_model = Model(stub, download_path=download_path)
    downloaded_path = sparsezoo_model.download()
    print(f"Downloaded Model contents to {downloaded_path=}")
    print(f"Sparsezoo Model: {sparsezoo_model=}")

if __name__ == "__main__":
    main()

Steps to Reproduce

Invoke the script with the --small-model flag; the error shown above should appear.
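The same failure can be sketched offline, without downloading a model. The snippet below is an illustration under assumptions, not part of the PR: it builds a small archive in a temporary directory, temporarily points HOME (and USERPROFILE on Windows) at that directory so that ~ resolves there, and then opens the archive with and without expansion. The file names are placeholders chosen to mirror the log above.

```python
import os
import tarfile
import tempfile
from unittest import mock

with tempfile.TemporaryDirectory() as fake_home:
    # Build a small archive at <fake_home>/deployment.tar.gz.
    payload = os.path.join(fake_home, "config.json")
    with open(payload, "w") as f:
        f.write("{}")
    archive = os.path.join(fake_home, "deployment.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(payload, arcname="config.json")

    # Assumption: overriding these variables is enough for "~" to resolve
    # to the temporary directory on the current platform.
    fake_env = {"HOME": fake_home, "USERPROFILE": fake_home}
    with mock.patch.dict(os.environ, fake_env):
        tilde_path = "~/deployment.tar.gz"

        # Unexpanded: the path is taken literally, so the open fails.
        raised = False
        try:
            with tarfile.open(tilde_path, "r"):
                pass
        except FileNotFoundError as err:
            raised = True
            print(err)

        # Expanded: the archive is found and its members can be listed.
        with tarfile.open(os.path.expanduser(tilde_path), "r") as tar:
            names = tar.getnames()

print(raised, names)
```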

After this PR

The issue should be resolved and the deployment tar should be found.

python local/scripts/deployment_dir_bug.py --small-model 
Downloading (…)training/config.json: 100%|██████████████████████████████████████| 0.98k/0.98k [00:00<00:00, 382kB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 240/240 [00:00<00:00, 73.7kB/s]
Downloading (…)/training/merges.txt: 100%|███████████████████████████████████████| 446k/446k [00:00<00:00, 7.24MB/s]
Downloading (…)g/model_nocache.onnx: 100%|███████████████████████████████████████| 496M/496M [00:44<00:00, 11.6MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████| 90.0/90.0 [00:00<00:00, 21.1kB/s]
Downloading (…)/training/vocab.json: 100%|███████████████████████████████████████| 779k/779k [00:00<00:00, 9.05MB/s]
Downloading (…)ining/tokenizer.json: 100%|█████████████████████████████████████| 2.02M/2.02M [00:00<00:00, 10.7MB/s]
Downloading (…)el/deployment.tar.gz: 100%|███████████████████████████████████████| 265M/265M [00:23<00:00, 12.1MB/s]
Downloading (…)small-model/model.md: 100%|██████████████████████████████████████| 0.99k/0.99k [00:00<00:00, 218kB/s]
Downloading (…)el/model.onnx.tar.gz: 100%|███████████████████████████████████████| 264M/264M [00:23<00:00, 11.7MB/s]
Downloaded Model contents to downloaded_path=False
Sparsezoo Model: sparsezoo_model=Model(stub=zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized)

$ tree ~/test-models
/home/rahul/test-models
└── small-model
    ├── deployment
    │   ├── config.json
    │   ├── merges.txt
    │   ├── model.onnx
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   ├── tokenizer.json
    │   └── vocab.json
    ├── deployment.tar.gz
    ├── model.md
    ├── model.onnx
    ├── model.onnx.tar.gz
    └── training
        ├── config.json
        ├── merges.txt
        ├── model_nocache.onnx
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.json

3 directories, 18 files