ratt-ru / packratt

BSD 3-Clause "New" or "Revised" License

GoogleDrive folder permissions break cache check process #18

Open kwazzi-jack opened 3 years ago

kwazzi-jack commented 3 years ago

To reproduce the problem, I have a private Google Drive folder set up with a single MS (measurement set) called KAT7_200_7_1.tar.gz. The YAML entry in registry.yaml is as follows:

'/TEST/KAT7_200_7_1.tar.gz':
    'type': 'google'
    'file_id': '1wtAoYX157fnNYQaSwZJbppgQcbp8ttw2'
    'hash': 'a1abca2896a3682485193eb72f9174a03ec88815c617a1cef8ceb907117e44f8'
    'description': 'KAT7 measurement set with n_time=200, n_ant=7 and n_chan=1.'

Here the file_id is taken from the URL of the MS share link, and the hash is calculated in my terminal with sha256sum, as suggested:

~$ sha256sum KAT7_200_7_1.tar.gz 
a1abca2896a3682485193eb72f9174a03ec88815c617a1cef8ceb907117e44f8
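For anyone without sha256sum available, the same digest can be computed in Python. This is only a minimal sketch using the standard hashlib module; the helper name `sha256_of_file` and the chunk size are illustrative and not part of packratt.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    Hypothetical helper (not a packratt function). The file is read in
    chunks so that large archives do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("KAT7_200_7_1.tar.gz")` should give the same hex string that sha256sum prints for that file.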

If I run packratt from the command line for this measurement set, the following ValueError is raised:

~$ packratt get /TEST/KAT7_200_7_1.tar.gz
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): drive.google.com:443
DEBUG:urllib3.connectionpool:https://drive.google.com:443 "GET /uc?export=download&id=1wtAoYX157fnNYQaSwZJbppgQcbp8ttw2 HTTP/1.1" 302 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): accounts.google.com:443
DEBUG:urllib3.connectionpool:https://accounts.google.com:443 "GET /ServiceLogin?service=wise&passive=1209600&continue=https://drive.google.com/uc?export%3Ddownload%26id%3D1wtAoYX157fnNYQaSwZJbppgQcbp8ttw2&followup=https://drive.google.com/uc?export%3Ddownload%26id%3D1wtAoYX157fnNYQaSwZJbppgQcbp8ttw2&ltmpl=drive HTTP/1.1" 200 None
Traceback (most recent call last):
  File "/home/brian/Code/brianWelman/final/bin/packratt", line 11, in <module>
    sys.exit(run())
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 46, in run
    return _run(sys.argv[1:])
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 38, in _run
    commands(args.command, args)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/dispatch.py", line 19, in __call__
    return fn(*args, **kwargs)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 18, in get
    return get(args.key, args.destination)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/interface.py", line 62, in get
    % (sha256_hash, entry['hash']))
ValueError: sha256hash does not agree. 42ddaa7f99fef90ebdda747b497dc7a912ec19d030700e25f727b0300a4decc8 vs a1abca2896a3682485193eb72f9174a03ec88815c617a1cef8ceb907117e44f8

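It seems that restricted-access folders/files produce a different sha256 hash (the log above shows the request being redirected to the accounts.google.com sign-in page, so the content that gets hashed is presumably not the archive itself):

42ddaa7f99fef90ebdda747b497dc7a912ec19d030700e25f727b0300a4decc8

This can be fixed by setting the folder and its contents to public. The problem arises when running the same packratt command after fixing the Drive permissions: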

~$ packratt get /TEST/KAT7_200_7_1.tar.gz
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 916, in _unpack_tarfile
    tarobj = tarfile.open(filename)
  File "/usr/lib/python3.6/tarfile.py", line 1578, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/brian/Code/brianWelman/final/bin/packratt", line 11, in <module>
    sys.exit(run())
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 46, in run
    return _run(sys.argv[1:])
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 38, in _run
    commands(args.command, args)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/dispatch.py", line 19, in __call__
    return fn(*args, **kwargs)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/application.py", line 18, in get
    return get(args.key, args.destination)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/interface.py", line 64, in get
    shutil.unpack_archive(str(filename), destination)
  File "/usr/lib/python3.6/shutil.py", line 983, in unpack_archive
    func(filename, extract_dir, **kwargs)
  File "/usr/lib/python3.6/shutil.py", line 919, in _unpack_tarfile
    "%s is not a compressed or uncompressed tar file" % filename)
shutil.ReadError: /home/brian/.cache/packratt/f9/7f/9f/ce709847c2bf044ae813adede0002f26ee98cfd9de6ca70c11/KAT7_200_7_1.tar.gz is not a compressed or uncompressed tar file

The only way around this is to remove the cached file:

rm /home/brian/.cache/packratt/f9/7f/9f/ce709847c2bf044ae813adede0002f26ee98cfd9de6ca70c11/KAT7_200_7_1.tar.gz

Then the packratt get command works normally again. Looking at the get function in interface.py, my assumption is that the problem occurs in lines 63 to 67:

def get(key, destination, entry=None):
    ...
        # Download to the destination
        sha256_hash = downloaders(entry['type'], key, entry)

        if not sha256_hash == entry['hash']:
            raise ValueError("sha256hash does not agree. %s vs %s"
                             % (sha256_hash, entry['hash']))

The file is downloaded and stored in .cache even though its hash is wrong, because the Drive permissions changed what was actually downloaded. Even after I change the permissions on Google Drive, packratt always finds the file in .cache and keeps raising an error until the cached copy is removed.

Side note: is it intended that downloaded files stay in .cache even if the hash check fails?

sjperkins commented 3 years ago

Thanks for the in-depth investigation @brianwelman2. Would you consider submitting a PR to fix?

sjperkins commented 3 years ago

I think that the corrupted download should be removed if the hash check fails.
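A minimal sketch of that idea, under stated assumptions: it is not packratt's actual code, and `verify_or_remove` with its parameters is a hypothetical helper. It hashes the downloaded file and deletes it before raising if the digest does not match the registry entry, so a bad download cannot linger in the cache.

```python
import hashlib
from pathlib import Path


def verify_or_remove(path, expected_hash, chunk_size=1 << 20):
    """Check a downloaded file against `expected_hash` (hex SHA-256).

    Hypothetical helper, not part of packratt. On mismatch the file is
    deleted and a ValueError is raised; on success the digest is returned.
    """
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()

    if actual != expected_hash:
        # Remove the corrupted download so the next attempt starts clean
        path.unlink()
        raise ValueError("sha256hash does not agree. %s vs %s"
                         % (actual, expected_hash))
    return actual
```

How this would hook into the existing downloaders and cache layout is left open here; the sketch only shows the remove-on-mismatch step.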

kwazzi-jack commented 3 years ago

I'm not certain how the cache Python API works, but I can have a go.

On Wed, 17 Feb 2021, 09:29 Simon Perkins, notifications@github.com wrote:

I think that the corrupted download should be removed if the hash check fails
