podaac / data-subscriber

Subscribe and bulk download collections of data at PO.DAAC
Apache License 2.0

sha-512 checksum not supported #82

Closed: mike-gangl closed this issue 2 years ago

mike-gangl commented 2 years ago

When running the downloader/subscriber against a collection whose granules use SHA-512 checksums, we run into errors. The first run downloads files successfully:

podaac-data-downloader --verbose -c GRACEFO_L2_CSR_MONTHLY_0060 -d ./podaac_csr -sd 2018-01-01T00:00:00Z -ed 2022-06-14T16:11:58Z -e "00"
[2022-06-13 11:02:44,494] {podaac_data_downloader.py:158} INFO - NOTE: Making new data directory at ./podaac_csr(This is the first run.)
[2022-06-13 11:02:44,699] {podaac_data_downloader.py:192} INFO - Temporal Range: 2018-01-01T00:00:00Z,2022-06-14T16:11:58Z
[2022-06-13 11:02:44,699] {podaac_data_downloader.py:195} INFO - Provider: POCLOUD
[2022-06-13 11:02:44,700] {podaac_access.py:300} INFO - https://cmr.earthdata.nasa.gov/search/granules.umm_json?page_size=2000&sort_key=-start_date&provider=POCLOUD&ShortName=GRACEFO_L2_CSR_MONTHLY_0060&temporal=2018-01-01T00%3A00%3A00Z%2C2022-06-14T16%3A11%3A58Z&token=5896F157-242C-A41B-F04D-45D86713C6ED&bounding_box=-180%2C-90%2C180%2C90
[2022-06-13 11:02:46,826] {podaac_data_downloader.py:209} INFO - 176 granules found for GRACEFO_L2_CSR_MONTHLY_0060
[2022-06-13 11:02:46,827] {podaac_data_downloader.py:249} INFO - Found 176 total files to download
[2022-06-13 11:02:46,827] {podaac_data_downloader.py:251} INFO - Downloading files with extensions: ['00']
[2022-06-13 11:02:52,528] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:02:52.528421 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GAD-2_2022060-2022090_GRFO_UTCSR_BC01_0600
[2022-06-13 11:02:54,344] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:02:54.344359 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GSM-2_2022060-2022090_GRFO_UTCSR_BB01_0600
[2022-06-13 11:02:56,177] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:02:56.177678 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GSM-2_2022060-2022090_GRFO_UTCSR_BA01_0600
[2022-06-13 11:02:58,036] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:02:58.036421 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GAC-2_2022060-2022090_GRFO_UTCSR_BC01_0600
[2022-06-13 11:03:00,256] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:03:00.256610 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GSM-2_2022032-2022059_GRFO_UTCSR_BB01_0600
[2022-06-13 11:03:02,585] {podaac_data_downloader.py:278} INFO - 2022-06-13 11:03:02.585296 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GAD-2_2022032-2022059_GRFO_UTCSR_BC01_0600

Running the same command again, we get an error when the downloader tries to verify the checksums of the files it already downloaded:

WARNING - 2022-06-13 11:03:32.626580 FAILURE: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/GRACEFO_L2_CSR_MONTHLY_0060/GAD-2_2022060-2022090_GRFO_UTCSR_BC01_0600
Traceback (most recent call last):
  File "/Users/gangl/miniconda3/lib/python3.8/site-packages/subscriber/podaac_data_downloader.py", line 271, in run
    if(exists(output_path) and not args.force and pa.checksum_does_match(output_path, checksums)):
  File "/Users/gangl/miniconda3/lib/python3.8/site-packages/subscriber/podaac_access.py", line 418, in checksum_does_match
    computed_checksum = make_checksum(file_path, checksum["Algorithm"])
  File "/Users/gangl/miniconda3/lib/python3.8/site-packages/subscriber/podaac_access.py", line 431, in make_checksum
    hash_alg = getattr(hashlib, algorithm.lower())()
AttributeError: module 'hashlib' has no attribute 'sha-512'
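The failing line looks up `algorithm.lower()` as a hashlib attribute, so the UMM-G value "SHA-512" becomes "sha-512", which hashlib does not define (its constructor is `sha512`). A minimal sketch of one possible fix, assuming that lowercasing and dropping hyphens is enough for the names hashlib does provide; `get_hasher` is a hypothetical helper, not part of the subscriber code:

```python
import hashlib


def get_hasher(algorithm):
    """Return a new hash object for a UMM-G name such as "SHA-512".

    Hypothetical helper. Assumption: lowercasing and dropping hyphens maps
    "MD5", "SHA-1", "SHA-256", "SHA-384" and "SHA-512" onto hashlib names.
    """
    return hashlib.new(algorithm.lower().replace("-", ""))


# The lookup in the traceback fails: hashlib has no attribute named "sha-512".
print(hasattr(hashlib, "SHA-512".lower()))  # False

# After normalization the name resolves to hashlib's sha512 implementation.
print(get_hasher("SHA-512").name)  # sha512
```

Using `hashlib.new()` instead of `getattr()` also turns an unknown name into a `ValueError` rather than an `AttributeError`, which may be easier for a caller to catch deliberately.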
mike-gangl commented 2 years ago

UMM-G allows the following values for the checksum algorithm (schema v1.6.4): https://github.com/nasa/Common-Metadata-Repository/blob/master/umm-spec-lib/resources/json-schemas/granule/umm/v1.6.4/umm-g-json-schema.json#L1173

 "enum": ["Adler-32", "BSD checksum", "Fletcher-32", "Fletcher-64", "MD5", "POSIX", "SHA-1", "SHA-2", "SHA-256", "SHA-384", "SHA-512", "SM3", "SYSV"]
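Only some of these values have a hashlib counterpart. A rough sketch that checks each enum value against `hashlib.algorithms_available`, assuming the same hyphen-stripping normalization; `hashlib_name` is a hypothetical helper. "SHA-2" is left unmapped because it names a family of digests rather than one algorithm, and Adler-32 would need `zlib.adler32` rather than hashlib. Whether "SM3" resolves may depend on the OpenSSL build behind the Python installation.

```python
import hashlib

# The UMM-G v1.6.4 checksum algorithm enum, copied from the schema excerpt above.
UMM_G_ALGORITHMS = [
    "Adler-32", "BSD checksum", "Fletcher-32", "Fletcher-64", "MD5", "POSIX",
    "SHA-1", "SHA-2", "SHA-256", "SHA-384", "SHA-512", "SM3", "SYSV",
]


def hashlib_name(algorithm):
    """Return the hashlib name for a UMM-G algorithm, or None if unavailable.

    Hypothetical helper. "SHA-2" is a family, not a single digest, so it is
    deliberately left unmapped.
    """
    if algorithm == "SHA-2":
        return None
    name = algorithm.lower().replace("-", "")
    return name if name in hashlib.algorithms_available else None


supported = {}
for alg in UMM_G_ALGORITHMS:
    name = hashlib_name(alg)
    if name is not None:
        supported[alg] = name

# Typically includes MD5, SHA-1, SHA-256, SHA-384 and SHA-512; exact contents vary by build.
print(supported)
```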

whereas Python's hashlib seems to support only the following names:

>>> import hashlib
>>> print(dir(hashlib))
['__all__', '__block_openssl_constructor', '__builtin_constructor_cache', '__builtins__', '__cached__', '__doc__', '__file__', '__get_builtin_constructor', '__loader__', '__name__', '__package__', '__spec__', '_hashlib', 'algorithms_available', 'algorithms_guaranteed', 'blake2b', 'blake2s', 'md5', 'new', 'pbkdf2_hmac', 'scrypt', 'sha1', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512', 'shake_128', 'shake_256']
>>>
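Putting the two lists together, here is a hedged sketch of a streaming checksum check in the spirit of `checksum_does_match` from the traceback. `compute_checksum` and `checksum_matches` are hypothetical names, and the `{"Algorithm": ..., "Value": ...}` dictionary shape is an assumption based on the UMM-G Checksum element. `hashlib.new()` raises `ValueError` for algorithms it cannot provide, which a caller could catch to skip verification instead of crashing.

```python
import hashlib
import os
import tempfile


def compute_checksum(file_path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks using a UMM-G algorithm name such as "SHA-512".

    Hypothetical helper: the name is lowercased and hyphens are dropped before
    calling hashlib.new(), which raises ValueError for unknown algorithms.
    """
    hasher = hashlib.new(algorithm.lower().replace("-", ""))
    with open(file_path, "rb") as f:
        # Read fixed-size chunks so large granules are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def checksum_matches(file_path, checksum):
    """Compare a file against an assumed {"Algorithm": ..., "Value": ...} dict.

    Hex digests are compared case-insensitively.
    """
    computed = compute_checksum(file_path, checksum["Algorithm"])
    return computed.lower() == checksum["Value"].lower()


# Usage: verify a temporary file against a SHA-512 checksum record.
data = b"example granule bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
record = {"Algorithm": "SHA-512", "Value": hashlib.sha512(data).hexdigest()}
print(checksum_matches(tmp.name, record))  # True
os.remove(tmp.name)
```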