podaac / data-subscriber

Subscribe and bulk download collections of data at PO.DAAC
Apache License 2.0

add retry to 503 error in downloads #97

Open mike-gangl opened 2 years ago

mike-gangl commented 2 years ago

Saw this during regression testing:

WARNING  root:podaac_data_subscriber.py:307 2022-08-04 14:20:03.485885 FAILURE: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/JASON_CS_S6A_L2_ALT_LR_STD_OST_NRT_F/S6A_P4_2__LR_STD__NR_042_083_20220101T104242_20220101T123506_F04.nc
Traceback (most recent call last):
  File "/Users/runner/work/data-subscriber/data-subscriber/subscriber/podaac_data_subscriber.py", line 302, in run
    urlretrieve(f, output_path)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 239, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 555, in error
    result = self._call_chain(*args)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 747, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Users/runner/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

We should catch the 503 and retry the download. A 503 can happen for any number of reasons, but the goal here is to handle the transient failures that show up occasionally rather than to mask a persistent outage.

More information on possible causes can be found here: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/http-503-service-unavailable.html
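
A rough sketch of what the retry could look like, wrapping the same `urlretrieve` call that appears in the traceback. The helper name `retrieve_with_retry`, its parameters (`attempts`, `delay`, `retrieve`, `sleep`), and the exponential backoff are assumptions for illustration, not existing subscriber code. Only a 503 is retried; any other `HTTPError`, and the last 503, is re-raised so the existing failure logging still sees it.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlretrieve


def retrieve_with_retry(url, output_path, attempts=3, delay=2.0,
                        retrieve=urlretrieve, sleep=time.sleep):
    """Download url to output_path, retrying only on HTTP 503.

    Hypothetical helper: `retrieve` and `sleep` are injectable so the
    retry logic can be exercised without touching the network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(url, output_path)
        except HTTPError as err:
            # Re-raise anything that is not a 503, and the final 503.
            if err.code != 503 or attempt == attempts:
                raise
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
```

In `run`, the call `urlretrieve(f, output_path)` would then become `retrieve_with_retry(f, output_path)`. The defaults of three attempts and a two-second base delay are placeholders and probably need tuning.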