nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Unclear output when errors occur during download operation #481

Open Rapsodia86 opened 6 months ago

Rapsodia86 commented 6 months ago

Hello again, I wanted to download ECOSTRESS LST data, but I have been getting requests.exceptions.HTTPError: 502 Server Error: Bad Gateway partway through the download.

Here is a snippet:

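import earthaccess

# Search for ECOSTRESS LST granules over a small bounding box for one year.
granules = earthaccess.search_data(
    short_name="ECO_L2T_LSTE",
    temporal=("2023-01-01", "2024-01-01"),
    bounding_box=(-85.40565331930281, 42.39047390075025, -85.36185372260847, 42.42213872253686),
    count=-1,  # -1 requests all matching granules
)
# Download every file of every granule into the local directory.
downloaded_files = earthaccess.download(
    granules,
    local_path="N:/MYDIR/",
)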

Here is part of the output. I do not have any problems with those specific files when downloading them directly from https://search.earthdata.nasa.gov/

Getting 454 granules, approx download size: 3.04 GB
QUEUEING TASKS | : 100%|██████████| 3632/3632 [00:00<00:00, 66708.32it/s]
PROCESSING TASKS | :  31%|███       | 1119/3632 [07:28<37:24, 1.12it/s]
Error while downloading the file ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01_height.tif
Traceback (most recent call last):
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01/ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01_height.tif

PROCESSING TASKS | :  31%|███       | 1120/3632 [07:28<30:44, 1.36it/s]
Error while downloading the file ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01_LST.tif
Traceback (most recent call last):
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01/ECOv002_L2T_LSTE_27122_009_16TFM_20230418T182710_0710_01_LST.tif

PROCESSING TASKS | :  31%|███       | 1122/3632 [07:28<20:18, 2.06it/s]
Error while downloading the file ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01_water.tif
Traceback (most recent call last):
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01/ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01_water.tif

PROCESSING TASKS | :  31%|███       | 1123/3632 [07:29<18:13, 2.29it/s]
Error while downloading the file ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01_cloud.tif
Traceback (most recent call last):
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\monikat\AppData\Local\miniconda3\envs\earthaccess\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01/ECOv002_L2T_LSTE_27137_010_16TFN_20230419T173850_0710_01_cloud.tif


Then, when the download finishes, the output looks as if all files were downloaded correctly:

PROCESSING TASKS | : 100%|██████████| 3632/3632 [25:04<00:00, 2.41it/s]
COLLECTING RESULTS | : 100%|██████████| 3632/3632 [00:00<00:00, 1208832.89it/s]

But the download folder contains only 3533 files, 99 fewer than the 3632 tasks reported as processed.
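One way to find out which files are missing is to compare the expected file names against what is actually on disk. The following is only a minimal sketch: `find_missing` is a hypothetical helper (not part of earthaccess), and it assumes the list of expected URLs is available, for example from each granule's `data_links()`.

```python
from pathlib import Path
from urllib.parse import urlparse


def find_missing(urls, local_path):
    """Return the URLs whose file name is not present in local_path.

    Hypothetical helper: it compares file names only, not sizes or checksums,
    so a truncated file would still count as present.
    """
    present = {p.name for p in Path(local_path).iterdir() if p.is_file()}
    return [u for u in urls if Path(urlparse(u).path).name not in present]


# Possible usage with the search results (assumes `granules` from the snippet):
# urls = [link for g in granules for link in g.data_links()]
# missing = find_missing(urls, "N:/MYDIR/")
```

Because only names are compared, this catches files that were never written but not files that were written incompletely.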

mfisher87 commented 6 months ago

Thanks for the report! That's definitely not a good user experience.

I think the thing for us to focus on here is improving the messaging, since we can't control the stability of the server(s) we're downloading from. How does the user know what it means when their download reaches 100% with errors in the log? Were those files retried and successfully downloaded? Or did the job complete with errors, never downloading some files? In this case, it was the latter, but @Rapsodia86 had to do their own investigation to learn that.

Some sort of summary message at the end would be really valuable. It should list the URLs that failed so the user can investigate with the provider. E.g. f"Failed to download the following granule URLs within {retry_count} attempts.\n{urls}\n\nPlease contact the data provider ({provider_support_email}) to report errors or instability.". Can we get the provider support email out of CMR metadata? Alternatively, perhaps the call to earthaccess.download() should raise an error in this case. @Rapsodia86 what would be your preference as a user?
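To make the summary idea concrete, here is a rough sketch of how such a message could be assembled. `summarize_failures` and its parameters are hypothetical names used for illustration, not an existing earthaccess API, and the provider email is treated as optional since it is not yet clear whether it can be obtained from CMR metadata.

```python
def summarize_failures(failed_urls, retry_count, provider_support_email=None):
    """Build an end-of-download summary; return "" when nothing failed.

    Hypothetical helper for illustration only.
    """
    if not failed_urls:
        return ""
    urls = "\n".join(failed_urls)
    message = (
        "Failed to download the following granule URLs within "
        f"{retry_count} attempts.\n{urls}"
    )
    # Only mention the provider when a contact address is known.
    if provider_support_email:
        message += (
            f"\n\nPlease contact the data provider ({provider_support_email}) "
            "to report errors or instability."
        )
    return message
```

The same list of failed URLs could instead be attached to an exception if raising turns out to be the preferred behavior.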

Rapsodia86 commented 6 months ago

Hi @mfisher87, thanks for taking care of this. When a requests.exceptions error occurs, how many retries/attempts are there? That summary would be helpful! Also, maybe a log file with a list of failed URLs? I know that if I rerun earthaccess.download(), the files that already exist will be skipped (by the way, is only the filename checked, or the file size as well?). However, a log file would give me another option: instead of running the whole download again, I could load the log file and run earthaccess.download() on just those URLs. What do you think? Too much?
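A minimal sketch of the log-file idea, assuming a plain text file with one failed URL per line. `write_failed_urls` and `read_failed_urls` are hypothetical helpers, not existing earthaccess functions.

```python
from pathlib import Path


def write_failed_urls(failed_urls, log_path):
    """Write one failed URL per line (hypothetical log format)."""
    text = "".join(f"{url}\n" for url in failed_urls)
    Path(log_path).write_text(text, encoding="utf-8")


def read_failed_urls(log_path):
    """Read the URLs back, skipping blank lines."""
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

The list returned by `read_failed_urls` could then be used for a second download attempt. Whether earthaccess.download() accepts such a list of URLs directly should be checked against the documentation.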

betolink commented 6 months ago

We currently have no retry attempts. The only "smart" thing earthaccess does is that if a granule already exists in the target path, it won't try to download it again. We should implement a more robust mechanism to keep track of errors and retries. I like that behavior, @mfisher87!
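As a starting point for discussion, a retry mechanism with exponential backoff might look roughly like the sketch below. This is not how earthaccess currently works. `download_with_retries` is a hypothetical function, and `fetch` stands in for whatever callable downloads a single URL and raises on failure.

```python
import time


def download_with_retries(fetch, url, retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url) up to `retries` times; return (ok, last_error).

    Hypothetical sketch: waits backoff * 2**attempt seconds between attempts.
    """
    last_error = None
    for attempt in range(retries):
        try:
            fetch(url)
            return True, None
        except Exception as err:  # in practice, catch only transient errors
            last_error = err
            # Do not sleep after the final attempt.
            if attempt < retries - 1:
                sleep(backoff * 2**attempt)
    return False, last_error
```

Returning the last error instead of raising lets the caller collect every failed URL and report them together once the whole job has finished.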

mfisher87 commented 6 months ago

Maybe we can consider these separate features? E.g.: