nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

`open()` is not handling multi-file granules properly #393

Closed betolink closed 11 months ago

betolink commented 11 months ago

I think that when we started using the EarthAccessFile wrapper we stopped handling multi-file granules. This example with data from HLS shows the issue:

import earthaccess

auth = earthaccess.login()

granules = earthaccess.search_data(
    short_name="HLSL30",
    count=1
)
# HLS is a multispectral dataset and each granule has many files
print(granules[0].data_links())
# earthaccess is only opening the first link
files = earthaccess.open(granules)
files

There is a workaround: manually collect the links and then open them. But that way we lose the sweet auto-wiring from search to access.
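A minimal sketch of that workaround, for illustration only. It assumes `earthaccess.login()` has already succeeded and that `earthaccess.get_fsspec_https_session()` returns an authenticated fsspec filesystem; the helper names `collect_links` and `open_all_links` are hypothetical and not part of earthaccess.

```python
from itertools import chain


def collect_links(granules):
    """Flatten the data links of every granule into a single list.

    Each granule is expected to expose a data_links() method,
    as the search results in the example above do.
    """
    return list(chain.from_iterable(g.data_links() for g in granules))


def open_all_links(granules):
    """Open every file of every granule over HTTPS (hypothetical helper)."""
    # Imported lazily so collect_links() can be used without earthaccess installed.
    import earthaccess

    # Assumption: earthaccess.login() was called earlier in the session.
    fs = earthaccess.get_fsspec_https_session()
    return [fs.open(url) for url in collect_links(granules)]
```

With the search from the example, `open_all_links(granules)` should return one file-like object per link (15 for this HLS granule) rather than one per granule.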

jrbourbeau commented 11 months ago

Thanks @betolink. Let me push up a quick PR for this one to get some thoughts.

jrbourbeau commented 11 months ago

@betolink I'm able to reproduce the issue with open() (see https://github.com/nsidc/earthaccess/pull/394 for a fix) but not with download()

In [1]: import earthaccess
   ...:
   ...: auth = earthaccess.login()
   ...:
   ...: granules = earthaccess.search_data(
   ...:     short_name="HLSL30",
   ...:     count=1
   ...: )
   ...: # HLS is a multispectral dataset and each granule has many files
   ...: print(granules[0].data_links())
   ...: # earthaccess is only opening the first link
   ...: files = earthaccess.download(granules, "foo")
   ...: files
Granules found: 11072415
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B09.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B06.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B01.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B07.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.VZA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B10.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.SAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.VAA.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B05.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.SZA.tif', 
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T59WPT.2013101T001445.v2.0/HLS.L30.T59WPT.2013101T001445.v2.0.B11.tif']
 Getting 1 granules, approx download size: 0.1 GB
QUEUEING TASKS | : 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 1489.56it/s]
PROCESSING TASKS | : 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:08<00:00,  1.68it/s]
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 48545.19it/s]
Out[1]:
['foo/HLS.L30.T59WPT.2013101T001445.v2.0.B09.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.Fmask.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B06.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B01.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B07.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.VZA.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B10.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B04.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.SAA.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.VAA.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B05.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.SZA.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B02.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B03.tif',
 'foo/HLS.L30.T59WPT.2013101T001445.v2.0.B11.tif']

It looks like the current main branch is working as expected with multi-file granule downloads. Am I missing something?
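One way to make that comparison less eyeball-driven is to diff the file names in the granule links against the downloaded paths. This is only a sketch: `missing_downloads` is a hypothetical helper, and it assumes the local paths use forward slashes, as in the output above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def missing_downloads(links, local_paths):
    """Return the file names found in `links` but not in `local_paths`."""
    # Compare by base name only, since the download directory differs from the URL path.
    expected = {PurePosixPath(urlparse(url).path).name for url in links}
    downloaded = {PurePosixPath(path).name for path in local_paths}
    return sorted(expected - downloaded)
```

Calling `missing_downloads(granules[0].data_links(), files)` on the session above should give an empty list if every file of the granule was downloaded.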

betolink commented 11 months ago

That was fast @jrbourbeau! Yeah, for download() I ran into a 401 error, then I restarted my kernel and it worked again. Looking at the PR now.