nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Implement simpler API signatures #167

Open betolink opened 2 years ago

betolink commented 2 years ago

The distinction between collections and granules, and the need to instantiate those classes, can be confusing, especially for new users. Perhaps a static method that hides those details would be simpler to use, e.g. (using the upcoming name):

import earthaccess
import xarray as xr

auth = earthaccess.login(strategy="netrc")

granules = earthaccess.get_granules(short_name="ATL06",
                                    cloud_hosted=True,
                                    polygon=((1,2),(3,4)... ),
                                    temporal=("2020-01-01","2020-12-31"))

ds = xr.open_mfdataset(earthaccess.open(granules, auth=auth))

This would be better suited to regional use cases, since all the metadata is downloaded from CMR in one go. Workflows that require bulk downloads could potentially use an iterator like:

import earthaccess

auth = earthaccess.login(strategy="netrc")
query = earthaccess.search(short_name="ATL06",
                           cloud_hosted=True,
                           page_size=2000,
                           polygon=((1,2),(3,4)... ),
                           temporal=("2020-01-01","2020-12-31"))

for granules in query.items():
    # here granules is a resultset of up to 2000 granules.
    earthaccess.download(granules, "./data/ATL06/", auth=auth)

andypbarrett commented 2 years ago

I like this approach. However, this method requires knowing the concept ID or short name first. This is fine if you know your dataset or know to go to Earthdata Search, but we need to explain that step. Otherwise getting the short name is "magic".

betolink commented 2 years ago

I totally agree, @andypbarrett. We need to explain where this short_name comes from and, what's more, expose the other ways of finding data (searching by DOI, for example). These new methods would also abstract away the fact that at the granule level there is no way to query using the cloud_hosted flag. New users unfamiliar with CMR would need to know the provider to make that distinction, e.g. NSIDC_ECS vs. NSIDC_CPRD.
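To make the provider point concrete, here is a minimal sketch of what such an abstraction could look like. Only the two provider IDs (NSIDC_ECS and NSIDC_CPRD) come from this thread; the lookup table, the function names, and the idea of passing a `provider` parameter in the query are assumptions for illustration, not the library's actual implementation.

```python
# Hypothetical lookup table. Only the two NSIDC provider IDs come from the
# discussion; the layout and names below are illustrative assumptions.
PROVIDERS = {
    "NSIDC": {"on_prem": "NSIDC_ECS", "cloud": "NSIDC_CPRD"},
}


def provider_for(daac: str, cloud_hosted: bool) -> str:
    """Return the provider ID that a cloud_hosted flag would imply for a DAAC."""
    try:
        entry = PROVIDERS[daac.upper()]
    except KeyError:
        raise ValueError(f"no provider mapping known for DAAC {daac!r}") from None
    return entry["cloud" if cloud_hosted else "on_prem"]


def granule_params(daac: str, cloud_hosted: bool, **params) -> dict:
    """Build granule query parameters with an explicit provider instead of cloud_hosted."""
    return {**params, "provider": provider_for(daac, cloud_hosted)}
```

With something like this, a user would only write `granule_params("NSIDC", cloud_hosted=True, short_name="ATL06")` and never need to know which provider ID sits behind the flag.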

scottyhq commented 1 year ago

I was wondering whether the current search keyword arguments are based on the Earthdata Search API. Perhaps it'd be possible to use STAC API standards for more consistency across libraries (at least for the main spatiotemporal ones?).

I know it's nitpicky and annoying to have breaking changes like this, but the user experience is nice when coming from other libraries like pystac_client / nasa-cmr-stac. Using the standard names and accepted formats could also have the benefit of reusing parsers already implemented in those libraries.

https://github.com/radiantearth/stac-api-spec/tree/main/item-search#query-parameter-table e.g.:

temporal -> datetime
polygon -> intersects
short_name -> collections?

betolink commented 1 year ago

This is a great idea @scottyhq!! They don't have to be breaking changes: we can just add them as aliases to the class methods and they will work the same way (without breaking the old names). The only thing that might need some work is processing the GeoJSON geometry for intersects. I definitely see value in using a standard. Down the road we could even use pystac under the hood and make this library more generic and not entirely tied to CMR.
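As a rough sketch of the alias idea, the function below translates the three STAC-style names mentioned above into the existing keyword names before a query is built. The function names are hypothetical, only GeoJSON Polygon geometries are handled, and it assumes that `temporal` takes a (start, end) pair and `polygon` a sequence of (lon, lat) points, as in the examples at the top of this issue. Mapping a single-item `collections` list to `short_name` is also an assumption, since the two may not correspond one to one.

```python
ALIASES = {"datetime": "temporal", "intersects": "polygon", "collections": "short_name"}


def polygon_from_geojson(geometry: dict) -> list:
    """Return the exterior ring of a GeoJSON Polygon as (lon, lat) tuples."""
    if geometry.get("type") != "Polygon":
        raise ValueError("only GeoJSON Polygon geometries are handled in this sketch")
    exterior = geometry["coordinates"][0]  # first ring is the exterior; holes are ignored
    return [(float(lon), float(lat)) for lon, lat, *_ in exterior]


def temporal_from_datetime(value: str) -> tuple:
    """Split a STAC datetime string into (start, end); '..' marks an open end."""
    start, sep, end = value.partition("/")
    if not sep:  # a single instant is used as both bounds
        end = start
    return tuple(None if part in ("", "..") else part for part in (start, end))


def translate_stac_kwargs(**kwargs) -> dict:
    """Rewrite STAC-style keyword names to the existing ones; other keys pass through."""
    for alias, name in ALIASES.items():
        if alias in kwargs and name in kwargs:
            raise ValueError(f"pass either {alias!r} or {name!r}, not both")
    out = dict(kwargs)
    if "datetime" in out:
        out["temporal"] = temporal_from_datetime(out.pop("datetime"))
    if "intersects" in out:
        out["polygon"] = polygon_from_geojson(out.pop("intersects"))
    if "collections" in out:
        collections = out.pop("collections")
        if len(collections) != 1:
            raise ValueError("this sketch maps exactly one collection to short_name")
        out["short_name"] = collections[0]
    return out
```

Because the old names pass through untouched, existing calls would keep working, and the translation could sit in one place in front of the current query classes.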

github-actions[bot] commented 1 month ago

Closing after 10 days of waiting for feedback. If you feel this was in error, please re-open, @ a maintainer, or create new issues.

asteiker commented 1 month ago

@betolink Is this still relevant with more recent earthaccess releases? What does this proposal achieve compared to what we currently support?