nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

List available Harmony services for a dataset #447

Closed nikki-t closed 1 month ago

nikki-t commented 8 months ago

As a first step to facilitating the use of services in earthaccess, we should modify earthaccess so that it can list the available services for a collection.

Link to Harmony documentation: https://harmony.earthdata.nasa.gov/docs
Link to CMR API documentation on services: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service

This would allow earthaccess to return a list of services for a collection so that we can integrate future work on service usage into the codebase. Related issue: https://github.com/nsidc/earthaccess/issues/328
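As a rough illustration of what listing services could involve, here is a minimal sketch. It assumes the collection record is fetched from CMR in `umm_json` format, where associated service concept IDs appear under `meta.associations.services`. The helper names (`service_ids_from_collection`, `service_query_url`) are hypothetical and not part of earthaccess, and the concept IDs in the usage note are made up.

```python
from urllib.parse import urlencode

CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search"


def service_ids_from_collection(item: dict) -> list:
    """Return the service concept IDs associated with one CMR collection item.

    `item` is assumed to be one entry of the `items` list in a umm_json
    collection search response.
    """
    associations = item.get("meta", {}).get("associations", {})
    return list(associations.get("services", []))


def service_query_url(concept_ids: list) -> str:
    """Build a CMR service search URL for the given service concept IDs."""
    params = [("concept_id[]", cid) for cid in concept_ids]
    return f"{CMR_SEARCH}/services.umm_json?{urlencode(params)}"
```

For example, `service_ids_from_collection({"meta": {"associations": {"services": ["S0000000001-EXAMPLE"]}}})` returns the one made-up ID, and a collection with no associations returns an empty list. Fetching the URL from `service_query_url` would then give the name and other metadata for each service.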

asteiker commented 8 months ago

This is a (painful) way of determining the available services for a given collection using GraphQL: https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/07_Harmony_Subsetting.html#discover-service-options-for-a-given-data-set

Harmony also provides a capabilities endpoint to determine available services: https://harmony.earthdata.nasa.gov/docs#available-services
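A small sketch of how the capabilities endpoint might be used from Python. The `collectionId` query parameter and the `services` / `name` keys in the response reflect my reading of the Harmony docs linked above and should be treated as assumptions; both function names are hypothetical.

```python
from urllib.parse import urlencode

HARMONY_ROOT = "https://harmony.earthdata.nasa.gov"


def capabilities_url(collection_id: str) -> str:
    """Build the Harmony capabilities URL for a collection concept ID."""
    query = urlencode({"collectionId": collection_id})
    return f"{HARMONY_ROOT}/capabilities?{query}"


def service_names(capabilities: dict) -> list:
    """Pull service names out of a parsed capabilities response.

    Assumes the response has a "services" list whose entries carry a
    "name" key; entries without one are skipped.
    """
    return [s["name"] for s in capabilities.get("services", []) if "name" in s]
```

The request itself would need Earthdata Login credentials, so only the URL construction and response parsing are shown here.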

JessicaS11 commented 8 months ago

I was looking through python_cmr for something else and came across a set of functions for "Tool and Variable Service CMR Queries" starting on this line. Not sure if they'll help but figured it's worth a check to see if they've already done some of the work for us...

andypbarrett commented 7 months ago

I wonder if we should be thinking about how this kind of query might be used and by what/whom?

My current thinking is: if a Python tool returns information about services, then that information should be usable by a tool (the same tool or a different one). I'm thinking of a pipeline...

result = earthaccess.search_datasets(...)
if "harmony" in result.services:
    subset = result.harmony.subsetter().to_file(name_of_file)  # uses spatial and temporal bounds from query for subsetting
else:
    earthaccess.download(result)

I can also see a case for a user querying services from a notebook: for example, you have found the dataset you want and you want to know whether you have to download/access a complete file or if you can use a service.

I think beyond that, a lot of discovery for services and options would be done via user guides and other web-hosted information.

JessicaS11 commented 7 months ago

> I think beyond that, a lot of discovery for services and options would be done via user guides and other web-hosted information.

This was part of our hope via a plugin interface (#328). Sort of like Xarray can discover and use whatever backends you have installed in your environment, earthaccess can discover and use whatever services/subsetters are available via your installed libraries, so long as those libraries have set up the required plugin functionality. This takes the onus off earthaccess to actually implement/maintain specific interfaces (except, perhaps, with a system like harmony) but makes it easy for users to access those other tools through earthaccess in a predictable way.

nikki-t commented 7 months ago

@andypbarrett and @JessicaS11 - I think you both bring up some great points around the use of services so I put together a mini roadmap for implementing service information with an eye towards implementing a plugin interface.

Requirements analysis: How would a user approach searching for a service?

  1. Would a user want to pull the available services from the earthaccess.results.DataCollection object?
    • A user would search a collection for a specific service, then use that service to subset data by passing it the required inputs and receiving the results.
  2. Would a user want to pull the available services from the earthaccess.results.DataGranule object?
    • A user would search for granules and then use a service to subset them, passing the granule data to the service.
  3. Would a user want to be able to search for all services?
    • A user can search for a service by name and return data about that service. What data would that user be interested in?

Proposed code design focusing on bullet 1

  1. Create an earthaccess.results.DataCollection.service method that returns the services associated with a collection.
    • Retrieve service concept ID from the results of the CMR query. Requires modification to the earthaccess.results.DataCollection class to include service results.
    • Search for each service using a CMR query by concept ID to retrieve the name of the service and possibly other information that may be useful to know about the service.
    • Return results of service query.
  2. Create a plugin interface. Work belongs to Issue #328.
    • Create a plugin directory that holds a Plugin abstract class to serve as the parent for child classes that implement the use of various collection services like Harmony, OPeNDAP, HyP3, etc.
    • earthaccess can automatically discover and load plugins found in this directory. Here is a method that may prove useful.
    • This way anyone can add a plug-in and earthaccess can use it by working with the Plugin abstract class methods.
  3. Integrate the service results with a plugin interface.
    • The earthaccess.results.DataCollection.service method can be modified to search the names of available plugins.
    • The method can also check a list of plugins that do not have UMM-S records but are associated with collections.
    • The method can then return the plugins available for the specific dataset.
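Steps 2 and 3 could be sketched roughly as follows. Everything here is hypothetical: the `Plugin` interface, the `subset` method, the two placeholder child classes, and `plugins_for_collection` are illustrations only. Matching plugin names to service names by exact, case-insensitive comparison is a simplification, since real UMM-S names would likely need a mapping.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical parent class that every service plugin would implement."""

    name = ""  # name used to match the plugin against service records

    @abstractmethod
    def subset(self, granules: list, **options) -> list:
        """Run the service on the given granules and return the results."""


class HarmonyPlugin(Plugin):
    name = "harmony"

    def subset(self, granules, **options):
        raise NotImplementedError("placeholder for a real Harmony client")


class OpendapPlugin(Plugin):
    name = "opendap"

    def subset(self, granules, **options):
        raise NotImplementedError("placeholder for a real OPeNDAP client")


def build_registry() -> dict:
    """Map plugin names to classes for all direct Plugin subclasses."""
    return {cls.name.lower(): cls for cls in Plugin.__subclasses__()}


def plugins_for_collection(collection_id, service_names, registry, extra=None):
    """Return plugin classes usable with one collection.

    `service_names` come from the collection's UMM-S records. `extra` maps
    collection IDs to plugin names for services without UMM-S records.
    """
    wanted = {name.lower() for name in service_names}
    wanted |= {name.lower() for name in (extra or {}).get(collection_id, [])}
    return [registry[name] for name in sorted(wanted) if name in registry]
```

With this shape, `plugins_for_collection("C1-EX", ["Harmony"], build_registry(), {"C1-EX": ["opendap"]})` would return both placeholder classes: one matched through a service record and one through the extra list.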

Nice to have or future work (based on user need)