nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Add new S3 credentials endpoint for Giovanni zarr store #223

Closed asteiker closed 1 week ago

asteiker commented 1 year ago

A new S3 credentials endpoint for the GES DISC's Giovanni Zarr store is now available: https://api.giovanni.earthdata.nasa.gov/s3credentials

We should make sure this is discoverable through earthaccess. Initially, the associated data collection, GPM_3IMERGHH v6, will have a new RelatedURL that points to documentation on how to access the store. So we won't have any direct programmatic means of going from the collection's CMR record to the Zarr store's S3 URI until CMR is extended to support this. But it is great progress in the right direction for end-to-end Zarr support.
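For context, a minimal sketch of how a temporary-credentials payload could be mapped onto the keyword arguments that `s3fs.S3FileSystem` accepts (`key`, `secret`, `token`). It assumes the Giovanni endpoint returns the same JSON keys as the DAAC endpoints (`accessKeyId`, `secretAccessKey`, `sessionToken`), which this thread does not confirm. The helper name `to_s3fs_kwargs` is hypothetical and not part of earthaccess.

```python
from typing import Mapping

# Endpoint announced in this issue; fetching from it needs an Earthdata Login session.
GIOVANNI_S3_CREDENTIALS_URL = "https://api.giovanni.earthdata.nasa.gov/s3credentials"


def to_s3fs_kwargs(credentials: Mapping[str, str]) -> dict:
    """Map a temporary-credentials payload onto s3fs.S3FileSystem keyword arguments.

    Assumption: the payload uses the accessKeyId / secretAccessKey / sessionToken
    keys returned by the DAAC-level s3credentials endpoints.
    """
    required = ("accessKeyId", "secretAccessKey", "sessionToken")
    missing = [name for name in required if name not in credentials]
    if missing:
        raise KeyError(f"credentials payload is missing: {', '.join(missing)}")
    return {
        "key": credentials["accessKeyId"],
        "secret": credentials["secretAccessKey"],
        "token": credentials["sessionToken"],
    }
```

The resulting dictionary could then be passed as `s3fs.S3FileSystem(**to_s3fs_kwargs(payload))`, where `payload` is the JSON fetched from the endpoint with an authenticated session.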

betolink commented 1 year ago

Currently we only have S3 credentials endpoints on a per-DAAC basis, which covers the typical use case (getting data from a DAAC). But with Giovanni, Harmony, and other NASA-wide services, maybe it's time to break that login into getting credentials for data and for services, something like:

import earthaccess

earthaccess.login()

services = earthaccess.search_services("SOME CRITERIA TO FIND GIOVANNI")
giovanni = services[0]
credentials = giovanni.get_s3_credentials()

In the meantime we can add Giovanni, but it would look like just another DAAC. Or we can wait until the service discovery methods are implemented and have something like the code above. What do you think, @asteiker?

asteiker commented 1 year ago

@betolink Great thoughts. What is a bit unique about this case is that Giovanni's zarr store just happens to be the location of the zarr store for this GPM_3IMERGHH collection. So the use case is really collection-based discovery/access, not service-based per se.

For now, a Related_URL was added to the collection metadata pointing to a "Product Usage" link: C1598621093-GES_DISC. But the Search & Discovery train will be analyzing how to incorporate this better into collection- and variable-based discovery. So I'm not sure if/how to solve for this right now with this workaround, or whether to hold off until zarr is better supported within CMR.

betolink commented 1 year ago

Interesting. In theory, if the collection belongs to GES_DISC, the S3 credentials for GES_DISC should work. I assume this is not the case because this Giovanni Zarr store is not using the Cumulus machinery to get the data ingested?

I think that after we implement services and variable discovery in earthaccess, we could use the response from CMR (if it contains specific S3 credentials) to override the DAAC-level credentials for the access part. I'd say let's hold off for now until these use cases have a more programmatic access pattern. What do you think?
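A rough sketch of the override idea described above: prefer a collection-specific credentials endpoint when the CMR record carries one, otherwise fall back to the DAAC-level endpoint. The function name `resolve_s3_endpoint`, the `DAAC_S3_ENDPOINTS` table, and the use of the UMM-C `DirectDistributionInformation.S3CredentialsAPIEndpoint` field are assumptions made for illustration. This is not how earthaccess implements it.

```python
from typing import Optional

# Assumed DAAC-level lookup table; only GES DISC is listed for illustration.
DAAC_S3_ENDPOINTS = {
    "GES_DISC": "https://data.gesdisc.earthdata.nasa.gov/s3credentials",
}


def resolve_s3_endpoint(collection: dict, daac: str) -> Optional[str]:
    """Pick the S3 credentials endpoint for a collection.

    A collection-level endpoint (assumed to live under
    umm.DirectDistributionInformation.S3CredentialsAPIEndpoint) takes
    precedence over the DAAC-level default. Returns None if neither exists.
    """
    direct = collection.get("umm", {}).get("DirectDistributionInformation", {})
    override = direct.get("S3CredentialsAPIEndpoint")
    if override:
        return override
    return DAAC_S3_ENDPOINTS.get(daac)
```

With this shape, a collection whose metadata points at the Giovanni endpoint would resolve to it, while every other GES_DISC collection would keep using the DAAC-wide endpoint.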

jrbourbeau commented 1 year ago

> I think that after we implement services and variable discovery in earthaccess, we could use the response from CMR (if it contains specific S3 credentials) to override the DAAC-level credentials for the access part.

Just checking in here. @betolink did https://github.com/nsidc/earthaccess/pull/296 fix this issue?

asteiker commented 1 week ago

It looks like this is complete based on #296 changes.