stactools-packages / modis

stactools package for working with MODIS data

Change `Collection` ids #14

Closed: gadomski closed this issue 2 years ago

gadomski commented 2 years ago

Right now the ID format is `MODIS/{version}/{product}`, e.g. `MODIS/006/MCD12Q1`. This isn't illegal, but it's not quite "standard": most STAC Collections seem to use hyphens and lowercase. So we should update the IDs to `modis-{version}-{product}`.

While we're doing the update, should we also put the product before the version, i.e. `modis-{product}-{version}` (for example, `modis-MCD12Q1-006`)?
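To make the two proposed layouts concrete, here is a minimal sketch of converting a legacy `MODIS/{version}/{product}` ID into the hyphenated form. The function name `legacy_to_new_id` and its `product_first` flag are hypothetical illustrations, not part of the stactools-modis package.

```python
def legacy_to_new_id(legacy_id: str, product_first: bool = True) -> str:
    """Convert a legacy ID such as 'MODIS/006/MCD12Q1' to a hyphenated ID.

    Hypothetical helper for illustration only; not a stactools-modis API.
    """
    # Unpacking raises ValueError if there are not exactly three parts
    prefix, version, product = legacy_id.split("/")
    if prefix.upper() != "MODIS":
        raise ValueError(f"unexpected prefix in {legacy_id!r}")
    # Choose between modis-{product}-{version} and modis-{version}-{product}
    parts = [product, version] if product_first else [version, product]
    return "-".join(["modis", *parts])


print(legacy_to_new_id("MODIS/006/MCD12Q1"))
print(legacy_to_new_id("MODIS/006/MCD12Q1", product_first=False))
```

With the product first, related versions of one product sort next to each other (e.g. `modis-MCD12Q1-006` and `modis-MCD12Q1-061`), which is one practical argument for that ordering.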

matthewhanson commented 2 years ago

NASA's CMR STAC names collections using "SHORT_NAME_VERSION", which I think is generally a good way to name collections. Having a version number in the ID is critical if there's ever going to be reprocessing. See https://cmr.earthdata.nasa.gov/stac/LPDAAC_ECS/collections/MCD43A4.v006

I agree that slashes are no good in an ID; some API implementations may even break. Your suggestion is good, although my preference would be something like "MCD12Q1-061", with all the collections placed in a "MODIS" catalog rather than including "modis" in the ID.
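One concrete way a slash can break things: STAC APIs expose collections under paths like `/collections/{collectionId}/items`, so a slash inside the ID adds extra path segments unless the client percent-encodes it, and whether a given server routes the encoded form correctly varies by implementation. A minimal sketch using only the standard library (`items_path` is a hypothetical helper):

```python
from urllib.parse import quote


def items_path(collection_id: str, encode: bool = False) -> str:
    """Build a STAC API items path for a collection (hypothetical helper)."""
    # safe="" makes quote() percent-encode "/" as %2F
    cid = quote(collection_id, safe="") if encode else collection_id
    return f"/collections/{cid}/items"


def segment_count(path: str) -> int:
    """Count the '/'-separated segments in a path."""
    return len(path.strip("/").split("/"))


for cid in ["MODIS/006/MCD12Q1", "MCD12Q1-061"]:
    raw = items_path(cid)
    print(raw, segment_count(raw))
print(items_path("MODIS/006/MCD12Q1", encode=True))
```

The slash ID produces five path segments instead of the expected three, and the encoded form relies on the server not decoding `%2F` before routing. A hyphenated ID avoids the question entirely.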

gadomski commented 2 years ago

> my preference would be like "MCD12Q1-061"

OK, yeah, I think I like having the product before the version as well.

> and then put all the collections in a "MODIS" catalog rather than including "modis" in the ID

Hm, interesting. From the spec:

> And providers should strive as much as possible to make their Collection ids 'globally' unique, prefixing any common information with a unique string. This could be the provider's name if it is a fairly unique name, or their name combined with the domain they operate in.

It seems like if we were creating a "globally" unique ID, we should include the string "MODIS" (or "modis") in the Collection id?

matthewhanson commented 2 years ago

Yeah, I can see wanting that to be "globally" unique, so "modis-MCD43A4-006" or something. Or MODIS could be capitalized. I think STAC best practice is to favor lowercase and dashes in IDs, unless the provider already has an established naming convention (which is why you see "LANDSAT_8" as a platform name in the USGS API).
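The trade-off above (strict lowercase and dashes versus keeping the provider's established product short name) can be sketched as a builder with a case option plus a simple check. The regex below is an assumed reading of "lowercase and dashes", not a rule taken from the STAC specification, and both function names are hypothetical.

```python
import re

# Assumed interpretation: lowercase letters/digits, single hyphens between parts
LOWER_DASH = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def make_collection_id(product: str, version: str, keep_product_case: bool = True) -> str:
    """Build a globally prefixed ID like 'modis-MCD43A4-006' (hypothetical helper)."""
    # Keep NASA's established short-name casing, or lowercase it for strictness
    product_part = product if keep_product_case else product.lower()
    return f"modis-{product_part}-{version}"


def is_lower_dash(collection_id: str) -> bool:
    """Return True if the ID uses only lowercase letters, digits, and hyphens."""
    return LOWER_DASH.fullmatch(collection_id) is not None


for keep in (True, False):
    cid = make_collection_id("MCD43A4", "006", keep_product_case=keep)
    print(cid, is_lower_dash(cid))
```

Keeping the product's original casing preserves a one-to-one match with NASA's short names at the cost of failing a strict lowercase check; lowercasing does the reverse.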