Open philvarner opened 1 year ago
Some text from stac-api-spec:
If sub-catalogs are used, it is recommended that these use the endpoint /catalogs/{catalogId} to avoid conflicting with other endpoints from the root.
| Endpoint | Media Type | Returns | Description |
|---|---|---|---|
| /catalogs/{catalogId} | application/json | Catalog | child Catalog object |
A STAC API is more useful when it presents a complete Catalog representation of all the data contained in the API, such that all Item objects can be reached by traversing child and item link relations from the root. Being able to reach all Items in this way is formalized in the Browseable conformance class, but any Catalog can be structured for hierarchical traversal.
Implementers who have search as their primary use case should consider also implementing this alternate view over the data by presenting it as a directed graph of catalogs, where the child link relations typically form a tree, and where each catalog can be retrieved with a single request (e.g., each Catalog JSON is small enough that it does not require pagination).
For example, child links to sub-catalogs may be structured as in this diagram:
graph LR
A[Root] -->|child| B(sentinel-2-l2a)
B --> |child| C(10SDG)
B --> |child| D(10SDH)
B --> |child| E(10SDJ)
B --> |child| BB(...)
C --> |child| F(2018)
C --> |child| G(2019)
C --> |child| CC(...)
D --> |child| H(2018)
D --> |child| DD(...)
E --> |child| I(2018)
E --> |child| EE(...)
F --> |item| J(12.31.0)
F --> |item| K(01.09.0)
F --> |item| L(01.09.1)
F --> |item| FF(...)
STAC API does not define what endpoint or endpoints should return these catalogs, but one approach would be to return them from an endpoint like /catalogs/{catalogId}.
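The traversal described above can be sketched in a few lines of Python. This is only an illustration: the `CATALOGS` dictionary is a hypothetical in-memory stand-in for HTTP responses, and the hrefs and ids are assumptions loosely modeled on the diagram, not anything the spec mandates.

```python
from typing import Dict, Iterator

# Hypothetical stand-in for an API: maps a catalog href to its Catalog JSON.
# All hrefs and ids below are illustrative assumptions.
CATALOGS: Dict[str, dict] = {
    "/": {"id": "root", "links": [
        {"rel": "child", "href": "/catalogs/sentinel-2-l2a"},
    ]},
    "/catalogs/sentinel-2-l2a": {"id": "sentinel-2-l2a", "links": [
        {"rel": "child", "href": "/catalogs/sentinel-2-l2a/10SDG"},
    ]},
    "/catalogs/sentinel-2-l2a/10SDG": {"id": "10SDG", "links": [
        {"rel": "child", "href": "/catalogs/sentinel-2-l2a/10SDG/2018"},
    ]},
    "/catalogs/sentinel-2-l2a/10SDG/2018": {"id": "2018", "links": [
        {"rel": "item", "href": "/collections/sentinel-2-l2a/items/12.31.0"},
        {"rel": "item", "href": "/collections/sentinel-2-l2a/items/01.09.0"},
    ]},
}


def iter_item_hrefs(root_href: str, catalogs: Dict[str, dict]) -> Iterator[str]:
    """Yield each item href reachable from root_href via child and item links."""
    seen_catalogs = set()
    seen_items = set()
    stack = [root_href]
    while stack:
        href = stack.pop()
        if href in seen_catalogs:
            continue  # the graph may offer several paths to the same catalog
        seen_catalogs.add(href)
        for link in catalogs[href].get("links", []):
            if link["rel"] == "child":
                stack.append(link["href"])
            elif link["rel"] == "item" and link["href"] not in seen_items:
                seen_items.add(link["href"])
                yield link["href"]


items = list(iter_item_hrefs("/", CATALOGS))
```

A real client would fetch each href over HTTP instead of reading from a dictionary; the visited sets matter because the catalogs form a directed graph rather than a strict tree.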
While OAFeat requires that all Items must be part of a Collection, this does not mean that the Collection needs to be part of the browseable tree. If Collections are part of the tree, it is recommended that there be only one Collection in any path through the tree, and that a Collection never contain child Collections.
These are the two standard ways of structuring a browseable tree of catalogs, the only difference being whether the Collection is used as part of the tree or not:
All items must be part of a Collection, but the Collection itself does not need to be part of the browsable graph.
How you structure your graph of Catalogs can allow you to both group Collections together and create sub-groups of items within a Collection. For example, your collections may be grouped so that each represents a data product. This might mean you have a collection for each of Landsat 8 Collection 1, Landsat 8 Surface Reflectance, Sentinel-2 L1C, Sentinel-2 L2A, Sentinel-5P UV Aerosol Index, Sentinel-5P Cloud, MODIS MCD43A4, MODIS MOD11A1, and MODIS MYD11A1. You can also present each of these as a catalog, and create parent catalogs for them that allow you to group together all Landsat, Sentinel, and MODIS catalogs.
Each of these catalog endpoints could in turn be its own STAC API root, allowing an interface where users can search over arbitrary groups of collections without needing to explicitly know and name every collection in the search collection query parameter. These catalogs-of-catalogs can be separated in multiple ways, e.g., per provider (e.g., Sentinel-2), per domain (e.g., cloud data), or per form of data (electro-optical, LIDAR, SAR).
Going the other direction, collections can be sub-grouped into smaller catalogs. For example, a catalog of Landsat 8 Collection 1 items can be grouped by path, row, and date (the path/row system is used by this product for gridding).
If done in a consistent manner, these can also provide "templated" URIs, such that a user could directly request a specific path, row, and date simply by replacing the values in /catalogs/landsat_8_c1/{path}_{row}_{date}.
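As a rough sketch of what using such a templated URI could look like on the client side, the helper below just fills in the template from the text. The function name and the formatting of the path, row, and date values (zero-padded strings, ISO date) are assumptions for illustration; an actual API would document its own conventions.

```python
# Template taken from the text above; the value formats are assumptions.
LANDSAT_TEMPLATE = "/catalogs/landsat_8_c1/{path}_{row}_{date}"


def landsat_catalog_uri(path: str, row: str, date: str) -> str:
    """Fill the templated catalog URI with a specific path, row, and date."""
    return LANDSAT_TEMPLATE.format(path=path, row=row, date=date)


uri = landsat_catalog_uri("043", "034", "2019-01-09")
# uri == "/catalogs/landsat_8_c1/043_034_2019-01-09"
```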
Similarly, a MODIS product using sinusoidal gridding could use paths of the form /{horizontal_grid}/{vertical_grid}/{date}. Since only around 300 scenes are produced every day for a MODIS product and there is a 20-year history of production, these could fit in a graph with path length 3 from the root Catalog to each leaf Item:
- /catalogs/mcd43a4 (~7,000 child relation links, one to each date)
- /catalogs/mcd43a4/{date} (~300 item relation links to each Item)
- /collections/mcd43a4/items/{itemId}
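The link counts above follow from simple arithmetic, sketched here. The figures of roughly 300 scenes per day and 20 years come from the text; treating every year as 365 days is a simplifying assumption.

```python
scenes_per_day = 300   # approximate figure from the text
years_of_history = 20  # approximate figure from the text

# One child link per date under /catalogs/mcd43a4 (ignoring leap days).
date_catalogs = years_of_history * 365

# Each date catalog holds about scenes_per_day item links.
total_items = date_catalogs * scenes_per_day

print(date_catalogs)  # 7300, i.e. the "~7,000" child links
print(total_items)    # 2190000 items reachable in three hops from the root
```

So a two-level catalog layout keeps every Catalog document to at most a few thousand links while still covering on the order of two million Items.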
Catalogs can also group related products. For example, here we group together synthetic aperture radar (SAR) products (Sentinel-1 and AfriSAR) and electro-optical (EO) bottom of atmosphere (BOA) products.
The catalog structure is a directed graph that allows you to provide numerous different Catalog and Collection graphs to reach leaf Items. For example, for a Landsat 8 data product, you may want to allow browsing both by date then path then row, or by path then row then date:
When more than one path to an Item is allowed, it is recommended that the final item link relation reference a consistent, canonical URL for each item, instead of a URL that is specific to the path of Catalogs that was followed to reach it.
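One way to picture this recommendation: two leaf catalogs reached by different browse orders both carry the same item link. The sketch below is hypothetical; the catalog ids, the item id, and the helper name are made up for illustration, and the canonical form /collections/{collectionId}/items/{itemId} is assumed to be the item's canonical URL.

```python
def canonical_item_link(collection_id: str, item_id: str) -> dict:
    """Build an item link that does not depend on the catalog path followed."""
    return {
        "rel": "item",
        "href": f"/collections/{collection_id}/items/{item_id}",
        "type": "application/geo+json",
    }


# Hypothetical item id, for illustration only.
link = canonical_item_link("landsat_8_c1", "LC08_043034_20190109")

# Two hypothetical leaf catalogs: one browsed by date/path/row,
# the other by path/row/date. Both reference the same canonical href.
by_date_leaf = {"id": "2019-01-09_043_034", "links": [link]}
by_path_leaf = {"id": "043_034_2019-01-09", "links": [link]}

same_target = by_date_leaf["links"][0]["href"] == by_path_leaf["links"][0]["href"]
```

With this arrangement, clients that cache or deduplicate Items by URL see a single Item regardless of which branch they browsed.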
There are many options for how to structure these catalog graphs, so it will take some analysis work to figure out which one or ones best match the structure of your data and the needs of your consumers.
Dear @philvarner, I read what you propose in detail, but it is not clear to me why you propose the following paths
Rather than:
From my point of view the following tree structure should work well. Do you think it has something that clashes with the specs?
Moved from: