radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Use of "items" rel in collection #687

Closed afrank150 closed 4 years ago

afrank150 commented 5 years ago

For STAC users who have very large collections, especially when a collection is continually being added to, it is likely not feasible to add an "item" relationship for each item in the parent collection. The OGC API Features spec (OAFeat) proposes the use of an "items" rel that would solve this problem by linking to a GET on .../collections/{collectionId}/items. Here is a link to where I think that is called out in OAFeat. API users can then use this to page through all items in a collection.

While the "items" rel works well for a Dynamic catalog (API), it will not work for the Static catalog construct without supporting a FeatureCollection file that matches the API response. Supporting this manifest-like file is even less tenable than linking to every item in the collection. So, assuming the "items" rel will be adopted by the STAC spec, what is the recommendation for its use with Static catalogs? Are there any alternative solutions for a pointer to all items in a collection that don't require continually updating a potentially massive file?

cholmes commented 4 years ago

Hey @afrank150 - sorry for the slow response on this.

But I don't think we should go down the route of supporting 'items' in the core STAC catalog. It makes lots of sense to me in the API, as items/ is the dynamic endpoint that you can query, and the collection returned to you is generated on the fly. But I view the API as an extension to the core that allows querying.

For the core I think an items rel would make things more complicated - clients would have to understand both item and items, and take different action for each. And if the items document is an item collection then there'd be no clear URL to refer to a particular item, unless we used something like https://goessner.net/articles/JsonPath/ - but it's not widely accepted afaik. A key to me with STAC is every record is addressable, like https://spacenet-stac.s3.amazonaws.com/spacenet-dataset/AOI_2_Vegas.json - and then you can have an html page that corresponds.

I'm curious to hear more about the use case where you can't add item links. I think one key to me is that you should split your catalog into logical 'chunks' - so you don't have one parent file with a million item links that has to be updated each time. Instead you'd have a hierarchy, like sensor/country/state/year/month/day. A simpler one could just split by sensor and date, and then by time of day. Then each hour you can make a new 'catalog' that has all the data collected in that hour (with links to all the child item records), and you just have to update the 'date' catalog with one new 'hour' entry. You could also play with something semi-dynamic - like store the item json on s3, but implement the catalog links with an API.
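The hourly scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed layout: the `sensor/year/month/day/hour` path, the catalog ids, and the function names `hour_catalog_path`, `make_hour_catalog` and `add_child` are all hypothetical, and fields such as `stac_version` are left out for brevity.

```python
from __future__ import annotations

from datetime import datetime, timezone


def hour_catalog_path(sensor: str, ts: datetime) -> str:
    """Relative path of the hourly sub-catalog for an item acquired at ts.

    The sensor/year/month/day/hour layout is an assumed example hierarchy.
    """
    return f"{sensor}/{ts:%Y/%m/%d/%H}/catalog.json"


def make_hour_catalog(sensor: str, ts: datetime, item_hrefs: list[str]) -> dict:
    """Build a small catalog that links to the items collected in one hour."""
    # The parent is the 'date' catalog one directory up in the assumed layout.
    links = [{"rel": "parent", "href": "../catalog.json"}]
    links += [{"rel": "item", "href": href} for href in item_hrefs]
    return {
        "id": f"{sensor}-{ts:%Y%m%d}T{ts:%H}",  # hypothetical id scheme
        "description": f"{sensor} items collected during {ts:%Y-%m-%d %H}:00",
        "links": links,
    }


def add_child(parent_catalog: dict, child_href: str) -> dict:
    """Return a copy of the parent with one more 'child' link.

    Adding the same href twice has no further effect, so the hourly job
    can be re-run safely.
    """
    links = list(parent_catalog["links"])
    already_linked = any(
        link.get("rel") == "child" and link.get("href") == child_href
        for link in links
    )
    if not already_linked:
        links.append({"rel": "child", "href": child_href})
    return {**parent_catalog, "links": links}
```

Each hour, only two small files change: the new hourly catalog is written once, and the 'date' catalog gains a single child link such as `14/catalog.json`. No file ever has to list every item in the collection.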

That said, I have been convinced that there is value in an overall index of the holdings, like how landsat has a single-file CSV that has all the data in one file, but it's just the core information. I'd see it as an optional extension that complements the core - really it provides an index into the full json records. No one has created that yet, but I think that's a promising direction.
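As a rough idea of what such an index might look like, the sketch below flattens each item into one CSV row holding only core fields plus a pointer back to the full record. No such extension exists in the thread, so everything here is an assumption: the column set in `INDEX_COLUMNS`, the function names `index_row` and `write_index`, and the restriction to items with a 2D `bbox`.

```python
import csv
import io

# Hypothetical column set: just enough to filter by time and location,
# plus the href of the full item JSON.
INDEX_COLUMNS = ["id", "datetime", "west", "south", "east", "north", "href"]


def index_row(item: dict, href: str) -> dict:
    """Reduce one item to its core fields (assumes a 2D, four-value bbox)."""
    west, south, east, north = item["bbox"]
    return {
        "id": item["id"],
        "datetime": item["properties"]["datetime"],
        "west": west,
        "south": south,
        "east": east,
        "north": north,
        "href": href,
    }


def write_index(rows, out) -> None:
    """Write the index rows, with a header line, to a text file object."""
    writer = csv.DictWriter(out, fieldnames=INDEX_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

A CSV like this can be appended to one row at a time, which avoids rewriting a large JSON document on every update, while each row still resolves to an individually addressable item file.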

One more point - I would also like to see the OGC have an equivalent of our static catalog, so if they do come up with something like that then we would align. But I'd still see it as an 'option', something to add on top of the core STAC spec, and in the core we should specify just one way to do things.

cholmes commented 4 years ago

I'm going to close this, as I don't think we are going to offer an items rel in the core STAC spec. Perhaps it would be good to have some best practices for extremely large catalogs, but I think it just comes down to dividing your sub-catalogs up in a way that lets them be updated iteratively.