radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Vector data support #1150

Closed McSurf84 closed 1 year ago

McSurf84 commented 3 years ago

Hi everybody, I have a topic to discuss in the next meeting and want to share it beforehand so that people can think about it. Currently STAC is associated with raster data, and the FAQ also indicates that vector data should be handled directly with WFS. But what speaks against managing vector data in STAC? And what needs to be done so that all kinds of spatial data can be managed with STAC, making it the single source of (data) truth? I look forward to your opinions 👐

cholmes commented 3 years ago

Welcome @McSurf84!

So I wasn't able to find the perfect document describing how this 'should' work, so I'll try a brief write up here, and aim to get to something that better explains this.

So the issue is really that putting vector data on an 'item' is the wrong abstraction level. Each row of a vector dataset corresponds more to an 'item', and the vector dataset itself belongs at the 'collection' level. I've been working on fleshing out the more 'generic' OGC collection, which is the bit we'd need to fit vector into the 'STAC world'. But STAC is really more about the search within a collection.
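One way to picture these abstraction levels is sketched below, using today's STAC Collection as a stand-in for the more generic collection described above. This is only an illustration, not the 'dataset collection' definition itself: the id, URL, license, and extent are placeholders, and attaching the dataset as a collection-level asset is an assumption about how one might model it.

```python
import json

# Hypothetical vector dataset described at the collection level.
# Every id, URL, and extent value here is a placeholder.
roads_collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-roads",
    "description": "Road centerlines published as one vector dataset (illustrative).",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["2020-01-01T00:00:00Z", None]]},
    },
    "links": [],
    # Collection-level asset: the dataset as a whole,
    # rather than one STAC Item per row/feature.
    "assets": {
        "data": {
            "href": "https://example.com/roads.geojson",
            "type": "application/geo+json",
            "roles": ["data"],
        }
    },
}

print(json.dumps(roads_collection, indent=2))
```

Here each road segment stays a row inside the file, and only the dataset as a whole is cataloged.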

I explain this some more at https://github.com/cholmes/ogc-collection#stac-items-and-records and that repo has the thinking on it.

So what we want to get to is a definition of a 'dataset collection', which will include a collection of STAC Items (like 'landsat') as well as vector datasets (like 'tiger roads' in the US). And then a STAC API would be able to offer collection level search, implementing the OGC Records API. But in terms of core constructs I want to define the 'dataset collection' at a level 'above' STAC, and keep STAC focused on cataloging of 'assets'.

We have been making good progress with the OGC API Records group, and are now quite aligned on this path. They're working on refactoring their spec to help make the relationship clear. See https://github.com/opengeospatial/ogcapi-records/pull/129

m-mohr commented 2 years ago

@McSurf84 and others: We'll need some vector support as well in the future, so I'm wondering what your requirements are and what is missing for you in the spec? Simply linking to, let's say, a GeoPackage file is not an issue at all right now. So making it work for vector files in general doesn't seem to be an issue; it is probably just that some additional metadata is lacking, or a question of how to structure it?

What we'd probably have in our case (openEO) is collections of e.g. "points of trees in Germany" or "admin boundaries" (that a user can load into the processing pipeline) or some vector data that we export from a processing pipeline (e.g. "zonal statistics for forests in Australia").
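A minimal sketch of what "simply linking to a GeoPackage file" could look like as a STAC Item built from plain dictionaries. The helper `vector_item`, the id, the URL, and the bounding box are hypothetical; the media type `application/geopackage+sqlite3` is the one commonly used for GeoPackage in STAC.

```python
import json

def vector_item(item_id, href, bbox, datetime_utc):
    """Build a minimal STAC Item whose single asset is a GeoPackage file."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint of the whole file, derived from the bbox.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                "type": "application/geopackage+sqlite3",
                "roles": ["data"],
            }
        },
    }

# Placeholder values for a "points of trees in Germany" style dataset.
item = vector_item(
    "trees-germany",
    "https://example.com/trees-germany.gpkg",
    [5.9, 47.3, 15.0, 55.1],
    "2022-01-01T00:00:00Z",
)
print(json.dumps(item, indent=2))
```

What this does not capture is the content of the file (layers, attributes, feature counts), which is where additional metadata would be needed.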

McSurf84 commented 2 years ago

The original question was about the fact that the FAQ does not recommend storing vector data in STAC. We had put this up for discussion in a STAC meeting, and various ideas came up there about how something like this could perhaps be done. Since then, however, we have neglected the topic as well.

We are in the process of creating a new spatial data infrastructure for our state agency and would like to use STAC to compile an index of the available data. Of course, this also concerns vector data. Here it would be a good idea to group the geodata and not have every single feature become an item in STAC. To take up your example of the tree points: with one item per feature, this could become confusing very quickly. A grouping by parcels, urban areas, or something similar would definitely be a possibility here.
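As a rough sketch of such a grouping, the snippet below collects point features by an attribute and computes one bounding box per group, which could then serve as the footprint of one STAC Item per group. The function `group_bboxes`, the property name `district`, and the sample coordinates are all made up for illustration.

```python
from collections import defaultdict

def group_bboxes(features, key):
    """Group GeoJSON point features by a property; return one bbox per group."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"][key]].append(feature["geometry"]["coordinates"])

    bboxes = {}
    for name, coords in groups.items():
        xs = [c[0] for c in coords]
        ys = [c[1] for c in coords]
        # [west, south, east, north], the order STAC uses for bbox.
        bboxes[name] = [min(xs), min(ys), max(xs), max(ys)]
    return bboxes

# Hypothetical tree points, tagged with the area they belong to.
trees = [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [9.7, 52.4]},
     "properties": {"district": "Hannover"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [9.8, 52.3]},
     "properties": {"district": "Hannover"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [10.5, 52.2]},
     "properties": {"district": "Braunschweig"}},
]

bboxes = group_bboxes(trees, "district")
for district, bbox in bboxes.items():
    print(district, bbox)
```

With this approach the number of items follows the number of groups, not the number of features.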

The topic will get more attention in the near future, and we would also like to exchange experiences and, where possible, stay consistent with other users / STAC creators.

TomAugspurger commented 2 years ago

I don't know if it's relevant to this issue, but I'll share what we're doing on the Planetary Computer.

I've come at this from the point of view of tabular data, where some of the columns happen to contain (vector) geometries. From that point of view, STAC has been a natural fit:

In our case, the actual assets are links to parquet datasets (technically, the root of the parquet dataset in blob storage, which might be partitioned into many files). You could imagine modeling a spatially partitioned parquet dataset as a collection of items (each with their own geometry), but we haven't explored that yet.

The example notebook for US census gives a good overview of how STAC is used for data access.

We're relying on the table extension to catalog the columns available in each item: https://github.com/stac-extensions/table
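To give an idea of what the table extension adds, here is a small sketch that attaches `table:columns` and `table:primary_geometry` to an item's properties. The helper `add_table_columns`, the column names, the asset URL, the parquet media type, and the extension version in the schema URL are assumptions for illustration, not taken from the Planetary Computer catalog.

```python
import json

# Schema URL for the table extension; the version is an assumption.
TABLE_EXT = "https://stac-extensions.github.io/table/v1.2.0/schema.json"

def add_table_columns(item, columns, primary_geometry=None):
    """Return a copy of `item` with table extension fields in its properties."""
    out = dict(item)
    extensions = set(item.get("stac_extensions", [])) | {TABLE_EXT}
    out["stac_extensions"] = sorted(extensions)

    props = dict(item.get("properties", {}))
    props["table:columns"] = [
        {"name": name, "type": col_type, "description": desc}
        for name, col_type, desc in columns
    ]
    if primary_geometry is not None:
        props["table:primary_geometry"] = primary_geometry
    out["properties"] = props
    return out

# Hypothetical item pointing at the root of a parquet dataset.
base_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-census-blocks",
    "geometry": None,
    "properties": {"datetime": "2020-08-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "https://example.com/census/blocks.parquet",
            "type": "application/x-parquet",  # assumed media type
            "roles": ["data"],
        }
    },
}

item = add_table_columns(
    base_item,
    [
        ("GEOID", "string", "Block identifier"),
        ("geometry", "binary", "Block boundary"),
    ],
    primary_geometry="geometry",
)
print(json.dumps(item["properties"], indent=2))
```

A client can then read the column list from the item without opening the parquet files.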

m-mohr commented 2 years ago

Quick recap from a recent telco with LGLN:

Did I miss anything @McSurf84 ?

The new website should elaborate more on vector data. Currently, it says at https://stacspec.org/faq.html:

Q. I have vector data, should I use STAC?
A. No. Vector data should be handled directly with WFS 3.

We found this to be pretty confusing (and outdated).

m-mohr commented 2 years ago

I recently changed this to:

Yes! Vector data can in principle be handled with STAC, but it's not as well defined as for raster data. STAC is closely aligned with OGC API - Features though and you should have a look at that specification, too.

The previous explanation was wrong and confusing and was the most cited thing from the STAC website that I've seen in the past.

m-mohr commented 1 year ago

A nice general guideline from @matthewhanson is:

Nevertheless, vector data (especially larger sets of geometries) is generally supported by STAC, e.g. as a geoparquet asset. The question is more how you expose/enrich the assets with metadata, for example with the table extension. That's a data format issue that probably needs separate issues per data format and/or per extension.