opengeospatial / ogcapi-geodatacubes

Definition of a GeoDataCube #12

Open RyanAhola opened 1 month ago

RyanAhola commented 1 month ago

Building on the recent discussion in Testbed-20 (https://gitlab.ogc.org/ogc/T20-GDC/-/issues/14), I am setting up this thread to discuss the definition of a "geodatacube". The goal is for the SWG to come up with a definition that can be referenced by OGC.

m-mohr commented 1 month ago

Below is the current write-up from the OGC GitLab. It is a condensed version of https://openeo.org/documentation/1.0/datacubes.html, which is the best source to check if clarifications are needed.


GeoDataCubes

Datacubes are multi-dimensional arrays with additional information about their dimensionality. They can provide a tidy interface for spatiotemporal data and for the operations you may want to execute on them. Although arrays are close to raster data, datacubes can also hold vector data. GeoDataCubes (GDC) are a special case of datacubes in that they have one or more spatial dimensions, e.g. x and y. GeoDataCubes for raster data often consist of the dimensions x, y, time and bands. Sometimes they also have multiple temporal dimensions. GeoDataCubes for vector data often consist of geometries, time and a variable. In general, datacubes can consist of any combination of dimensions; the dimensions are unrestricted. The spatial dimensions of a GeoDataCube may get removed during processing.

The following additional information is usually available for datacubes:

This additional information could be provided upfront via metadata.

Dimensions

A dimension refers to a certain axis of a datacube. This includes all variables (e.g. bands), which are represented as dimensions. An exemplary raster datacube could have the spatial dimensions x and y, and the temporal dimension t. Furthermore, it could have a bands dimension, extending into the realm of what kind of information is contained in the cube.
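As a rough illustration of the paragraph above, the following sketch models a raster datacube with the dimensions x, y, t and bands using only the Python standard library. The class names `Dimension` and `DataCube`, the `kind` categories and all label values are hypothetical and chosen for this example only; they are not taken from openEO or any OGC specification.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Dimension:
    name: str      # e.g. "x", "y", "t", "bands"
    kind: str      # assumed categories: "spatial", "temporal", "bands", "other"
    labels: tuple  # one label per position along the axis


@dataclass
class DataCube:
    dimensions: tuple  # ordered Dimension objects, one per axis
    values: dict       # maps a tuple of labels (one per dimension) to a scalar

    def shape(self):
        return tuple(len(d.labels) for d in self.dimensions)

    def is_geo(self):
        # A GeoDataCube has at least one spatial dimension.
        return any(d.kind == "spatial" for d in self.dimensions)


x = Dimension("x", "spatial", (500000.0, 500010.0))
y = Dimension("y", "spatial", (5650000.0, 5650010.0))
t = Dimension("t", "temporal", ("2024-06-01", "2024-06-11", "2024-06-21"))
bands = Dimension("bands", "bands", ("red", "nir"))

dims = (x, y, t, bands)
# Fill every cell with a placeholder value of 0.0.
cube = DataCube(dims, {key: 0.0 for key in product(*(d.labels for d in dims))})

print(cube.shape())   # (2, 2, 3, 2)
print(cube.is_geo())  # True
```

Here the bands are just another axis with labels, which mirrors the statement that variables are represented as dimensions.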

The following properties are usually available for dimensions:

Specific implementations of datacubes may prescribe details such as sorting orders or representations of labels. For example, some implementations may always sort temporal labels in their inherent (chronological) order and encode them in an ISO 8601-compliant way.
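A minimal sketch of such a convention, assuming the temporal labels are held as timezone-aware Python `datetime` objects (the sample dates are made up): the labels are sorted chronologically and then encoded as ISO 8601 strings.

```python
from datetime import datetime, timezone

# Hypothetical, unsorted temporal labels of a datacube.
raw_labels = [
    datetime(2024, 6, 21, tzinfo=timezone.utc),
    datetime(2024, 6, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 11, 12, 30, tzinfo=timezone.utc),
]

# Sort chronologically first, then encode each label as an ISO 8601 string.
temporal_labels = [label.isoformat() for label in sorted(raw_labels)]

for label in temporal_labels:
    print(label)
```

Sorting before encoding avoids relying on string order, which only matches chronological order when all labels share the same format and UTC offset.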

Datacubes contain scalar values (e.g. strings, numbers or boolean values), with all other associated attributes stored in dimensions (e.g. coordinates or timestamps). Attributes such as the CRS or the sensor can also be turned into dimensions. Be advised that in such a case, the uniqueness of pixel coordinates may be affected. Whereas (x, y) usually refers to a unique location, the unique key becomes (x, y, CRS) when (x, y) values are reused in other coordinate reference systems (e.g. two neighboring UTM zones).
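The effect on uniqueness can be shown with a small sketch. The two pixels below are invented sample data: they carry the same (x, y) numbers but belong to the neighboring UTM zones 32N (EPSG:32632) and 33N (EPSG:32633), so they describe different places on the ground.

```python
# Two hypothetical pixels with identical (x, y) values in neighboring UTM zones.
pixels = [
    {"x": 500000.0, "y": 5650000.0, "crs": "EPSG:32632", "value": 0.21},
    {"x": 500000.0, "y": 5650000.0, "crs": "EPSG:32633", "value": 0.35},
]

# Keyed by (x, y) only: the second pixel silently overwrites the first.
by_xy = {(p["x"], p["y"]): p["value"] for p in pixels}

# Keyed by (x, y, CRS): both pixels are kept as distinct cells.
by_xy_crs = {(p["x"], p["y"], p["crs"]): p["value"] for p in pixels}

print(len(by_xy))      # 1
print(len(by_xy_crs))  # 2
```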

Common Operations

A couple of operations are commonly applied to datacubes:

Every operation that returns a subset of the datacube or the complete datacube is considered to be datacube access.

Every operation that computes new values is considered to be datacube processing.
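The distinction between access and processing can be sketched on a toy cube with the dimensions (t, bands). The helper names `filter_labels` and `reduce_dimension` and the cell values are assumptions made for this illustration, not operations defined by any specification.

```python
from statistics import mean

# Hypothetical toy cube; keys are (t, bands) label tuples, values are scalars.
dims = ("t", "bands")
cube = {
    ("2024-06-01", "red"): 0.125,
    ("2024-06-01", "nir"): 0.5,
    ("2024-06-11", "red"): 0.375,
    ("2024-06-11", "nir"): 0.75,
}


def filter_labels(cube, dims, dim, keep):
    """Access: return the cells whose label along `dim` is in `keep`."""
    axis = dims.index(dim)
    return {key: value for key, value in cube.items() if key[axis] in keep}


def reduce_dimension(cube, dims, dim, reducer):
    """Processing: collapse `dim` by applying `reducer` to the values along it."""
    axis = dims.index(dim)
    groups = {}
    for key, value in cube.items():
        rest = key[:axis] + key[axis + 1:]  # key without the reduced dimension
        groups.setdefault(rest, []).append(value)
    new_dims = dims[:axis] + dims[axis + 1:]
    return {key: reducer(vals) for key, vals in groups.items()}, new_dims


nir_only = filter_labels(cube, dims, "bands", {"nir"})              # existing values only
temporal_mean, new_dims = reduce_dimension(cube, dims, "t", mean)   # new values

print(nir_only)
print(new_dims, temporal_mean)
```

The reduction also drops the reduced dimension from the result, which is the same mechanism by which a spatial dimension can disappear from a GeoDataCube during processing.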

Comparison

Coverages

Further information: https://github.com/Open-EO/openeo-api/pull/502

openEO

xarray

A datacube as described here is closely related to the concept of a single xarray DataArray.

netCDF

A datacube is comparable to a netCDF variable with its dimensions.

Raster file formats

strobpr commented 3 weeks ago

Based on what is discussed here and previously in https://gitlab.ogc.org/ogc/T20-GDC/-/issues/14 and #502, I started wondering whether finding a single definition for all these different incarnations of (geo)datacubes is possible at all. Maybe the only commonality is that a ‘(geospatial) Datacube’ stands for the desire to render a multitude of (geospatial) data interoperable and organise them such that working with them as an ensemble is more efficient than working with them individually. This is of course too vague to build a good definition on, one that could help to distinguish what is considered in and what is out. Settling for that type of loose agreement would mean renegotiating the term each time a concrete project is started (as seems to be the case here). This does not sound very efficient either.

A possible way out could be to understand ‘(geo)datacubing’ as a process with several stages which render (geospatial) data increasingly organised and interoperable, thus enhancing the efficiency of dealing with them. Below is what that could look like (6 stages only because of the analogy to cube faces). I would hope that agreeing on certain ‘datacube stages’ might be easier than reserving the name for just one specific stage.

Curious to hear other opinions, maybe it's just too hot an August afternoon here.

| Stage | Description | Notes |
|-------|-------------|-------|
| 1 | Multitude of data which have sufficient metadata to allow ordering them along certain dimensions | |
| 2 | Multitude of data which have declared dimensions to which all single data items can be referenced | |
| 3 | Multitude of data which are referenced to more than one standardized dimension (one of them being a geospatial domain) | At this stage, we have essentially a point cloud in an established CRS |
| 4 | Multitude of data block-wise co-registered (aligned) along at least one identified standardised dimension with all blocks sharing a common geospatial range | This stage marks the forming of layers or coverages which can be ordered and show a geospatial overlap |
| 5 | All layers are co-gridded to a regular grid system | At this stage, all data are organized in layers sharing the same grid or grid system (Q: Are the layers supposed to be gap-free?) |
| 6 | All layers have homologous discretisation (‘gridding’) along all their declared dimensions | At this final (ideal?) stage, the dimensions follow the same algorithmic set of rules, so that operations can equally be applied across all dimensions or domains |

Applicable definitions:

Data: Value and (usually) uncertainty of a trait of a specific entity

Dimension: Direction or aspect in which a trait can vary or be measured (a single type domain)

Domain: n-dimensional space created by individual dimensions

Standardised dimension: Dimension with a standardised (ISO, OGC, SI) reference system (Q: needs to have an axis?)

Layer: A multitude of data in which all items share at least one metadata value (e.g. being on the earth surface or constant elevation)

Value: State of a trait within a class or type (domain)