opengeospatial / ideas

Public repository for Innovation Program Ideas

Advance Data Cubes #1

Open bermud opened 7 years ago

bermud commented 7 years ago

Several activities related to Data Cubes exist.

Work to be advanced

pebau commented 7 years ago

Trying to provide some prose here: Datacubes form an enabling paradigm for serving massive spatio-temporal Earth data in an analysis-ready way. Extending the concept of seamless maps from 2D to nD by adding height and time (and possibly further axes) presents users (and m2m clients alike) with a small number of homogenized objects, which substantially eases access, extraction, analysis, and fusion: "one cube says more than a million images". For server-side evaluation of datacube requests, a set of enabling techniques is known to speed up processing massively, including adaptive partitioning, parallel and distributed processing, dynamic orchestration of heterogeneous hardware, and even federations of data centers. Today, known datacube services exceed 500 TB, and datacube analytics queries have been split across 1,000+ cloud nodes.
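Partitioning, one of the enabling techniques listed above, can be illustrated with plain regular tiling: the server only needs to read the tiles that a request intersects. This is a minimal sketch that assumes a regular tiling in grid index space; adaptive partitioning is not modelled, and the function name and tile sizes are hypothetical.

```python
from itertools import product

def tiles_for_subset(lo, hi, tile_shape):
    """Return the index tuples of all regular tiles that intersect a subset.

    lo, hi:     inclusive lower/upper grid indices of the requested subset, per axis
    tile_shape: tile extent (in cells) along each axis
    """
    ranges = []
    for l, h, t in zip(lo, hi, tile_shape):
        if l > h:
            raise ValueError("lower bound exceeds upper bound")
        # Integer division maps a cell index to the index of the tile containing it.
        ranges.append(range(l // t, h // t + 1))
    # Every combination of per-axis tile indices is one tile to be read.
    return list(product(*ranges))

# A hypothetical x/y/t cube cut into 256 x 256 x 16 tiles:
touched = tiles_for_subset(lo=(0, 300, 10), hi=(255, 600, 20), tile_shape=(256, 256, 16))
print(len(touched), touched)
```

Only 4 tiles are touched here, however large the whole cube is, which is why tiling pays off for subsetting queries.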

From a standards perspective, datacubes belong to the category of coverages: the coverage data model is defined by the OGC Coverage Implementation Schema (CIS), and the service model by the OGC Web Coverage Service (WCS) together with the OGC Web Coverage Processing Service (WCPS), OGC's geo datacube query language.
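To make the WCS/WCPS split concrete, the sketch below builds a WCS 2.0 KVP GetCoverage request (trimming on several axes) and wraps a WCPS query in a ProcessCoverages request. It assumes a server implementing the WCS 2.0 KVP binding and the WCS Processing Extension; the endpoint URL, the coverage name `AvgTemperature`, and the axis labels `Lat`, `Long` and `ansi` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and coverage name, for illustration only.
ENDPOINT = "https://example.org/ows"
COVERAGE = "AvgTemperature"

def getcoverage_url(endpoint, coverage_id, subsets, fmt):
    """Build a WCS 2.0 KVP GetCoverage URL; `subsets` maps axis label -> (low, high)."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in subsets.items():
        # Trim syntax subset=axis(low,high), repeated once per trimmed axis.
        params.append(("subset", f"{axis}({low},{high})"))
    params.append(("format", fmt))
    return f"{endpoint}?{urlencode(params)}"

def wcps_url(endpoint, query):
    """Wrap a WCPS query string in a ProcessCoverages request."""
    params = [("service", "WCS"), ("version", "2.0.1"),
              ("request", "ProcessCoverages"), ("query", query)]
    return f"{endpoint}?{urlencode(params)}"

# 3D x/y/t trim; time values are quoted strings (assumed axis labels).
url = getcoverage_url(ENDPOINT, COVERAGE,
                      {"Lat": (40, 50), "Long": (0, 10),
                       "ansi": ('"2017-01-01"', '"2017-12-31"')},
                      "image/tiff")

# WCPS: extract one year's time series at a single location, encoded as CSV.
query = ('for $c in (AvgTemperature) return encode('
         '$c[Lat(45), Long(5), ansi("2017-01-01":"2017-12-31")], "text/csv")')

print(url)
print(wcps_url(ENDPOINT, query))
```

Note the different range syntax: the KVP `subset` parameter separates bounds with a comma, while WCPS uses a colon inside the subset expression.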

In Testbed-14, 3D x/y/t and 4D x/y/z/t spatio-temporal datacubes, exceeding 1 TB to convincingly demonstrate scalability, should be served via WCS and WCPS. The following features should be available and demonstrated:

This service functionality should be demonstrated through a wide range of clients (with WCS as the client/server API, possibly complemented by WMS), covering visual navigation (e.g., OpenLayers), Web GIS integration (e.g., QGIS, ArcGIS), virtual globe display (e.g., NASA WorldWind, Cesium), and programmatic analytics (e.g., Python, R).
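To put the 1 TB threshold for the Testbed-14 cubes in perspective, a back-of-the-envelope calculation shows how many time steps an x/y/t cube needs to pass it. The grid resolution (global, 0.01 degree), the float32 cell type and uncompressed storage are assumptions chosen for illustration only.

```python
from math import prod

def cube_bytes(shape, bytes_per_cell):
    """Uncompressed size of a dense datacube with the given axis extents."""
    return prod(shape) * bytes_per_cell

def steps_to_exceed(slice_shape, bytes_per_cell, target_bytes):
    """Smallest number of time steps for which the cube is strictly larger than target_bytes."""
    slice_bytes = cube_bytes(slice_shape, bytes_per_cell)
    return target_bytes // slice_bytes + 1

TB = 10**12  # decimal terabyte
# Hypothetical global 0.01-degree lat/long grid stored as float32 (4 bytes per cell).
slice_shape = (18_000, 36_000)
n = steps_to_exceed(slice_shape, 4, TB)
print(n, cube_bytes((*slice_shape, n), 4) / TB)
```

Under these assumptions a single slice is about 2.6 GB, so roughly one year of daily data already crosses 1 TB; a 4D x/y/z/t cube multiplies this by the number of height levels.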

pebau commented 7 years ago

...plus, there should be application/sponsor-specific scenarios, of course. Candidates that come to my mind:

tomLandry commented 7 years ago

For application/sponsor-specific scenarios, climate data would fit the bill as well. NetCDF files served through OPeNDAP are effectively data cubes, too. Having a uniform way to query, subset and regrid both climate model outputs and EO data would simplify the infrastructure required to serve them. This is a use case that CRIM envisions inside the Earth System Grid Federation (ESGF). Reference: Geospatial Fusion in the Era of Big Data, George Percivall (OGC), IGARSS, July 24th 2017.
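A uniform "subset, then regrid" step can be sketched independently of whether the slice comes from a climate model or from EO data: trim by coordinate values, then coarsen. This sketch assumes ascending 1-D coordinate axes and uses non-overlapping block averaging as a deliberately simple regridding method; the conservative or bilinear regridding usually applied to climate data is more involved. All names and the toy grid are hypothetical.

```python
from bisect import bisect_left, bisect_right

def index_range(coords, low, high):
    """Half-open index range [i0, i1) of an ascending axis whose values lie in [low, high]."""
    return bisect_left(coords, low), bisect_right(coords, high)

def subset_2d(grid, lats, lons, lat_range, lon_range):
    """Trim a 2-D lat/lon grid (a list of rows) by coordinate values instead of indices."""
    r0, r1 = index_range(lats, *lat_range)
    c0, c1 = index_range(lons, *lon_range)
    return [row[c0:c1] for row in grid[r0:r1]], lats[r0:r1], lons[c0:c1]

def block_mean(grid, k):
    """Coarsen a grid by averaging non-overlapping k x k blocks; leftover edge cells are dropped."""
    if not grid or not grid[0]:
        return []
    rows, cols = len(grid) // k, len(grid[0]) // k
    return [
        [sum(grid[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(cols)]
        for i in range(rows)
    ]

# Toy 10 x 10 slice on a hypothetical 1-degree grid; each value encodes its position.
lats = list(range(10))
lons = list(range(10))
grid = [[10 * i + j for j in range(10)] for i in range(10)]

sub, sub_lats, sub_lons = subset_2d(grid, lats, lons, (2, 5), (4, 7))
coarse = block_mean(sub, 2)
print(coarse)  # → [[29.5, 31.5], [49.5, 51.5]]
```

For a 3D x/y/t cube, the same two steps would simply be applied to every time slice.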

pebau commented 7 years ago

Actually, a WCS interface for OPeNDAP was already established in an earlier testbed.
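One way such a bridge can work is stated here as an assumption, not as a description of that testbed's implementation. A WCS trim expressed in coordinates is translated into the index-based hyperslab syntax of an OPeNDAP DAP2 constraint expression, `var[start:stride:stop]`, where `stop` is inclusive. The variable name `tas` and the grid are hypothetical.

```python
from bisect import bisect_left, bisect_right

def hyperslab(coords, low, high):
    """Map a coordinate trim [low, high] on an ascending axis to a DAP2 index range.

    DAP2 hyperslabs use the form [start:stride:stop], with stop inclusive.
    """
    start = bisect_left(coords, low)
    stop = bisect_right(coords, high) - 1
    if start > stop:
        raise ValueError("trim does not intersect the axis")
    return f"[{start}:1:{stop}]"

def dap2_constraint(variable, axes, trims):
    """Build a DAP2 constraint expression for one gridded variable.

    axes:  ordered list of (axis_name, ascending_coordinates), in the variable's dimension order
    trims: dict axis_name -> (low, high) in coordinate units
    """
    return "?" + variable + "".join(hyperslab(coords, *trims[name]) for name, coords in axes)

# Hypothetical monthly, 1-degree grid and a CMIP-style variable name.
axes = [
    ("time", list(range(12))),        # month offsets 0..11
    ("lat", list(range(-90, 91))),    # -90 .. 90
    ("lon", list(range(0, 360))),     # 0 .. 359
]
trims = {"time": (0, 2), "lat": (40, 50), "lon": (0, 10)}
print(dap2_constraint("tas", axes, trims))
```

The resulting string would be appended to an OPeNDAP dataset URL; the key step is the coordinate-to-index lookup, which WCS clients otherwise never have to do.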

pebau commented 4 years ago

FWIW, in Testbed-15 a datacube federation between the EU and the US was established, with complete location transparency, based on rasdaman (which is also used in EarthServer, where ultimately 2.5+ PB of satellite and climate data have been datacubed).
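"Location transparency" means a client can send a request to any federation member without knowing where the data physically resides. The toy sketch below illustrates only that idea with one-hop forwarding. It is not rasdaman's federation protocol, and the node and coverage names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical federation member that holds some coverages locally."""
    name: str
    coverages: set
    peers: list = field(default_factory=list, repr=False)

    def evaluate_local(self, coverage_id, query):
        # Placeholder: a real server would evaluate the query against local storage here.
        return f"{self.name} evaluated {query!r} on {coverage_id}"

    def handle(self, coverage_id, query):
        """Entry point for clients: answer locally, or forward to the peer that holds the data."""
        if coverage_id in self.coverages:
            return self.evaluate_local(coverage_id, query)
        for peer in self.peers:
            if coverage_id in peer.coverages:
                return peer.evaluate_local(coverage_id, query)
        raise KeyError(f"no federation member holds {coverage_id}")

eu = Node("eu-node", {"S2_cube"})
us = Node("us-node", {"Landsat_cube"})
eu.peers.append(us)
us.peers.append(eu)

# The client always talks to eu-node, regardless of where the data lives.
print(eu.handle("S2_cube", "avg"))
print(eu.handle("Landsat_cube", "avg"))
```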