w3c / sdw

Repository for the Spatial Data on the Web Working Group
https://www.w3.org/2020/sdw/

New Project proposal: Dicing or partitioning Ontology for RDF Data Cubes #1068

Open 6a6d74 opened 6 years ago

6a6d74 commented 6 years ago

The RDF Data Cube specification supports 'slicing' across one or more dimensions, thereby reducing the dimensionality of the data cube. When the RDF ontology was originally proposed, the UN SDMX statisticians could not agree on a vocabulary for further sub-setting or summarizing.
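Here is a minimal sketch of what slicing does. It assumes the cube is held as plain Python dicts, one per observation, standing in for `qb:Observation` resources. The dimensions `area` and `period` and the measure `value` are hypothetical. Fixing a dimension selects the matching observations and drops that dimension, so the result has one fewer dimension. This is an illustration only, not the vocabulary's own `qb:Slice` machinery.

```python
from typing import Any

# One observation = one dict mapping component names to values.
Observation = dict[str, Any]


def slice_cube(observations: list[Observation], fixed: dict[str, Any]) -> list[Observation]:
    """Return the observations matching every fixed dimension value,
    with the fixed dimensions removed, so the result has fewer dimensions."""
    return [
        {k: v for k, v in obs.items() if k not in fixed}
        for obs in observations
        if all(obs.get(dim) == value for dim, value in fixed.items())
    ]


# Hypothetical toy cube: the dimension names and values are made up.
cube = [
    {"area": "A", "period": 2017, "value": 10},
    {"area": "A", "period": 2018, "value": 11},
    {"area": "B", "period": 2017, "value": 12},
    {"area": "B", "period": 2018, "value": 13},
]

# Fixing period = 2018 leaves a 1-D slice indexed by area only.
print(slice_cube(cube, {"period": 2018}))
# → [{'area': 'A', 'value': 11}, {'area': 'B', 'value': 13}]
```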

At the OGC TC in March 2018, there was recognition that there was a commonality underlying many proposed big data cubes, geospatial data cubes, map tiles, vector tile sets, data partitions, result paging, etc.

With the OGC enthusiasm for the newer, more flexible, less schematic, more RESTful Web Feature Service V3.0, there seems to be a push to review the entities that appear in various web services and to generalize them for use across a variety of services and APIs.

It appears to me that there are some very common patterns in data partitioning that could be re-used, especially if the concepts and terminology were refined; e.g. along one dimension, partition according to:

These patterns could be applied in 1D (timeseries), 2D (map tiles), 3D (Cesium), or more.
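For illustration only, take fixed-width intervals as one simple partitioning rule. The sketch below computes an integer cell index per dimension. The same function therefore covers 1D cases (e.g. 24-hour chunks of a timeseries), 2D cases (e.g. 10-degree map tiles) and 3D cases. The function names and the choice of rule are hypothetical.

```python
import math
from collections import defaultdict
from typing import Sequence


def cell_index(point: Sequence[float], origin: Sequence[float], widths: Sequence[float]) -> tuple[int, ...]:
    """Integer index of the fixed-width cell containing `point`, one per dimension."""
    return tuple(math.floor((c - o) / w) for c, o, w in zip(point, origin, widths))


def partition(points, origin, widths) -> dict[tuple[int, ...], list]:
    """Group points by cell; works unchanged for 1-D, 2-D or 3-D points."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_index(p, origin, widths)].append(p)
    return dict(cells)


# 1-D: hours since an epoch, split into 24-hour chunks.
print(partition([(0.0,), (5.0,), (23.0,), (24.0,), (30.0,)], (0.0,), (24.0,)))
# → {(0,): [(0.0,), (5.0,), (23.0,)], (1,): [(24.0,), (30.0,)]}

# 2-D: (longitude, latitude) in 10-degree tiles anchored at (-180, -90).
print(cell_index((12.5, 51.0), (-180.0, -90.0), (10.0, 10.0)))
# → (19, 14)
```

The only thing that changes between 1D, 2D and 3D is the length of the `origin` and `widths` tuples. That is the kind of commonality the proposal suggests could be captured once in shared terminology.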

Chris Little sees this as being complementary and orthogonal to the QB4ST work.

Rob Atkinson said he previously had played with URL templates referencing QB components...
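As an illustration, a URL template whose variables are bound to QB component properties might be expanded as below. The template, the `example.org` host and the bindings are all hypothetical. The two IRIs are the SDMX-RDF `refArea` and `refPeriod` dimension properties, used here only as example components. The expansion imitates RFC 6570 level-1 simple string expansion but is hand-rolled, not a library call.

```python
import re
from urllib.parse import quote

# Hypothetical template; each {variable} is bound to a QB component property.
TEMPLATE = "https://example.org/cube/{area}/{period}"
BINDINGS = {
    "area": "http://purl.org/linked-data/sdmx/2009/dimension#refArea",
    "period": "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod",
}


def expand(template: str, values: dict[str, str]) -> str:
    """Replace each {name} with its percent-encoded value, roughly like
    RFC 6570 level-1 simple string expansion (everything except unreserved
    characters is encoded)."""
    return re.sub(r"\{(\w+)\}", lambda m: quote(str(values[m.group(1)]), safe=""), template)


def expand_by_component(template: str, bindings: dict[str, str], component_values: dict[str, str]) -> str:
    """Expand using values keyed by component property IRI, not by variable name."""
    return expand(template, {var: component_values[iri] for var, iri in bindings.items()})


print(expand_by_component(TEMPLATE, BINDINGS, {
    BINDINGS["area"]: "UK",
    BINDINGS["period"]: "2018",
}))
# → https://example.org/cube/UK/2018
```

Keying values by component IRI rather than by template variable would, in principle, let a client that understands the cube's structure build subset URLs without knowing the template's variable names in advance.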

This proposal would support arbitrary service interfaces. The W3C DXWG work on profile descriptions might be a pathway to classifying such services. Documenting such services, and the various subset relationships, is important but not well supported. Some of the DCAT work may help, although it will probably just recommend using an external vocabulary, so leveraging that to justify this new work makes sense.

QB metadata for subsets, once transferred, is another concern, but would be a use case for the same vocabulary.

6a6d74 commented 6 years ago

Request to SDW IG participants:

Please indicate whether you support this Project proposal, either as is or with amendments, and whether you are keen/able to contribute effort.

Pending responses from the IG, this proposal may be promoted to an SDW IG Project.

6a6d74 commented 6 years ago

Chris Little and Rob Atkinson have already noted their support. Will you join them?

6a6d74 commented 6 years ago

Chris Little notes:

  1. The initial project document gave some simple examples. Tilesets/Levels of Detail give a different kind of instantaneous partitioning of a data cube.
  2. Assume that this will initially become a W3C effort, rather than a joint OGC-W3C one, as it is not restricted to geospatial data.
  3. Who else is interested in taking this forward?
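
Levels of detail can be pictured as a family of partitions of the same extent, one per level, where each tile at level z splits into four at level z+1. The unit-square extent and the quadtree rule below are assumptions made for illustration. They do not describe any particular OGC tile matrix set.

```python
def tile_at_level(x: float, y: float, level: int) -> tuple[int, int, int]:
    """(level, column, row) of the quadtree tile containing (x, y) in the
    unit square [0, 1) x [0, 1); level 0 is a single tile."""
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("point outside the unit square")
    n = 2 ** level  # tiles per axis at this level
    return level, int(x * n), int(y * n)


def parent(tile: tuple[int, int, int]) -> tuple[int, int, int]:
    """The tile one level coarser that contains `tile`."""
    level, col, row = tile
    if level == 0:
        raise ValueError("a level-0 tile has no parent")
    return level - 1, col // 2, row // 2


for z in range(3):
    print(tile_at_level(0.3, 0.7, z))
# → (0, 0, 0), (1, 0, 1), (2, 1, 2) on successive lines
```

Each level is a complete partition on its own, and `parent` links tiles across levels. That cross-level link is what makes a tileset different from a single flat partition.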

rob-metalinkage commented 6 years ago

Note that another effort looked at the general case - https://github.com/lorenae/qb4olap/wiki

This is not in the W3C canon.

One option is to worry about the spatio-temporal functions, consistent with QB4OLAP but defined standalone with an alignment to it (so no direct dependencies).

I think DGGS as a spatial dimension may also be important.

The UNGGIM stats discussion needs to be brought into the frame here.

So clearly, there is scope for fragmented, inconsistent approaches to propagate, unless a BP or enabling specification can emerge.

lvdbrink commented 6 years ago

> So clearly, there is scope for fragmented, inconsistent approaches to propagate, unless a BP or enabling specification can emerge

Would this fit into the Statistics on the Web BP then? @BillSwirrl

rob-metalinkage commented 6 years ago

I think there are two aspects: 1) describing statistics, and 2) describing service interfaces according to the operations they perform (and on what dimensions typically).

I don't think the latter is likely to surface strongly in the Statistics on the Web BP (maybe I'm wrong), but it's a general issue around designing OGC services to be more web friendly: can we describe what they do?

These issues come together when we start to look at distributions of datasets via services (supported explicitly by current revisions of DCAT), and also at how datasets and distributions relate (is one a slice of another?).

I think it's solvable with a BP approach, but we need to first test some ideas and establish the BP :-)
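Here is one small test of the "is one a slice of another" question. The sketch checks whether a candidate dataset equals the slice of a cube obtained by fixing given dimension values. It assumes observations are plain dicts and that the fixed values are known (the information a `qb:Slice` and its `qb:SliceKey` would carry). The function names are hypothetical.

```python
from typing import Any

Observation = dict[str, Any]


def _canonical(obs: Observation) -> list:
    """Order-independent form of one observation, for comparison."""
    return sorted(obs.items())


def is_slice_of(candidate: list[Observation], cube: list[Observation], fixed: dict[str, Any]) -> bool:
    """True if `candidate` holds exactly the observations of `cube` that match
    `fixed`, with the fixed dimensions removed (observation order is ignored)."""
    expected = [
        {k: v for k, v in obs.items() if k not in fixed}
        for obs in cube
        if all(obs.get(d) == val for d, val in fixed.items())
    ]
    return sorted(map(_canonical, expected)) == sorted(map(_canonical, candidate))


# Hypothetical toy data.
cube = [
    {"area": "A", "period": 2017, "value": 10},
    {"area": "A", "period": 2018, "value": 11},
    {"area": "B", "period": 2018, "value": 13},
]
print(is_slice_of([{"area": "B", "value": 13}, {"area": "A", "value": 11}], cube, {"period": 2018}))
# → True
```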

chris-little commented 6 years ago

Thanks for these comments and links, rob-metalinkage and lvdbrink. I notice that the OLAP proposal does not seem to have been touched since 2015?

I am not sure that I understand their use of 'dice'. They seem to use it as an aggregation process over the levels of hierarchical dimensions. This does not fit with my naïve view of partitioning 'observations' along various non-hierarchical dimensions (pure QB) into practical chunks. Their proposal seems to automatically calculate the derived first order statistics for each such chunk, and use such statistics as proxies for the underlying data.

As rob-metalinkage says, there seems to be some orthogonality between calculating statistics and partitioning: a simple partitioning may not correspond to the grouping required for meaningful statistics.
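The orthogonality can be made concrete. Partitioning keeps every observation and only groups them, while a roll-up in the QB4OLAP sense replaces each group with derived values. The month-to-year hierarchy and the use of a sum below are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy observations along a month dimension.
observations = [
    {"month": "2018-01", "value": 3},
    {"month": "2018-02", "value": 4},
    {"month": "2019-01", "value": 5},
]


def year_of(month: str) -> str:
    """Hypothetical hierarchy step: month -> year."""
    return month[:4]


def partition_by_year(obs):
    """Partitioning: every observation survives, grouped into chunks."""
    chunks = defaultdict(list)
    for o in obs:
        chunks[year_of(o["month"])].append(o)
    return dict(chunks)


def roll_up_by_year(obs):
    """Aggregation: each group collapses to one derived value (here a sum)."""
    totals = defaultdict(int)
    for o in obs:
        totals[year_of(o["month"])] += o["value"]
    return dict(totals)


print(roll_up_by_year(observations))
# → {'2018': 7, '2019': 5}
```

Both functions use the same grouping key, yet only the partition can be reversed to recover the original observations.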

tidoust commented 6 years ago

> 2) describing service interfaces according to the operations they perform (and on what dimensions typically)

For reference, I note that for what @6a6d74 presents as "item count", there was some standardization work at W3C on a Linked Data Platform Paging spec, which was shelved for lack of implementors.
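Item-count paging can be read as partitioning the observations, in some order, into chunks of equal size. The generic offset-based sketch below is not the LDP Paging protocol. It assumes a stable ordering of items, since without one, pages could overlap or skip items. The names are hypothetical.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def page(items: Sequence[T], page_size: int, page_index: int) -> tuple[list[T], Optional[int]]:
    """Return the items on page `page_index` (0-based) and the index of the
    next page, or None if this is the last page."""
    if page_size <= 0 or page_index < 0:
        raise ValueError("page_size must be positive and page_index non-negative")
    start = page_index * page_size
    chunk = list(items[start:start + page_size])
    next_index = page_index + 1 if start + page_size < len(items) else None
    return chunk, next_index


# Walk all pages by following the "next" index, as a client would follow links.
items = list(range(7))
index: Optional[int] = 0
while index is not None:
    chunk, index = page(items, 3, index)
    print(chunk)
# → [0, 1, 2], then [3, 4, 5], then [6]
```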

VladimirAlexiev commented 5 years ago

QB4OLAP is definitely worth investigating and extending if possible. "Not touched since 2015" is not enough reason to disregard it. I have the papers listed at https://github.com/lorenae/qb4olap/wiki/4)-Publications in case someone wants them.

QB4OLAP's hierarchical features need to be reconciled with the following:

6a6d74 commented 4 years ago

@chris-little ... not much happening here! Do you think that this work / concept can be incorporated into the OGC Data Tiles activity?

chris-little commented 4 years ago

@6a6d74 Many years ago, as we started the Met-Ocean extensions to OGC Web Coverage Service, I also tried to start work on a Web Coverage Tile Service in OGC. There was resistance at the time, and it went into abeyance. There has been a lot more work since, in OGC Interoperability Experiments and Testbeds, on tiling (2D and 3D), and this is now feeding into the OGC API - Tiles standard.

Also, there is now a conceptual model for multi-dimensional tiling in the OGC pipeline, with a concrete 2D implementation extension, "Core Tiling Conceptual and Logical Models for 2D Euclidean Space". It has been out for public comment and is currently subject to an electronic vote for release as an underlying "Abstract Specification Topic".

There is no extension yet for 3D tessellations or for tiling that involves overlaps and gaps. It is not clear to me that any of the OGC 3D tiling work has a solid underlying conceptual model.

There is still a gap between these (real, actual) space tiling efforts and the idea of tiling/tessellating the abstract spaces of the RDF Data Cube Vocabulary, which would encompass paging/partitioning.

I do not think that the OGC data tiles activities will bridge the gap to ontologies.

What does @lieberjosh think?