elpaso commented 4 years ago

QGIS Raster Temporal Catalog Data Provider

Date 2020/08/10

Author Alessandro Pasotti (@elpaso)

Contact elpaso at itopen dot it

Maintainer @elpaso

Version QGIS 3.18

Summary

Many kinds of earth observations or model outputs are provided as a (potentially huge) series of individual raster files. Each file has a timestamp or a time range that describes the temporal validity of its data.

The original use case is QGIS Server WMS-T, but the proposed solution would also allow a directory of raster layers to be exposed as a single temporally enabled raster in the desktop, for use with the temporal controller.

For QGIS Server WMS-T, there are at least two valid reasons to expose the individual layers as a single temporally enabled layer. The main reason is that this is required by the "OGC Best Practice for using Web Map Services (WMS) with Time-Dependent or Elevation-Dependent Data (1.0)" [1], which recommends exposing the layers as a single layer with a TIME dimension.

The other reason is that by serving each layer individually we would have a huge GetCapabilities response and cause severe performance issues.

Proposed Solution

This proposal consists in the implementation of a new raster catalog data provider that will expose a set of individual raster layers as a single logical raster layer. The catalog may contain child layers which differ in geographic extent, temporal range, or some other dimension. Conceptually, a raster catalog layer is very similar to a GDAL virtual raster vrt file, where many individual source rasters are exposed through QGIS as a single map layer. The approach outlined in this QEP differs from a GDAL vrt in that the individual child layers may not necessarily be a GDAL based backend, but could potentially be any type of raster layer possible in QGIS (including WMS, postgres raster, etc).

The temporal capabilities of the catalog layer will be automatically calculated and exposed in both the desktop and the server context (as WMS-T).

New classes:

QgsRasterCatalogLayer
QgsRasterCatalogDataProvider (abstract)

The data provider will initially be implemented as QgsImageDirectoryRasterCatalogProvider, for filesystem-based rasters such as TIFF, but the idea is to allow further implementations, for example a GPKG raster catalog data provider that would serve GPKG rasters as a temporal series.

The catalog layer will internally keep track of the sublayers' extents (geographic and temporal). It will create one or more QgsRasterDataProvider instances that match the requested extents and delegate the raster interface calls to the sublayer's data provider (GDAL in the case of an image directory, OGR in the case of a GPKG, etc.).

Performance Implications

None

Further Considerations/Improvements

It is not yet clear what kind of system will be provided to automatically collect temporal metadata from the files. A regular expression to extract the timestamp from the file name will probably be enough for simple use cases, but it would be difficult to implement and to apply in the case of temporal ranges, where precision is also required. In order to handle all possible use cases, the relevant API for adding layers to the catalog layer will be exposed to Python, so that customised plugins and scripts which directly handle a user's specific data structure and needs can be easily created. For example, if a user structures their layers in folders per year, then subfolders per month, and those subfolders contain files named by day and time, they could create a script which bulk-populates a QgsRasterCatalogLayer with all these sublayers by a simple Python folder traversal.
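As a rough illustration of the folder traversal described above, here is a minimal sketch in plain Python. The directory layout (`<root>/<YYYY>/<MM>/<DD>T<HHMM>.tif`), the `collect_sublayers` helper and the one-hour default validity are all assumptions made for this example. The sketch only gathers paths and time ranges, because the QEP does not yet define the call that adds a sublayer to a QgsRasterCatalogLayer.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical file naming: "<DD>T<HHMM>.tif" inside "<YYYY>/<MM>/" folders.
FILE_PATTERN = re.compile(r"^(?P<day>\d{2})T(?P<hour>\d{2})(?P<minute>\d{2})\.tif$")


def collect_sublayers(root, validity=timedelta(hours=1)):
    """Return (path, begin, end) tuples, sorted by begin time.

    `validity` is an assumed fixed duration for each raster; real data
    may need per-file ranges instead.
    """
    entries = []
    for path in Path(root).glob("*/*/*.tif"):
        match = FILE_PATTERN.match(path.name)
        if match is None:
            continue  # file name does not follow the assumed convention
        try:
            begin = datetime(
                int(path.parent.parent.name),  # year folder
                int(path.parent.name),         # month folder
                int(match["day"]),
                int(match["hour"]),
                int(match["minute"]),
            )
        except ValueError:
            continue  # non-numeric folder name or impossible date
        entries.append((str(path), begin, begin + validity))
    return sorted(entries, key=lambda entry: entry[1])
```

Each resulting tuple could then be handed to whatever method the catalog layer API ends up exposing for adding a sublayer with its temporal range.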

There is also the possibility of extracting metadata directly from the files, but whether this is a feasible option, and for which formats, has yet to be investigated.

Finally, UI could be added in future to allow users to manually add individual sub layers (or selections of layers) from the file system to a catalog layer, and manually populate the geographic extents and time ranges for these. This UI work is considered out-of-scope for the current QEP, which will add the raw API for populating catalog layers only.

A new QgsMapLayerRenderer subclass will be created, QgsRasterCatalogLayerRenderer. This renderer will use the information present in the QgsRenderContext in order to selectively decide which component child layers from the catalog should be rendered. E.g. if the catalog contains multiple layers with a geographic extent matching the render context's extent, then the temporal range of the render context will be used to select the appropriate sub layers to render. Sub layers which fall outside of the render context's geographic extent will not be rendered.
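The selection rule described above could be sketched as follows. This is plain Python rather than the QGIS API: `SubLayer`, `extents_intersect` and `select_sublayers` are hypothetical names, extents are simple `(xmin, ymin, xmax, ymax)` tuples, and time ranges are assumed to be half-open (`begin` included, `end` excluded). The real renderer would read the extent and temporal range from the QgsRenderContext instead.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SubLayer:
    """Hypothetical catalog entry: a name plus geographic and temporal extent."""
    name: str
    extent: tuple  # (xmin, ymin, xmax, ymax)
    begin: datetime
    end: datetime


def extents_intersect(a, b):
    """True when two (xmin, ymin, xmax, ymax) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_sublayers(sublayers, extent, begin, end):
    """Keep sublayers that intersect the requested extent and time range.

    Time ranges are treated as half-open intervals [begin, end).
    """
    return [
        layer
        for layer in sublayers
        if extents_intersect(layer.extent, extent)
        and layer.begin < end
        and begin < layer.end
    ]
```

Filtering on geographic extent first and temporal range second gives the same result as any other order here; a real implementation would more likely use a spatial or temporal index than a linear scan.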

QgsRasterCatalogDataProvider will provide a method similar to that of QgsRasterDataProvider for retrieving blocks of raster data. These blocks can then be passed to the existing raster pipeline and rendering classes in order to re-use all the existing raster layer renderers and pipeline filters.

Backwards Compatibility

None

Votes

(required)

[1] https://portal.opengeospatial.org/files/?artifact_id=56394

nyalldawson commented 4 years ago

I like where this is going!

I think the approach should be generalized though so that it potentially covers (in future) more than just the temporal use case. I.e.

This proposal consists in the implementation of a new raster catalog data provider that will expose a set of individual raster layers as a single temporally-enabled raster layer.

This proposal consists in the implementation of a new raster catalog data provider that will expose a set of individual raster layers as a single logical raster layer. The catalog may contain child layers which differ in geographic extent, temporal range, or some other dimension. Conceptually, a raster catalog layer is very similar to a GDAL virtual raster vrt file, where many individual source rasters are exposed through QGIS as a single map layer. The approach outlined in this QEP differs from a GDAL vrt in that the individual child layers may not necessarily be a GDAL based backend, but could potentially be any type of raster layer possible in QGIS (including WMS, postgres raster, etc).

New classes: QgsRasterTemporalCatalogLayer QgsRasterTemporalCatalogDataProvider (abstract) the data provider will initially implement a

QgsRasterCatalogLayer QgsRasterCatalogDataProvider (abstract)

QgsImageDirectoryRasterTemporalCatalogProvider for filesystem-based rasters such as TIFF but the idea is to allow for further implementations like for example a GPKG raster catalog data provider that would serve GPKG rasters as a temporal series.

QgsImageDirectoryRasterCatalogProvider but the idea is to allow for further implementations like for example a GPKG raster catalog data provider that would serve the whole raster contents of a GPKG database as a catalog layer, or a table containing Postgis raster layers as a catalog layer.

It is not yet clear what kind of system will be provided to automatically collect temporal metadata from the files, a regular expression to extract the timestamp from the file name will be probably enough for simple use case but it would be difficult to implement and to apply in case of temporal ranges where precision is also required. There is also the possibility to extract metadata directly from the files but if this is a feasible option and for which formats has yet to be investigated.

Can I suggest:

"It is not yet clear what kind of system will be provided to automatically collect temporal metadata from the files, a regular expression to extract the timestamp from the file name will be probably enough for simple use case but it would be difficult to implement and to apply in case of temporal ranges where precision is also required. In order to handle all possible use cases the relevant API for adding layers to the catalog layer will be exposed to Python, so that customised plugins and scripts which directly handle a user's specific data structure and needs can be easily created. (e.g. if a user structures their layers in folders per year, then subfolder per month, then subfolders contain files named by the day and time, they could create a script which bulk populates a QgsRasterCatalogLayer with all these sublayers by a simple Python folder traversal)

There is also the possibility to extract metadata directly from the files but if this is a feasible option and for which formats has yet to be investigated.

Finally, UI could be added in future to allow users to manually add individual sub layers (or selections of layers) from the file system to a catalog layer, and manually populate the geographic extents and time ranges for these. This UI work is considered out-of-scope for the current QEP, which will add the raw API for populating catalog layers only. "

Lastly, I think you should add a note about rendering:

I.e. something like: "a new QgsMapLayerRenderer subclass will be created, QgsRasterCatalogLayerRenderer. This renderer will use the information present in the QgsRenderContext in order to selectively decide which component child layers from the catalog should be rendered. E.g. if the catalog contains multiple layers with a geographic extent matching the render context's extent, then the temporal range of the render context will be used to select the appropriate sub layers to render. Sub layers which fall outside of the render context's geographic extent will not be rendered."

"QgsRasterCatalogDataProvider will provide a similar method to QgsRasterDataProvider for retrieving blocks of raster data. These blocks can then be passed to the existing raster pipeline and rendering classes in order to re-use all the existing raster layer renderer and pipeline filters"

elpaso commented 4 years ago

@nyalldawson I applied your suggestions, thank you!

I added temporal because that will be the only implementation for now but I see the potential in a more generic approach, the immediate use case for WMS would be ELEVATION dimension, even if we don't have an elevation controller like the temporal controller.

By limiting to the temporal dimension it would be more straightforward to reuse the same logic that I've been using in PG raster (that does not require a renderer class: everything is handled by the provider directly through temporalCapabilities()->requestedTemporalRange()) even if that is a kind of hybrid provider that has some "vector" capabilities.

nyalldawson commented 4 years ago

I added temporal because that will be the only implementation for now but I see the potential in a more generic approach, the immediate use case for WMS would be ELEVATION dimension, even if we don't have an elevation controller like the temporal controller.

I totally get that -- I'm just wanting the class names and "general" API decisions to be kept as generic as possible so that in future we can extend this class without having to make another QgsMapLayer subclass.

whatnick commented 4 years ago

We at DE Australia maintain a large temporal catalog of imagery exposed via WMS-T here. We use a postgresql index over data stored in S3 via OpenDataCube. There seems to be large overlap in implementation intention here and I am happy to contribute and collaborate and possibly see some of the opendatacube concepts such as cogs via Blob store in addition to file stores reused.

I would love to see a similar concept implemented in QGIS using a set of rasters on a filesystem with space-time indices in a file-based database (sqlite) for simplicity. The spatio-temporal metadata for the files can be stored in STAC sidecars in addition to the filename, to be more standards-compliant. The per-raster JSON files would be indexed into the said embedded DB during layer initialization.
