qgis / QGIS-Enhancement-Proposals

QEPs (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS

Database metadata storage #250

Open elpaso opened 2 years ago

elpaso commented 2 years ago

QGIS Enhancement: Database metadata storage

Date 2022/05/02

Author Alessandro Pasotti (@elpaso)

Contact elpaso at itopen dot it

Maintainer @elpaso

Version QGIS 3.28

Summary

QGIS can currently load QMD metadata from a sidecar file when loading a layer with the OGR data provider, but it is not currently possible to automatically load metadata from a DB layer. Metadata are also exposed in the QGIS browser for OGR layers.

By storing metadata in a dedicated DB table, it will be possible to extend this capability to all client-server DB data providers.

Preliminary note: by "DB" we mainly mean client-server DBs like PostgreSQL and Oracle. GPKG has its own metadata tables, which won't be affected by this implementation. Spatialite is not in scope either.

The main goal is to provide a centralized way to search and deliver layer metadata that can be explored and automatically loaded into QGIS using the internal QMD metadata format.

Use case 1: automatic loading of metadata when loading a layer

When a layer is loaded from a DB, metadata are automatically imported into QGIS.

Use case 2: store metadata into the DB

When metadata are compiled from within QGIS, it will be possible to store them in the same database the layer belongs to.

Use case 3: browse metadata from the QGIS browser

When browsing DB layers from the QGIS browser, it will be possible to see the metadata that are stored in the DB.

Use case 4: filter/search layers by metadata

The QGIS browser offers filtering capabilities. The filter system could be extended to search the metadata table in the stored DB connections; filtering by extent and CRS will also be permitted by the API.

Proposed Solution

The implementation will be similar to the current implementation that stores QML styles into the DB.

A dedicated table will be created when the user chooses to store metadata into the DB. The table will contain a small set of metadata as individual columns (for example: identifier, description, extent, etc.) and the full QMD metadata XML as an XML field (if supported by the DB) or as a text field.

This will enable use cases 1, 2 and 3.
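As a rough illustration of the table layout described above, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in for a client-server DB. The table name `qgis_layer_metadata`, all column names, and the helper functions `store_metadata` and `load_qmd` are assumptions made for this example, not the names the implementation will use.

```python
import sqlite3

# Hypothetical table layout: every name below is an illustrative assumption.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS qgis_layer_metadata (
    id INTEGER PRIMARY KEY,
    f_table_schema TEXT NOT NULL,
    f_table_name TEXT NOT NULL,
    identifier TEXT,
    description TEXT,
    crs TEXT,
    xmin REAL, ymin REAL, xmax REAL, ymax REAL,
    qmd TEXT NOT NULL,  -- full QMD XML document stored as text
    UNIQUE (f_table_schema, f_table_name)
)
"""

INSERT_SQL = """
INSERT OR REPLACE INTO qgis_layer_metadata
    (f_table_schema, f_table_name, identifier, description, crs,
     xmin, ymin, xmax, ymax, qmd)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
"""


def store_metadata(conn, schema, table, identifier, description, crs, extent, qmd):
    """Insert or replace the single metadata row that belongs to one layer."""
    xmin, ymin, xmax, ymax = extent
    conn.execute(
        INSERT_SQL,
        (schema, table, identifier, description, crs, xmin, ymin, xmax, ymax, qmd),
    )
    conn.commit()


def load_qmd(conn, schema, table):
    """Return the stored QMD XML for a layer, or None if nothing is stored."""
    row = conn.execute(
        "SELECT qmd FROM qgis_layer_metadata "
        "WHERE f_table_schema = ? AND f_table_name = ?",
        (schema, table),
    ).fetchone()
    return row[0] if row else None


# Usage: store metadata for one layer, then read it back (use cases 1 and 2).
conn = sqlite3.connect(":memory:")
conn.execute(CREATE_SQL)
store_metadata(
    conn, "public", "vegetation", "veg_2022", "Vegetation cover",
    "EPSG:4326", (7.0, 44.0, 9.0, 46.0),
    "<qgis><identifier>veg_2022</identifier></qgis>",
)
qmd = load_qmd(conn, "public", "vegetation")
```

Keeping a few fields as plain columns next to the full XML means simple lookups and filters do not have to parse the QMD document at all.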

For filtering and searching (use case 4) we propose an approach based on the current QgsAbstractDatabaseProviderConnection API. That API represents a DB connection, and it is the ideal place to attach metadata to a TableProperty in order to deliver information about table (layer) metadata.

To make the API more flexible for client usage from the QGIS application, a QgsLayerMetadataProvider class (and a companion QgsLayerMetadataProviderRegistry class) will be developed. The layer metadata provider class will make it possible to perform searches across different metadata provider implementations. Individual data providers will provide implementations for their own metadata, making it possible to search for available layers by metadata.

Searching and filtering will initially be implemented by searching the subset of individual columns. Further enhancements may provide full filtering capabilities by using XPath searches on the QMD XML.
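The difference between the two search strategies can be sketched in plain Python. The sample document below is a simplified, QMD-like XML whose element names are assumed for illustration, and `search_columns` and `has_keyword` are hypothetical helpers. The standard library's ElementTree supports only a limited XPath subset, which is enough for this example.

```python
import xml.etree.ElementTree as ET

# Simplified QMD-like document; the element names are assumptions for this sketch.
QMD_SAMPLE = """<qgis version="3.28">
  <identifier>veg_2022</identifier>
  <abstract>Vegetation classes surveyed in 2022</abstract>
  <keywords vocabulary="gmd:topicCategory">
    <keyword>agriculture</keyword>
    <keyword>environment</keyword>
  </keywords>
</qgis>"""


def search_columns(rows, text, columns=("identifier", "description")):
    """Initial approach: case-insensitive substring match on a few columns."""
    needle = text.lower()
    return [
        row for row in rows
        if any(needle in (row.get(col) or "").lower() for col in columns)
    ]


def has_keyword(qmd_xml, keyword):
    """Possible later enhancement: query the full XML with an XPath expression."""
    root = ET.fromstring(qmd_xml)
    wanted = keyword.lower()
    return any(
        (el.text or "").strip().lower() == wanted
        for el in root.findall("./keywords/keyword")
    )


# Usage: rows mimic the individual columns stored next to the full XML.
rows = [
    {"identifier": "veg_2022", "description": "Vegetation cover", "qmd": QMD_SAMPLE},
    {"identifier": "roads", "description": "Road network", "qmd": "<qgis/>"},
]
column_hits = search_columns(rows, "vegetation")
keyword_hits = [row for row in rows if has_keyword(row["qmd"], "agriculture")]
```

The column search can only see the fields that were copied into columns, whereas the XML query can reach anything in the document, such as keywords in this example.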

Example(s)

registry = QgsLayerMetadataProviderRegistry.instance()
# Additional metadata providers can be implemented in Python. An optional list of provider keys will allow
# for filtering; a metadata provider will be allowed to provide metadata for multiple (or any) data providers.
registry.registerLayerMetadataProvider( MyMetadataProvider(), [ "ogr", "spatialite", "WFS" ] )
# This call limits the search to the "description" + "identifier" metadata fields and the "postgres" + "mssql" providers.
# myExtent will contain an optional QgsReferencedRectangle for extent filtering; a list of CRSs will
# also be allowed.
# Note: we might decide that it is better to provide all filtering criteria in a struct instead of individual values, or
# maybe provide both APIs.
results = registry.search( "my search string", [ QgsLayerMetadataScope.Description, QgsLayerMetadataScope.Identifier ], [ "postgres" ,"mssql" ], myExtent )
# Search results will contain full metadata information about the layer (a QgsLayerMetadata object) and the data source
# URI to load it, including the data provider key.
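To make the intended flow easier to follow, the registry and provider pattern from the example above can be mocked up in pure Python without QGIS. Everything in this sketch (`MetadataRecord`, `InMemoryMetadataProvider`, `MetadataRegistry`, and the tuple-based extent) is a hypothetical stand-in for the proposed classes, and the real API may well differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class MetadataRecord:
    """Stand-in for a search result: a few metadata fields plus how to load the layer."""
    identifier: str
    description: str
    provider_key: str
    uri: str
    extent: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class InMemoryMetadataProvider:
    """Hypothetical metadata provider backed by a plain list of records."""

    def __init__(self, records: Sequence[MetadataRecord]):
        self._records = list(records)

    def search(self, text: str, scopes: Sequence[str],
               extent: Optional[BBox] = None) -> List[MetadataRecord]:
        needle = text.lower()
        hits = []
        for rec in self._records:
            if not any(needle in getattr(rec, scope).lower() for scope in scopes):
                continue
            if extent is not None and not intersects(rec.extent, extent):
                continue
            hits.append(rec)
        return hits


class MetadataRegistry:
    """Hypothetical registry that fans a search out to every registered provider."""

    def __init__(self):
        self._providers = []

    def register(self, provider, provider_keys: Optional[Sequence[str]] = None):
        keys = set(provider_keys) if provider_keys else None  # None means "any"
        self._providers.append((provider, keys))

    def search(self, text, scopes, provider_keys=None, extent=None):
        wanted = set(provider_keys) if provider_keys else None
        results = []
        for provider, keys in self._providers:
            # Skip metadata providers that cannot serve any requested data provider.
            if wanted is not None and keys is not None and not (wanted & keys):
                continue
            for rec in provider.search(text, scopes, extent):
                if wanted is None or rec.provider_key in wanted:
                    results.append(rec)
        return results


# Usage: the layer names, URIs, and extents are made up for the example.
registry = MetadataRegistry()
registry.register(InMemoryMetadataProvider([
    MetadataRecord("veg", "Vegetation cover", "postgres", "dbname=gis table=veg", (0, 0, 10, 10)),
    MetadataRecord("veg_far", "Vegetation elsewhere", "postgres", "dbname=gis table=veg_far", (50, 50, 60, 60)),
    MetadataRecord("veg_file", "Vegetation shapefile", "ogr", "/data/veg.shp", (0, 0, 10, 10)),
]))
results = registry.search("vegetation", ["description", "identifier"],
                          ["postgres", "mssql"], (5, 5, 20, 20))
```

Each result carries both the metadata and the information needed to load the layer, which mirrors the note above about returning the data source URI and provider key.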

Affected Files

The DB connections API will be extended to provide methods to store/retrieve layer metadata into the DB.

A global QgsLayerMetadataProviderRegistry singleton will manage the metadata providers.

We haven't yet scoped out the GUI/UX changes for metadata filtering/searching in the browser. Initially we will probably implement it as an extension of the existing QgsBrowserProxyModel that will do a text search on the subset of metadata fields.

Performance Implications

We might expect some delay when retrieving table information for DB data provider layers in the browser, because the metadata will need to be fetched from the DB. The current implementation already executes these queries in a separate thread, so no GUI/UX slowdown is expected.

Backwards Compatibility

None

Votes

(required)

elpaso commented 2 years ago

CC: @timlinux, @nyalldawson

jakimowb commented 2 years ago

This registry-based solution could make it possible to add a provider that gives access to the GDAL metadata model (https://gdal.org/user/raster_data_model.html#metadata).

@elpaso doesn't the search need a layer reference as well? registry.search(<myLayer>, "my search string", [ QgsLayerMetadataScope.Description, QgsLayerMetadataScope.Identifier ], [ "postgres" ,"mssql" ], myExtent )

elpaso commented 2 years ago

This registry-based solution could make it possible to add a provider that gives access to the GDAL metadata model (https://gdal.org/user/raster_data_model.html#metadata).

Yes, absolutely.

But keep in mind that this functionality is already available in QGIS (except for search/filter): if you browse or load a filesystem-based GDAL/OGR dataset that has metadata available, they will be loaded into QGIS and made available in the layer properties.

The gap I'm trying to fill here is that there is currently no way to distribute metadata for datasets coming from postgres (vector or raster) or from any other client-server DB provider.

@elpaso doesn't the search need a layer reference as well? registry.search(<myLayer>, "my search string", [ QgsLayerMetadataScope.Description, QgsLayerMetadataScope.Identifier ], [ "postgres" ,"mssql" ], myExtent )

No, I don't think so. Once you have a layer, you already have the metadata associated with it (they will be automatically loaded from the DB storage if they are there). The search functionality is meant to find layers by their metadata given some search criteria, for example:

  1. find/filter all layers where the metadata description contains the string "vegetation" and the extent intersects the current canvas extent.
  2. find/filter all layers where metadata keyword contains "agriculture" ...

timlinux commented 2 years ago

Huge thumbs up for this from me - it will be another meaningful step forward in the support of metadata in QGIS.

It might also be good to explicitly state that the proposal will not support other metadata schemes such as INSPIRE/USGS/ISO19115 etc. These will still need to be generated by third-party implementations such as those that, I believe, the GeoCat and similar plugins offer.

tomkralidis commented 2 years ago

+1. Agree to use a core model that "external to core" implementations can provide serializers for.

Gustry commented 2 years ago

Thanks, this looks like a nice idea; it's indeed a wanted feature.

As a side note, there is the PgMetadata plugin in QGIS. It allows storing metadata for vector/raster layers stored in a PostGIS database. Users can then search for layers using the QGIS Locator bar.

Metadata can be exported as a DCAT catalog, PDF or HTML. All the logic for generating the DCAT, HTML is done in SQL functions, with locales etc.

During the development of this plugin, we followed the internal QGIS schema for metadata. We still want to integrate it with the native metadata panel: https://github.com/3liz/qgis-pgmetadata-plugin/issues/24

This comment is just for reference and to show some examples: QGIS locator, SQL schema used in PG, etc. :)

elpaso commented 2 years ago

@Gustry thank you. I'm aware of the PgMetadata plugin; it's indeed a very nice implementation.

nyalldawson commented 2 years ago

Minor thing:

registry = QgsLayerMetadataProviderRegistry.instance()

that should instead be

registry = QgsApplication.layerMetadataProviderRegistry()

like all the other quasi-singletons we have. Otherwise big +1 from me!

KoalaGeo commented 1 year ago

Is this work still ongoing to support Oracle?

elpaso commented 9 months ago

Is this work still ongoing to support Oracle?

Not at the moment.

It wouldn't be particularly hard to implement it though.