Closed rouault closed 8 months ago
@rouault interesting discussion!
While working on https://github.com/qgis/QGIS-Enhancement-Proposals/issues/257 (a SensorThings provider), I've been wondering if we need some generic / cross-provider approach to handle feature relationships in a more flexible way. Currently relationships in QGIS (as you've noted) are tied heavily to the concept of individual layers, but this doesn't translate well to data models like the one you've described here or the SensorThings model, where the relationship structure is more complex and tightly tied into the data model itself.
Another example would be handling "attachments" from an ArcGIS REST server. These aren't available as a standard layer, and indeed can't even be retrieved in bulk. Rather you need to call a specific "get attachments" API per feature that you want to retrieve the attachments for. Again, this setup doesn't fit in well with the current layer based approach to relationships.
So I'm wondering if we shouldn't use this opportunity to instead develop a "get related features" concept within the vector data provider itself, which would allow for interactive feature-based (and nested) related feature retrieval.
I'm keen to hear your thoughts... it's quite a different approach to the "flattening" technique you've proposed here, but might ultimately be a better fit for this data model...
a "get related features" concept
In the case of GML complex features, a property can indeed sometimes be a reference to another object through a URL, but quite often it is just inline nested XML content without necessarily an identifier (as gn:GeographicalName in ps:siteName in my above example). Such content doesn't really qualify as a feature, and it is already returned in the stream of the main feature. I'm not clear where the data obtained from "get related features" would land: in a dedicated QGIS layer for each related attribute (probably an in-memory one, although I can imagine users would want to see a serialization of the related features they have resolved)?
The idea makes sense, but I don't have the bandwidth to implement it on top of the work described in this QEP. If such a "get related features" capability were developed, I could see it as a potential enhancement that the WFS provider could leverage on top of what I'm proposing. Dealing with related features could also involve other solutions, like developing a QGIS processing tool that would resolve related features of already loaded features (although there would be a risk of making the remote server struggle with a large number of requests).
In short, this QEP aims at providing an initial integrated capability for WFS complex features, with an experience similar to simple features for fields that are simple; for complex content, the proposed QGIS expression function should provide an initial way of making use of that information.
A few years ago I was putting together a proposal to better handle INSPIRE data in QGIS and I was leaning towards the GMLAS approach, where the GML file would be exploded into many tables (spatial and non-spatial) that get loaded in QGIS as layers, and then relationships between the layers would be automatically set up.
But given the complexity of the problem, probably we may end up having both approaches - one that is proposed here (with nested XML) and another one with multiple related tables. I guess the former approach is better for simpler data models, the latter is better for complex models with lots of 1:N and N:M relationships. The GMLAS plugin for QGIS also offers both options: https://brgm.github.io/gml_application_schema_toolbox/usage/read_files.html
It would be great if it would be possible to also just drag'n'drop a GML file with complex features to get this functionality - I have seen quite a few broken WFS 2.0 servers (running out of memory, timing out, returning bad results) that it sometimes makes sense to just download GML data when possible and avoid WFS altogether :smile: - but not sure if it will be easy if the logic would be in the WFS provider code?
By the way, the use of XQuery will need the xmlpatterns Qt module which I believe we do not require yet, but probably it is already included in the installers, so that should not be a big deal.
+1 from me on this QEP
@nyalldawson as for your "get related features" concept - how would that be different from the existing concept of relations in QGIS, and the discoverRelations() provider call?
It would be great if it would be possible to also just drag'n'drop a GML file with complex features to get this functionality
Everything is doable, but I'm afraid that would complicate the WFS provider in non-trivial & inelegant ways. Plus we would need to hack around the OGR provider to pass such files to the WFS one.
By the way, the use of XQuery will need the xmlpatterns Qt module which I believe we do not require yet, but probably it is already included in the installers, so that should not be a big deal.
Actually @pathmapper pointed out to me that qtxmlpatterns was unfortunately deprecated in Qt 5.13 (https://doc.qt.io/qt-5/qtxmlpatterns-index.html) and has been removed in Qt 6.0. After looking around, I'm leaning towards using libxml2, which has XPath 1.0 capabilities and is a widely available dependency, used for example by most GDAL builds (and I see I've used libxml2 XPath in the past in GDAL, for unrelated purposes).
A most welcome initiative! I'd like to suggest, however, to serialize the content of complex attributes as JSON rather than XML if possible. This would
dispense with the need to introduce XPath capabilities
make it possible to leverage the data extraction capabilities already available in QGIS expressions, like subscripting or functions such as array_contains, e.g.:
provide improved readability with the JSON View widget:
be in line with the general tendency to drop XML in favor of JSON, allowing users to focus on and become familiar with just one common format
@kraftto
I'd like to suggest, however, to serialize the content of complex attributes as JSON rather than XML if possible. This would
I indeed considered exposing complex content as JSON in my initial analysis, but my counter arguments are:
if one day one wants to implement the transactional part, it will be much easier to do with XML content rather than with JSON that might be difficult to remap correctly to the expected XML schema
Another issue is that the QGIS editor for JSON fields is seriously limited to apparently only dictionaries of key/value pairs of simple types, and not arbitrarily nested content. I'm not sure what is the exact typing of the "properties" field in your above screenshot, but I suspect it is a string field and not a QVariant::Map.
UPDATE: I now see in Layer Properties / Attributes Form, that one should select "JSON View" as the widget type to get the output of your screenshot, instead of "Key/Value"! Grrr, I stumbled upon that issue for some time. ==> to anyone looking at this, should we change the default widget type for QVariant::Map to "JSON View" to avoid the issue with truncated content of the default "Key/Value" widget ?
So given the above, my point about using XML is much weaker, as transactional support for complex features is likely a marginal use case, and JSON could be a reasonable choice
but I suspect it is a string field and not a QVariant::Map
It is a JSONB Postgres field, which is adequately assigned the JSON View widget, but there seems to be an issue in that regard with Map type subtypes.
I get the point regarding WFS-T, but I do agree that it's unlikely to be implemented any time soon, so it might be superseded by a transactional OGC API Features by then.
I've updated the description to reflect that complex XML content will be exposed as JSON
@wonder-sk
as for your "get related features" concept - how would that be different from the existing concept of relations in QGIS, and the discoverRelations() provider call?
The current framework for relations is completely dependent on the participants in the relation being representable as vector layers themselves. This constraint breaks the use case of relationships where there is no fixed structure that can be represented as a child vector layer. I'm thinking here of cases where the children have completely different fields depending on which parent feature they belong to.
Or, in my immediate use case, for the situation where we can't efficiently construct the child as a vector layer. The ArcGIS REST API only permits querying related media feature-by-feature, so in order to construct a vector layer of related media for the children we'd have to do something very bad like iterating over all the parent features and firing off a get children request for each individually.
What I am thinking is something like:
struct RelatedFeature
{
  //! Child feature, containing direct attributes and geometry for the child
  QgsFeature feature;
  //! Map of child features belonging to this feature, where map keys are relation identifiers
  QMap< QString, QList< RelatedFeature > > children;
};
class QgsVectorDataProvider
{
  public:
    // ...
    /**
     * Retrieves related child features for the parent feature with the specified \a id.
     *
     * The keys in the returned map are the relation identifiers.
     */
    QMap< QString, QList< RelatedFeature > > queryRelatedFeatures( QgsFeatureId id );
    // ...
};
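To make the nested shape of such a result concrete, here is a minimal standalone sketch that walks a RelatedFeature tree. It is only an illustration under assumptions: it uses the C++ standard library instead of Qt, `Feature` is a hypothetical stand-in for QgsFeature, `leaf()`, `collectPaths()` and `makeSample()` are made-up helpers, and the relation names and labels in the sample data are invented.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for QgsFeature: only an id and a label.
struct Feature
{
  long long id = 0;
  std::string label;
};

// Same shape as the RelatedFeature idea: a feature plus its children,
// keyed by relation identifier, nested to any depth.
struct RelatedFeature
{
  Feature feature;
  std::map<std::string, std::vector<RelatedFeature>> children;
};

// Builds a childless node (helper for the sample data below).
RelatedFeature leaf(long long id, std::string label)
{
  RelatedFeature node;
  node.feature.id = id;
  node.feature.label = std::move(label);
  return node;
}

// Appends one "/relation:label" path per descendant of `node`, depth first.
void collectPaths(const RelatedFeature &node, const std::string &prefix,
                  std::vector<std::string> &out)
{
  for (const auto &entry : node.children)
  {
    for (const RelatedFeature &child : entry.second)
    {
      const std::string path = prefix + "/" + entry.first + ":" + child.feature.label;
      out.push_back(path);
      collectPaths(child, path, out);
    }
  }
}

// Invented sample: one parent with two relations, one of them nested.
RelatedFeature makeSample()
{
  RelatedFeature datastream = leaf(2, "temperature");
  datastream.children["Observations"] = {leaf(3, "obs1"), leaf(4, "obs2")};

  RelatedFeature thing = leaf(1, "thing");
  thing.children["Datastreams"] = {datastream};
  thing.children["Locations"] = {leaf(5, "site A")};
  return thing;
}
```

Because children are reached through the parent rather than through a separate layer, each child can carry its own set of fields, which is the case the layer-based relation framework cannot express.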
QGIS Enhancement: Support for Complex Features in WFS provider
Date 2023/11/13
Author Even Rouault (@rouault)
Contact even.rouault at spatialys.com
Maintainer @rouault
Version QGIS 3.36 or 3.38
Sponsored by QGIS-DE user group (QGIS Anwendergruppe Deutschland e.V.)
Summary
The QGIS WFS provider (WFS client) supports the WFS 1.0, 1.1 and 2.0 protocols, but it is currently restricted to consuming features returned as GML simple features, that is, features whose attributes have simple pre-defined types (string, integer, floating-point number, datetime). While WFS servers serving GML simple features are common, various data models don't fit into that model and use more complex GML schemas, where feature properties can be repeated or be made of nested XML constructs. This is typically the case for INSPIRE. Currently, QGIS rejects such WFS layers. While there are workarounds, such as the WFS 2.0 client plugin or the QGIS GML Application Schema Toolbox plugin, they require the user to be aware of their existence, and don't provide the same level of user experience as working with a simple feature WFS layer. Hence this proposal to enhance the existing WFS provider so that it can deal with complex feature schemas. The implementation will expose properties of complex types as JSON content, converted from XML.
Proposed Solution
Let's state it in a straightforward way: dealing with WFS/GML complex features is a pain, and based on past experience developing the OGR GMLAS driver (Geography Markup Language (GML) driven by application schemas), I don't believe there is an ideal solution. The solution proposed here is a compromise from both an implementation and a usability point of view. The choice of exposing complex properties as a JSON serialized string converted from nested XML is what makes it possible to use the existing infrastructure of the WFS provider. The alternative, which was taken by the OGR GMLAS driver, would have been to expose a (complex) relational data model of many simple feature tables with lots of relationships between them. That wouldn't fit at all as a QGIS provider, since a QGIS provider is bound to a single layer, and such output is not necessarily easy for end users to manipulate. Hence the QGIS GML Application Schema Toolbox plugin is not totally deprecated by the enhanced WFS provider: it remains relevant in use cases where a relational view of the data model is needed.
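As an illustration of what exposing nested XML as a JSON string can look like, here is a minimal standalone C++ sketch. It is not the planned implementation: `XmlElement`, `quote()` and `toJson()` are hypothetical names, the element tree is assumed to be already parsed, and namespaces, attributes and mixed content are ignored. The one rule it demonstrates is that repeated child elements collapse into a JSON array.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-parsed XML element (no attributes, no namespaces).
struct XmlElement
{
  std::string name;                 // local element name
  std::string text;                 // text content, used only for leaf elements
  std::vector<XmlElement> children; // nested elements, in document order
};

// Wraps text in double quotes, escaping only '"' and '\'.
// (Control characters are not handled in this sketch.)
std::string quote(const std::string &text)
{
  std::string out = "\"";
  for (char c : text)
  {
    if (c == '"' || c == '\\')
      out += '\\';
    out += c;
  }
  return out + "\"";
}

// Serializes the content of `element`: a leaf becomes a JSON string,
// anything else becomes an object keyed by child element name.
std::string toJson(const XmlElement &element)
{
  if (element.children.empty())
    return quote(element.text);

  // Group children by name while remembering first-appearance order.
  std::vector<std::string> order;
  std::map<std::string, std::vector<std::string>> grouped;
  for (const XmlElement &child : element.children)
  {
    if (grouped.find(child.name) == grouped.end())
      order.push_back(child.name);
    grouped[child.name].push_back(toJson(child));
  }

  std::string json = "{";
  for (std::size_t i = 0; i < order.size(); ++i)
  {
    const std::vector<std::string> &values = grouped.at(order[i]);
    if (i > 0)
      json += ",";
    json += quote(order[i]) + ":";
    if (values.size() == 1)
    {
      json += values[0];
    }
    else
    {
      // A repeated element becomes an array of its serialized occurrences.
      json += "[";
      for (std::size_t j = 0; j < values.size(); ++j)
      {
        if (j > 0)
          json += ",";
        json += values[j];
      }
      json += "]";
    }
  }
  return json + "}";
}
```

A real conversion would also have to decide how to represent attributes, namespaces and elements that occur once in one feature but several times in another; those choices are left open here.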
Concretely, the QgsWFSProvider::readAttributesFromSchema() method, which analyzes the XML schema returned by the WFS DescribeFeatureType request on the layer of interest, will be modified to fall back to the OGR GMLAS driver when the schema is not natively understood by QGIS (that is, when it is not simple features). More precisely, only the schema analysis capability of the GMLAS driver will be used, by querying the special _ogr_fields_metadata layer, which returns the properties of a FeatureType and their nature (whether they are simple or complex, and their cardinality). Note also that the GMLAS driver automatically "flattens" the data structure, even when the content is nested, provided the schema constraints show a 1:1 cardinality (cf. the example below, where ps:inspireID/base:Identifier/base:localId can be flattened as inspireid_identifier_localid).
The QgsGmlStreamingParser class will be modified to use the hints (mapping between XPath and QGIS fields) provided by the GMLAS schema analyzer to identify properties. A QgsXML class will be added to offer the translation from XML to equivalent JSON.
For server-side filtering of simple content that is not at the first level of nesting, QgsOgcUtils::expressionToOgcExpression() and QgsOgcUtils::SQLStatementToOgcFilter() will need to be enhanced to map QGIS field names to the proper XPath.
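The flattened field name in the example above can be reproduced with a very small transformation. The sketch below is only an approximation written for illustration: `flattenXPath()` is a hypothetical helper, and the actual naming rules of the GMLAS driver (for instance how it handles name collisions) are not reproduced here. It drops the namespace prefix of each XPath step, lowercases it, and joins the steps with underscores.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Turns "ns:Step/ns:Other" into "step_other" (illustrative rule only).
std::string flattenXPath(const std::string &xpath)
{
  std::string field;
  std::size_t start = 0;
  while (start <= xpath.size())
  {
    // Isolate the next step, i.e. the text up to the next '/'.
    std::size_t end = xpath.find('/', start);
    if (end == std::string::npos)
      end = xpath.size();
    std::string step = xpath.substr(start, end - start);

    // Drop the namespace prefix ("base:localId" -> "localId").
    const std::size_t colon = step.find(':');
    if (colon != std::string::npos)
      step = step.substr(colon + 1);

    // Append the lowercased step, skipping empty ones.
    if (!step.empty())
    {
      if (!field.empty())
        field += '_';
      for (unsigned char c : step)
        field += static_cast<char>(std::tolower(c));
    }
    start = end + 1;
  }
  return field;
}
```

The reverse mapping, from the QGIS field name back to the full XPath, is what server-side filtering needs, so in practice the pairs would have to be stored rather than recomputed.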
The first time the OGR GMLAS driver needs to access a remote schema, it downloads its content and caches the file locally. However, with complex schemas this can involve downloading several tens of files in a cascaded way, which can take several tens of seconds depending on network and server speed. The OGR GMLAS driver will be modified to allow cancellation of the downloading of files, and the QGIS WFS provider, when invoked from the QGIS graphical user interface, will cause the OGR GMLAS driver to be invoked in a background task that can be canceled, to avoid blocking the user interface. The OGR GMLAS driver will also be modified to use the pluggable CPL networking capability of CPLHTTPPushFetchCallback(), to be able to use the QGIS networking infrastructure to download the XML schemas.
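The cancellation logic can be sketched independently of GDAL and QGIS. The following standalone C++ example is an assumption-laden illustration, not the planned code: `downloadSchemas()`, `FetchFn` and `CrawlResult` are hypothetical names, the fetch callback stands in for whatever networking layer is plugged in, and the cancel flag stands in for the background task's cancellation state. It shows a cascaded download where each fetched schema may reveal further schemas, with the flag checked before every request.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical callback: downloads one schema and returns the URLs it imports.
using FetchFn = std::function<std::vector<std::string>(const std::string &)>;

struct CrawlResult
{
  std::vector<std::string> downloaded; // URLs fetched, in order
  bool cancelled = false;              // true if we stopped early
};

// Downloads rootUrl and, transitively, every schema it imports,
// stopping as soon as cancelRequested becomes true.
CrawlResult downloadSchemas(const std::string &rootUrl, const FetchFn &fetch,
                            const std::atomic<bool> &cancelRequested)
{
  assert(fetch && "a fetch callback is required");

  CrawlResult result;
  std::set<std::string> seen{rootUrl};
  std::deque<std::string> pending{rootUrl};
  while (!pending.empty())
  {
    // Check the flag before each request so a cancel takes effect quickly.
    if (cancelRequested.load())
    {
      result.cancelled = true;
      break;
    }
    const std::string url = pending.front();
    pending.pop_front();
    for (const std::string &imported : fetch(url))
    {
      // Queue each imported schema only once.
      if (seen.insert(imported).second)
        pending.push_back(imported);
    }
    result.downloaded.push_back(url);
  }
  return result;
}
```

In this sketch a request that is already in flight is not interrupted; only the next one is skipped. Whether individual transfers can also be aborted depends on the networking layer actually used.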
The WFS provider automated tests will be enhanced to test for the above mentioned (non UI-based) changes/enhancements.
Example
Let's look at the output of an INSPIRE data model: https://inspire.brandenburg.de/services/ps_schutzg_wfs?service=WFS&version=2.0.0&typename=ps:ProtectedSite&request=getfeature&count=1
Querying the structure of the protectedsite layer with the OGR GMLAS driver yields:
The layer fields exposed by QGIS will be:
Affected Files
In QGIS:
In GDAL:
Limitations
Performance Implications
No measurable performance hit expected on simple features WFS layers.
For complex features WFS layers, an initial delay of several seconds the first time a new schema is read is expected.
Backwards Compatibility
There should be no backward compatibility issues as this is a new feature.
Issue Tracking ID(s)
#27076 - WFS 2.0 complex features not supported
#52227 - Error loading WFS
Votes
To be done