qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.36k stars 2.98k forks source link

Wrong WFS 2.0.0 request returns duplicate features #41087

Closed Fuzl closed 3 years ago

Fuzl commented 3 years ago

Describe the bug

The WFS 2.0.0 prvoider will send out requests with double TYPENAME(S) like this : &TYPENAMES=Seattle_Downtown_Features:Zoning&TYPENAME=Seattle_Downtown_Features:Zoning, which results in features beeing doubled in the response (at least by an ArcGIS Server). In the example below the layer holds 97 features and they are sent back twice in the response.

For the same layer and server, in WFS 1.1.0 the request has only &TYPENAME=Seattle_Downtown_Features:Zoning and works. Also if I remove the additional &TYPENAME=Seattle_Downtown_Features:Zoning from the WFS 2.0.0 request, the duplicates will be gone.

ArcGISServer could deal better with this and unduplicate the request, but QGIS seems to be making the mistake first. I understand from various issues that the compatibility between different WFS versions and different client and servers is a challenge, and I guess that QGIS aimed at keeping both TYPENAME(S) for backward compatibility. It seems to be complicated to keep everyone happy !

I guess the best solution would be to unduplicate TYPENAME(S) keys in QGIS before sending the request - but am not knowledgeable enough to submit a PR.

I wouldn't want to finish without a big Thank You and Bravo for the excellent work !

How to Reproduce

  1. Add a WFS 2.0.0 layer with the URL : https://dservices2.arcgis.com/ZQgQTuoyBrtmoGdP/arcgis/services/Seattle_Downtown_Features/WFSServer?SERVICE=WFS
  2. Connect to the server and add the Zoning layer which holds 97 features (currently)
  3. With the network debugging panel (F12) fetch one of the request and it should show something like :
    https://dservices2.arcgis.com/ZQgQTuoyBrtmoGdP/arcgis/services/Seattle_Downtown_Features/WFSServer?SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=Seattle_Downtown_Features:Zoning&TYPENAME=Seattle_Downtown_Features:Zoning&STARTINDEX=0&COUNT=300&SRSNAME=urn:ogc:def:crs:EPSG::3857 I tweeked the STARTINDEX to zero and set a large COUNT to see all the response. This request in my browser shows twice the 97 features.
  4. Delete the &TYPENAME=Seattle_Downtown_Features:Zoning (and leave the TYPENAMES one) and send again. Now we only have 97 features.

QGIS and OS versions

Version de QGIS 3.16.3-Hannover Révision du code 94ac9f21b8
Compilé avec Qt 5.11.2 Utilisant Qt 5.11.2
Compilé avec GDAL/OGR 3.1.4 Utilisé avec GDAL/OGR 3.1.4
Compilé avec GEOS 3.8.1-CAPI-1.13.3 Utilisé avec GEOS 3.8.1-CAPI-1.13.3
Compilé avec SQLite 3.29.0 Fonctionne avec SQLite 3.29.0
Version du client PostgreSQL 11.5 Version de SpatiaLite 4.3.0
Version de QWT 6.1.3 Version de QScintilla2 2.10.8
Compilé avec PROJ 6.3.2 Fonctionne avec PROJ Rel. 6.3.2, May 1st, 2020
Version de l'OS Windows 10 (10.0)
Extensions Python actives Generalizer3; Go2NextFeature3; go2streetview; Highlighter; openlayers_plugin; PluginLoadTimes; plugin_reloader; qfieldsync; db_manager; processing

Additional context

I found quite some information here when digging into this problem : https://issues.qgis.org/issues/17872

elpaso commented 3 years ago

@rouault what do you think about:

--- a/src/providers/wfs/qgswfsfeatureiterator.cpp
+++ b/src/providers/wfs/qgswfsfeatureiterator.cpp
@@ -129,8 +129,13 @@ QUrl QgsWFSFeatureDownloaderImpl::buildURL( qint64 startIndex, int maxFeatures,
     }
   }
   if ( mShared->mWFSVersion.startsWith( QLatin1String( "2.0" ) ) )
+  {
     query.addQueryItem( QStringLiteral( "TYPENAMES" ),  typenames );
-  query.addQueryItem( QStringLiteral( "TYPENAME" ),  typenames );
+  }
+  else
+  {
+    query.addQueryItem( QStringLiteral( "TYPENAME" ),  typenames );
+  }
elpaso commented 3 years ago

... and

--- a/src/providers/wfs/qgswfsdescribefeaturetype.cpp
+++ b/src/providers/wfs/qgswfsdescribefeaturetype.cpp
@@ -39,11 +39,13 @@ bool QgsWFSDescribeFeatureType::requestFeatureType( const QString &WFSVersion,
       query.addQueryItem( QStringLiteral( "NAMESPACES" ), namespaceValue );
     }
   }
-
-  query.addQueryItem( QStringLiteral( "TYPENAME" ), typeName );
-  if ( !namespaceValue.isEmpty() )
+  else
   {
-    query.addQueryItem( QStringLiteral( "NAMESPACE" ), namespaceValue );
+    query.addQueryItem( QStringLiteral( "TYPENAME" ), typeName );
+    if ( !namespaceValue.isEmpty() )
+    {
+      query.addQueryItem( QStringLiteral( "NAMESPACE" ), namespaceValue );
+    }
   }
rouault commented 3 years ago

@rouault what do you think about:

It's not unreasonable. That said, you might want to go through history in the WFS provider to retrieve where the current code was introduced. As far as I remember the rationale of the current code is that some WFS 2.0 servers only honour TYPENAME singular (don't remember which one). So we'd break them, but apparently we also break others with the current code... If we wanted to support servers that support only TYPENAME singular, we'd need some user facing option, but that's perhaps overkill

elpaso commented 3 years ago

@rouault yeah, we already have so many user options in WFS that I'd prefer to not add another one, the possible combinations of options make it hard to find the one that works for a particular service.

The history dates back into svn times before your refactoring of the provider. To stay on the safe side, we might add this in 3.18 and wait for a backport to 3.16.

@Fuzl I doubt that QGIS is really mistaken here: a server should accept a request with extra query string items present in the URL, the server should just ignore them, the fact that ESRI takes into account both forms (summing these!) for 2.x is the real mistake here.

rouault commented 3 years ago

The history dates back into svn times before your refactoring of the provider

It is more recent and dates back to 0768dde519542f588c2b05b0d8b8001cb54aee3c

elpaso commented 3 years ago

The history dates back into svn times before your refactoring of the provider

It is more recent and dates back to 0768dde

ok, so I guess that because your observations in the commit still stand I guess this is not a QGIS bug but an ESRI one.

elpaso commented 3 years ago

Anyway, a partial patch is here: https://github.com/elpaso/QGIS/tree/bugfix-gh41087-wfs-20-typnames-esri-bug if we change our mind.

Fuzl commented 3 years ago

@elpaso ouch this hurts and I disagree on closing the bug.

As I wrote, I agree ESRI should discard additionnal duplicate keywords, but it is even stranger that QGIS sends out those duplicate keywords in the first place.

I guess the ambiguity between TYPENAME and TYPENAMES three years ago has been patched in all decent WFS server and am ready to check that.
I only need to find one WFS 2 endpoint for each server : Geoserver, QGIS Server, Mapserver, ArcGIS Server for instance - let me know if we need more.

Not beeing able to consume ArcGIS Server WFS from QGIS will slow down my roadmap to transition our system from ArcGIS to the Open counterparts and I would really like to make it work (over 1000 users).

Additionnally, the risk of breaking anything is (very) low considering how specific the changes are.

Finally as you already have a patch ready, I feel we are very close to the solution !

Would you agree to reopen this while I check the TYPENAME(S) ambiguity is gone and if so let the patch flow to master ?

gioman commented 3 years ago

Not beeing able to consume ArcGIS Server WFS from QGIS will slow down my roadmap to transition our system from ArcGIS to the Open counterparts and I would really like to make it work (over 1000 users).

yours seems to be a large organization, you should consider support the work to implement the changes that are deemed necessary on the QGIS side.

elpaso commented 3 years ago

Not beeing able to consume ArcGIS Server WFS from QGIS will slow down my roadmap to transition our system from ArcGIS to the Open counterparts and I would really like to make it work (over 1000 users).

yours seems to be a large organization, you should consider support the work to implement the changes that are deemed necessary on the QGIS side.

Of course this would be helpful because with more funds we could implement yet another config option (as I said we have too many already but if we want to continue to support broken servers I see no other option), but I'm hesitant to apply the patch ~anyway~ as it is because it would basically revert @rouault 's 0768dde and we cannot continue to jump back and forth on this plural/singular nonsense.

@Fuzl the patch is not finished because it broke all the existing WFS continuous integration tests (that assume both forms in the URL) and it's time consuming to fix them all.

Fuzl commented 3 years ago

@elpaso and @gioman ok fair enough, I have checked the sustaining member program and indeed I think we can help.

I have searched the history of 0768dde :

So its seems to me the TYPENAME(S) problem is gone, and it should not be a "back and forth" decision.

Again I agree that ArcGIS Server could deal better with the request, but WFS 2.0 beeing quite central in modern GIS a cleaner request from QGIS seems to me an improvement.

Please let me know the next steps for funding this bug fix

elpaso commented 3 years ago

@Fuzl AFAIK there were another "broken" servers (see: https://github.com/qgis/QGIS/pull/9849#issuecomment-487211814) and IIRC a russian server mentioned in a git log or a PR comment I can't find it right now.

If all these servers now behave according to the specifications I think we can safely revert to the old behavior (basically apply my patch), I can dedicate some time to fix the tests and we should be good to go (of course if you could help with the test I would be happy to leave them to you).

As far as funding is concerned, QGIS does not accept earmarked funds but we are of course very happy to accept funds for generic QGIS activities, the biggest share of these funds goes to paid bugfixing activities (like what I'm doing right now). To be clear, myself and the other most active core developers regularly do the biggest amount of unpaid bugfixing on a totally voluntary basis, we only have occasional paid bugfixing rounds before the releases to be able to focus on QGIS bugfixing for one week or so.

For this particular case, my concern was that if more work was needed (like adding a user option to NOT add the TYPENAME in case of WFS 2) I would have consumed too much time of the few hours of paid bugfixing I've got to fix this single issue.

The alternative approach would be to hire a core developer (you can find a list of company offering commercial QGIS deveolpment services here: https://www.qgis.org/en/site/forusers/commercial_support.html#qgis-commercial-support) but in this case if there is no need for a user-facing option I think you'd better go for a sustaining membership to QGIS organization.

rouault commented 3 years ago

If all these servers now behave according to the specifications I think we can safely revert to the old behavior (basically apply my patch)

I'd be in favor of this. Let's stick to the specification. It will break a few outdated/non-maintained servers that require TYPENAME singular instead of the TYPENAMES plural, but so be it. People facing that issue could probably be able to workaround it by downgrading to WFS 1.1

Fuzl commented 3 years ago

Thank you @elpaso and @rouault for your answers, help and detailed advice.

The russian server is http://geoportal.samregion.ru/wfs12?request=getcapabilities
the then-flawed URL was : http://geoportal.samregion.ru/wfs12?SERVICE=WFS&REQUEST=DescribeFeatureType&VERSION=2.0.0&TYPENAME=EC_1058_33

Today it returns:

Missing typeNames parameter

While the correct version of the URL with TYPENAMES returns the normal output data
(http://geoportal.samregion.ru/wfs12?SERVICE=WFS&REQUEST=DescribeFeatureType&VERSION=2.0.0&TYPENAMES=EC_1058_33)

So all the involved endpoints now seem ok with TYPENAMES.

I would love to learn the skills to change the tests, and I feel it would be a gentle introduction into real open source coding, but I am afraid the work and effort to fix the test would be very much time consuming. I am an "average" Python dev with testing/CI skills but know no C++... How much time would you think necessary to upgrade my skills and fix the test ? (I know, this has 50% error margin but still helps)

elpaso commented 3 years ago

@Fuzl the hard part is to setup the C++ dev environment to have a build with my patch applied, on Linux it shouldn't be too hard if you follow the instructions https://github.com/elpaso/QGIS/blob/bugfix-gh41087-wfs-20-typnames-esri-bug/INSTALL.md#3-building-on-gnulinux

The tests are in Python though so that part should be easy: https://github.com/elpaso/QGIS/commit/b9cc707df075e2217259cd65ec856e1012a6a33d#diff-6f7bfe33d1d1470b3ceb0be3a6efd55f56e8ec26ada42ed8530437c5cc150514

I started to refactor the tests but there are some still failing, and that's where the clock starts ticking ... (and now I am busy on something else for a while ...)

palmerj commented 3 years ago

We are running production versions of Geoserver and it does not support limiting to just TYPENAMES only in the request ( need to check which versions to be sure). We have a massive catalogue of 1000s of layers and if you only provide TYPENAMES KVP it breaks QGIS. But it works ok if you provide singular and plural version in the request, or just the TYPENAME KVP. i.e

https://data.linz.govt.nz/services;key=APIKEY/wfs?service=WFS&request=DescribeFeatureType&typename=data.linz.govt.nz:layer-50844&typenames=data.linz.govt.nz:layer-50844

(Note you need to paste you API key into this URL to get it to work from https://data.linz.govt.nz/my/api/ after you sign up for free).

I'd be in favor of this. Let's stick to the specification. It will break a few outdated/non-maintained servers that require TYPENAME singular instead of the TYPENAMES plural, but so be it. People facing that issue could probably be able to workaround it by downgrading to WFS 1.1

In regards to the WFS 2 standard, it's look ambiguous to me, as you have noted in the past @rouault. I see TYPENAME is supported in the 9.2.3 KVP Encoding http://docs.opengeospatial.org/is/09-025r2/09-025r2.html#142. See OGC issue here http://ogc.standardstracker.org/show_request.cgi?id=530

Also see the geoserver issue here https://osgeo-org.atlassian.net/browse/GEOS-8417?jql=text%20~%20%22TYPENAMES%20or%20TYPENAME%20WFS%202.0%22. Looks like it's been supported since version 2.13: https://github.com/geoserver/geoserver/commit/025c34b3bdd626af3ecd96392abc2a136312ccf8

palmerj commented 3 years ago

Maybe one option is to still support the TYPENAMES and TYPENAME KVPs for just DescribeFeatureType requests. With our geoserver instances GETFEATURE requests seems ok with just TYPENAMES.

rouault commented 3 years ago

Maybe one option is to still support the TYPENAMES and TYPENAME KVPs for just DescribeFeatureType requests

Yes that sounds like a reasonable compromise (I had already forgotten that the spec was so messy...)

Fuzl commented 3 years ago

I confirm that a double TYPENAME(S) in DescribeFeatureType doesn't break ArcGIS Server. So it seems also a reasonable compromise.

But it is strange that the OGC Issue hasn't moved since June 2018... I guess everyone is already focused on OGC API/WFS3 step !

Fuzl commented 3 years ago

Hello @elpaso and thank you for the MR !

I have tested it and it works well except for a nitpick : accentuated characters are not url-encoded in the TYPENAME(S) values. Which returns a "TypeName has invalid value" error.

Should I file another bug referencing this one or is it better to reopen this one ?

I wish you a nice week-end, Fuzl.

PS : our company now officially support QGIS ! 😉

elpaso commented 3 years ago

Yes please, open a new ticket.