qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.09k stars 2.93k forks source link

LIKE and _ do not find special letters like ä, ö, ü and ß (but ILIKE does) #55347

Open fmeier1995 opened 8 months ago

fmeier1995 commented 8 months ago

What is the bug or the crash?

While trying some expressions for selecting features in a data record with the german administrative units I found the following:

When I use the LIKE-operator and search for 'Brde', the unit called "Börde" is not selected. The same happens with ä, ü and ß. Then I use ILIKE instead and now the units are selcted. It also works with % instead of . So it seems only the combination of LIKE and _ do not find these special german letters. (I do not try with special letters from other languages.)

Steps to reproduce the issue

VG250_KRS.zip

  1. Load the shape into a QGIS-project
  2. Open attribute table and "Select features using an expression"
  3. Try these expressions: "GEN" like 'B_rde' (no features found) "GEN" ilike 'B_rde' (1 matching feature; Börde) "GEN" like 'B%rde' (1 matching feature; Börde)

You can also try 'F_rth' (Fürth), 'Gie_en' (Gießen) and 'Ostallg_u' (Ostallgäu).

Versions

QGIS version | 3.34.0-Prizren | QGIS code revision | ffbdd67881 -- | -- | -- | -- Qt version | 5.15.3 Python version | 3.9.5 GDAL/OGR version | 3.7.2 PROJ version | 9.3.0 EPSG Registry database version | v10.094 (2023-08-08) GEOS version | 3.12.0-CAPI-1.18.0 SQLite version | 3.41.1 PDAL version | 2.5.5 PostgreSQL client version | 15.2 SpatiaLite version | 5.1.0 QWT version | 6.1.6 QScintilla2 version | 2.13.4 OS version | Windows 10 Version 2009   |   |   |   Active Python plugins db_manager | 0.1.20 grassprovider | 2.12.99 MetaSearch | 0.3.6 processing | 2.12.99 QGIS version 3.34.0-Prizren QGIS code revision [ffbdd67881](https://github.com/qgis/QGIS/commit/ffbdd67881) Qt version 5.15.3 Python version 3.9.5 GDAL/OGR version 3.7.2 PROJ version 9.3.0 EPSG Registry database version v10.094 (2023-08-08) GEOS version 3.12.0-CAPI-1.18.0 SQLite version 3.41.1 PDAL version 2.5.5 PostgreSQL client version 15.2 SpatiaLite version 5.1.0 QWT version 6.1.6 QScintilla2 version 2.13.4 OS version Windows 10 Version 2009 Active Python plugins db_manager 0.1.20 grassprovider 2.12.99 MetaSearch 0.3.6 processing 2.12.99 ### Supported QGIS version - [X] I'm running a supported QGIS version according to [the roadmap](https://www.qgis.org/en/site/getinvolved/development/roadmap.html#release-schedule). ### New profile - [X] I tried with a new [QGIS profile](https://docs.qgis.org/latest/en/docs/user_manual/introduction/qgis_configuration.html#working-with-user-profiles) ### Additional context It isn't a new issue, I also tried in some older versions of QGIS with the same result.
agiudiceandrea commented 8 months ago

@fmeier1995, thanks for reporting. It seems to me the issue occurs using the "Select Features by Expression" map tool with the provided ESRI Shapefile layer, while it doesn't occur with a GeoPackage layer exported from the provided ESRI Shapefile layer or with a Memory layer created from the provided ESRI Shapefile layer. The issue also occurs with e.g. FlatGeobuf or GeoJSON or ESRI File Geodatabase formats. It looks like the issue doesn't occur using the "Field Calculator" (e.g. creating a real or a virtual field). The issue also occurs using the "Select by expression" and "Extract by expression" processing algorithm or setting a "Provider Feature Filter" (via Layer Properties -> Source -> Provider Feature Filter -> Query Builder). Maybe @rouault know if this is an issue in GDAL/OGR library or in the QGIS OGR provider.

rouault commented 8 months ago

Maybe @rouault know if this is an issue in GDAL/OGR library or in the QGIS OGR provider.

The OGR SQL dialect only works properly with ASCII characters. When using _, it just assumes that a character = a byte, which isn't the case for accentuated UTF-8 characters.

fmeier1995 commented 7 months ago

@rouault Thanks for explaining. But I don't get why ILIKE and _ find these characters? And is there a chance to fix this issue? If not maybe there should be an hint in the description of LIKE.

rouault commented 7 months ago

But I don't get why ILIKE and _ find these characters?

As OGR SQL doesn't support ILIKE, QGIS fallbacks to evaluating it on its side rather than OGR side.

And is there a chance to fix this issue?

That might be doable. I've filed https://github.com/OSGeo/gdal/issues/8835 about that.

agiudiceandrea commented 7 months ago

Since this issue may lead to incorrect results in some circumstances without the user being aware of, I think it would be useful to make QGIS fallback to evaluating on its side also the LIKE operator at least in the circumstances in which it may return incorrect results.