Closed phidrho closed 1 day ago
The second reported issue may be related to the PR https://github.com/qgis/QGIS/pull/38628 by @3nids.
I think the problem with searching is not even Croatian alphabet, I have same with Polish in filtering (GPKG layer)
Example
query = " column_name in ('światłowodowe') " f_request = QgsFeatureRequest().setFilterExpression(query) features = layer.getFeatures(f_request) ids = [feature.id() for feature in features]
Always is zero results, but in that column value "światłowodowe" exist.
This behavior regarding custom plugin, but if I use python console in QGIS, and add above line : # -- coding: utf-8 --
searching work well.
Is it a bug or something else ?
I trying somthing like: query = f" ({query}) COLLATE Polish_CI_AS " - it wont work
Hi @krzysztof-adamski,
yes, it's not only Croatian alphabet, I just used this for example, it's probably all letters outside ASCII symbols table as I noted in title. Developers are aware of that, it's common problem, and when bug gets fixed, it will fix all worldwide letters.
Edit:
This behavior regarding custom plugin, but if I use python console in QGIS, and add above line : # -- coding: utf-8 -- searching work well.
This is probably Windows and Qt-related, when you write your script and save it to disk, it probably defaults to System default encoding (for Croatia it's Windows-1250
, possibly the same for Polish), but when you run your script it's read from disk as UTF-8
encoded, and your letters are misinterpreted - you think that you are sending query like:
SELECT * FROM table WHERE column_name in ('światłowodowe');
but software actually sends it to database as:
SELECT * FROM table WHERE column_name in ('�wiat�owodowe');
and that's the reason why you don't get any results.
When you entered # -- coding: utf-8 --
at the beginning of script IDE probably forced UTF-8
encoding when saving to disk.
This is another bug/feature, I suggest that you propose to add settings option for default encoding inside IDE.
@phidrho I am not super duper in QGIS, sql, python etc, I figure out this problem just like that
def replace_polish_letter(self, column_name):
polish_chars = ["ą", "ó", "ę", "ł", "ś", "ć", "ż", "ź", "ń"]
string = "upper(#COLUMN#)"
prefix = suffix = ""
for polish_char in polish_chars:
prefix += f"replace("
suffix += f", '{polish_char}', '{unidecode(polish_char)}')"
concat = prefix + string + suffix
concat = concat.replace("#COLUMN#", column_name)
query = query.replace(column_name, "#COLUMN#")
query = query.upper()
query = query.replace("#COLUMN#", self.replace_polish_letter(column_name))
I know it is weird, but enough for yet.
@jef-n, @agiudiceandrea, @DelazJ, @lcosicnus
I've found the source of this bug and fix. In my opinion, this should be fixed in deployment (distribution of SQLite with QGIS), not in the source code of QGIS.
This is a pretty severe bug, in my opinion, it affects almost the whole world and it should be backported to LTS because GPGK is a default file format for QGIS. It also affects SQLite and SpatiaLite layers. The problem happens when QGIS calls DB (GPKG in this case) with a query that uses lower
and upper
built-in SQL functions that don't work on non-ASCII letters by default on SQLite. It does not affect only Locator
but everywhere in QGIS where queries are sent to DB with "upper" and "lower" functions SQLite by default - for example in Select Features By Value
and Execute SQL
, see image below:
An example of a query that is sent from Locator
is:
SELECT m."fid", m."test", m."test_expression" FROM "test" m WHERE (lower("test") LIKE lower('%Š%') ESCAPE '\' OR lower("test_expression") LIKE lower('%Š%') ESCAPE '\')
1) to enable ICU at compile time for SQLite distributed with QGIS.
sqlite3.exe
and sqlite3.dll
files and 3 ICU DLL dependencies must be included but other than that there are no other cons that I can see)2) redefine SQLite's upper
and lower
functions after creating DB connection to SQLite
3) to enable the loading of SQLite extensions - so that we can load ICU extension ourselves via query, e.g.
SELECT load_extension('icu');
.
4) the third alternative is to rewrite all calls from QGIS to call a lower
and upper
in C++ before constructing SQL query to DB. (this would fix current issues but the bug may reappear in the future if some developer is unaware of default SQLite limitations, and also this would cause problems with plugins that are not aware of SQLite limitations).
How I tested and got the correct behavior (got the bug fixed on my machine):
1) determine what were compile time options of sqlite3.exe
in OSGeoW Shell:
SELECT pragma compile_options;
outputs:
ATOMIC_INTRINSICS=0
COMPILER=msvc-1929
DEFAULT_AUTOVACUUM
DEFAULT_CACHE_SIZE=-2000
DEFAULT_FILE_FORMAT=4
DEFAULT_JOURNAL_SIZE_LIMIT=-1
DEFAULT_MMAP_SIZE=0
DEFAULT_PAGE_SIZE=4096
DEFAULT_PCACHE_INITSZ=20
DEFAULT_RECURSIVE_TRIGGERS
DEFAULT_SECTOR_SIZE=4096
DEFAULT_SYNCHRONOUS=2
DEFAULT_WAL_AUTOCHECKPOINT=1000
DEFAULT_WAL_SYNCHRONOUS=2
DEFAULT_WORKER_THREADS=0
ENABLE_COLUMN_METADATA
ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5
ENABLE_RTREE
ENABLE_UPDATE_DELETE_LIMIT
MALLOC_SOFT_LIMIT=1024
MAX_ATTACHED=10
MAX_COLUMN=2000
MAX_COMPOUND_SELECT=500
MAX_DEFAULT_PAGE_SIZE=8192
MAX_EXPR_DEPTH=1000
MAX_FUNCTION_ARG=127
MAX_LENGTH=1000000000
MAX_LIKE_PATTERN_LENGTH=50000
MAX_MMAP_SIZE=0x7fff0000
MAX_PAGE_COUNT=1073741823
MAX_PAGE_SIZE=65536
MAX_SQL_LENGTH=1000000000
MAX_TRIGGER_DEPTH=1000
MAX_VARIABLE_NUMBER=32766
MAX_VDBE_OP=250000000
MAX_WORKER_THREADS=8
MUTEX_W32
SYSTEM_MALLOC
TEMP_STORE=1
THREADSAFE=1
2) install MSYS2 and use MINGW64
shell
pacman -Syu # update packages
# restart MINGW64
pacman -Su #upgrade packages
#install packages
pacman -S base-devel mingw-w64-x86_64-toolchain
3) download ICU source code, extract to home/user
folder of msys64
4) compile ICU
cd
cd icu4c-73_2-src/icu/source/
./configure
make
make install
5) download SQLite (amalgamation version) source code, extract to home/user
folder of msys64
6) check if all links (dependencies) can be found that are needed for SQLite compilation
echo $(pkg-config --libs --cflags icu-uc icu-io)
7) compile sqlite.exe
cd
cd sqlite-amalgamation-3420000/
gcc -Os -I. \
-DSQLITE_DEFAULT_AUTOVACUUM \
-DSQLITE_DEFAULT_CACHE_SIZE=-2000 \
-DSQLITE_DEFAULT_FILE_FORMAT=4 \
-DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=-1 \
-DSQLITE_DEFAULT_MMAP_SIZE=0 \
-DSQLITE_DEFAULT_PAGE_SIZE=4096 \
-DSQLITE_DEFAULT_PCACHE_INITSZ=20 \
-DSQLITE_DEFAULT_RECURSIVE_TRIGGERS \
-DSQLITE_DEFAULT_SECTOR_SIZE=4096 \
-DSQLITE_DEFAULT_SYNCHRONOUS=2 \
-DSQLITE_DEFAULT_WAL_AUTOCHECKPOINT=1000 \
-DSQLITE_DEFAULT_WAL_SYNCHRONOUS=2 \
-DSQLITE_DEFAULT_WORKER_THREADS=0 \
-DSQLITE_ENABLE_COLUMN_METADATA \
-DSQLITE_ENABLE_FTS3 \
-DSQLITE_ENABLE_FTS3_PARENTHESIS \
-DSQLITE_ENABLE_FTS4 \
-DSQLITE_ENABLE_FTS5 \
-DSQLITE_ENABLE_RTREE \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
-DSQLITE_MALLOC_SOFT_LIMIT=1024 \
-DSQLITE_MAX_ATTACHED=10 \
-DSQLITE_MAX_COLUMN=2000 \
-DSQLITE_MAX_COMPOUND_SELECT=500 \
-DSQLITE_MAX_DEFAULT_PAGE_SIZE=8192 \
-DSQLITE_MAX_EXPR_DEPTH=1000 \
-DSQLITE_MAX_FUNCTION_ARG=127 \
-DSQLITE_MAX_LENGTH=1000000000 \
-DSQLITE_MAX_LIKE_PATTERN_LENGTH=50000 \
-DSQLITE_MAX_MMAP_SIZE=0x7fff0000 \
-DSQLITE_MAX_PAGE_COUNT=1073741823 \
-DSQLITE_MAX_PAGE_SIZE=65536 \
-DSQLITE_MAX_SQL_LENGTH=1000000000 \
-DSQLITE_MAX_TRIGGER_DEPTH=1000 \
-DSQLITE_MAX_VARIABLE_NUMBER=32766 \
-DSQLITE_MAX_VDBE_OP=250000000 \
-DSQLITE_MAX_WORKER_THREADS=8 \
-DSQLITE_MUTEX_W32 \
-DSQLITE_SYSTEM_MALLOC \
-DSQLITE_TEMP_STORE=1 \
-DSQLITE_THREADSAFE=1 \
-DSQLITE_ENABLE_ICU \
shell.c sqlite3.c -lm -lreadline -lncurses $(pkg-config --libs --cflags icu-uc icu-io) -o sqlite3
7) compile sqlite.dll
-l -lreadline -lncurses
- it seems that they are not needed in DLL, but I'm not expertgcc -shared -Os -I. \
-DSQLITE_DEFAULT_AUTOVACUUM \
-DSQLITE_DEFAULT_CACHE_SIZE=-2000 \
-DSQLITE_DEFAULT_FILE_FORMAT=4 \
-DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=-1 \
-DSQLITE_DEFAULT_MMAP_SIZE=0 \
-DSQLITE_DEFAULT_PAGE_SIZE=4096 \
-DSQLITE_DEFAULT_PCACHE_INITSZ=20 \
-DSQLITE_DEFAULT_RECURSIVE_TRIGGERS \
-DSQLITE_DEFAULT_SECTOR_SIZE=4096 \
-DSQLITE_DEFAULT_SYNCHRONOUS=2 \
-DSQLITE_DEFAULT_WAL_AUTOCHECKPOINT=1000 \
-DSQLITE_DEFAULT_WAL_SYNCHRONOUS=2 \
-DSQLITE_DEFAULT_WORKER_THREADS=0 \
-DSQLITE_ENABLE_COLUMN_METADATA \
-DSQLITE_ENABLE_FTS3 \
-DSQLITE_ENABLE_FTS3_PARENTHESIS \
-DSQLITE_ENABLE_FTS4 \
-DSQLITE_ENABLE_FTS5 \
-DSQLITE_ENABLE_RTREE \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
-DSQLITE_MALLOC_SOFT_LIMIT=1024 \
-DSQLITE_MAX_ATTACHED=10 \
-DSQLITE_MAX_COLUMN=2000 \
-DSQLITE_MAX_COMPOUND_SELECT=500 \
-DSQLITE_MAX_DEFAULT_PAGE_SIZE=8192 \
-DSQLITE_MAX_EXPR_DEPTH=1000 \
-DSQLITE_MAX_FUNCTION_ARG=127 \
-DSQLITE_MAX_LENGTH=1000000000 \
-DSQLITE_MAX_LIKE_PATTERN_LENGTH=50000 \
-DSQLITE_MAX_MMAP_SIZE=0x7fff0000 \
-DSQLITE_MAX_PAGE_COUNT=1073741823 \
-DSQLITE_MAX_PAGE_SIZE=65536 \
-DSQLITE_MAX_SQL_LENGTH=1000000000 \
-DSQLITE_MAX_TRIGGER_DEPTH=1000 \
-DSQLITE_MAX_VARIABLE_NUMBER=32766 \
-DSQLITE_MAX_VDBE_OP=250000000 \
-DSQLITE_MAX_WORKER_THREADS=8 \
-DSQLITE_MUTEX_W32 \
-DSQLITE_SYSTEM_MALLOC \
-DSQLITE_TEMP_STORE=1 \
-DSQLITE_THREADSAFE=1 \
-DSQLITE_ENABLE_ICU \
shell.c sqlite3.c $(pkg-config --libs --cflags icu-uc icu-io) -o sqlite3.dll
8) backup old version of sqlite3.exe, DLL and install new one with it's dependencies
# rename old files
mv C:/OSGeo4W/bin/sqlite3.dll C:/OSGeo4W/bin/sqlite3.dll.backup
mv C:/OSGeo4W/bin/sqlite3.exe C:/OSGeo4W/bin/sqlite3.exe.backup
#install new files
cp C:/msys64/home/user/sqlite-amalgamation-3420000/sqlite3.exe C:/OSGeo4W/bin/sqlite3.exe
cp C:/msys64/home/user/sqlite-amalgamation-3420000/sqlite3.dll C:/OSGeo4W/bin/sqlite3.dll
cp C:/msys64/mingw64/bin/ICUIN73.DLL C:/OSGeo4W/bin/ICUIN73.DLL
cp C:/msys64/mingw64/bin/ICUUC73.DLL C:/OSGeo4W/bin/ICUUC73.DLL
cp C:/msys64/mingw64/bin/ICUDT73.DLL C:/OSGeo4W/bin/ICUDT73.DLL
9) run QGIS and test
SELECT upper('ä ö ü ß š đ ž č ć à â æ ç é è ê ë î ï ô œ ù û ü ÿ à è ì ò ù')
SELECT lower('Ä Ö Ü Š Đ Ž Č Ć À Â Æ Ç É È Ê Ë Î Ï Ô Œ Ù Û Ü Ÿ À È Ì Ò Ù')
SELECT upper('ä ö ü ß š đ ž č ć à â æ ç é è ê ë î ï ô œ ù û ü ÿ à è ì ò ù')
SELECT lower('Ä Ö Ü Š Đ Ž Č Ć À Â Æ Ç É È Ê Ë Î Ï Ô Œ Ù Û Ü Ÿ À È Ì Ò Ù')
Locator
Select Features By Value (F3)
and search for letters in you languageEDIT: I'm notifying the rest of QGIS Packaging team: @kalxas @daniviga @sebastic @kyngchaos @PeterPetrik
EDIT2: I removed the "icu-devel" package in step no. 2 from the instructions as it's not needed
FWIW sqlite3 in Debian is no longer built with ICU support, see: #584506 - libsqlite3-0: Include Unicode support.
Related bugreport: #536662 - sqlite3: CLI not UTF-8 clean
Using GDAL compiled from master, it works in QGIS. It works in the command line too.
I have two features in QGIS:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> wkt_geom | fid | name |
---|---|---|
Point (-7.74873431926051559 37.09479248838015764) | 2 | Šimun |
Point (-7.74289089247152074 37.08442040582968957) | 3 | Maša |
Searching in QGIS works.
The following query also works on the command line:
ogr2ogr -sql "SELECT m."fid", m."name" FROM "test" m WHERE lower("name") LIKE lower('%Š%') ESCAPE '\'" -f CSV /vsistdout/ bratislava.gpkg
name
Šimun
Maša
The GDAL query works, not because lower
or upper
works. These functions are not working for diacritics.
The GDAL SQL works because the LIKE
operator does an insensitive case comparison.
This also works, without the lower
function applied to both sides of the comparison:
ogr2ogr -sql "SELECT m."fid", m."name", lower(m."name"), upper(m."name") FROM "test" m WHERE "name" LIKE '%Š%' ESCAPE '\'" -f CSV /vsistdout/ bratislava.gpkg
name,lower(m.name),upper(m.name)
Šimun,Šimun,ŠIMUN
Maša,maša,MAšA
The GDAL SQL works because the
LIKE
operator does an insensitive case comparison.
and because since GDAL 3.9, GDAL overloads the base LIKE operator to support case-insentitive UTF-8 comparisons, but restricted to a few character-set (mostly European ones): https://gdal.org/en/latest/user/sql_sqlite_dialect.html#like-operator
Hi @rouault,
I also tested now on my local machine and can confirm it works. I compiled latest GDAL, and in ogr2ogr works as expected:
GDAL 3.4.1, released 2021/12/27 (Ubuntu 22.04 official repository)
$ ogr2ogr -sql "SELECT m."fid", m."name" FROM "test" m WHERE lower("name") LIKE lower('%Š%') ESCAPE '\'" -f CSV /vsistdout/ /home/phidrho/Desktop/test.gpkg
GPKG: GeoPackage v1.2.0
GDAL: GDALOpen(/home/phidrho/Desktop/test.gpkg, this=0x5bb8bcb461a0) succeeds as GPKG.
GDAL: GDALDriver::Create(CSV,/vsistdout/,0,0,0,Unknown,(nil))
name,
Šimun
GPKG: finalize 0x5bb8bcb4c638
GDALVectorTranslate: 1 features written in layer 'SELECT'
GDAL: GDALClose(/home/phidrho/Desktop/test.gpkg, this=0x5bb8bcb461a0)
GDAL: GDALClose(/vsistdout/, this=0x5bb8bce69d70)
GDAL: In GDALDestroy - unloading GDAL shared library.
GDAL 3.10.0dev-25eb9f6ee7
$ LD_LIBRARY_PATH=. CPL_DEBUG=ON ./apps/ogr2ogr -sql "SELECT m."fid", m."name" FROM "test" m WHERE lower("name") LIKE lower('%Š%') ESCAPE '\'" -f CSV /vsistdout/ /home/phidrho/Desktop/test.gpkg
GPKG: GeoPackage v1.2.0
GDAL: GDALOpen(/home/phidrho/Desktop/test.gpkg, this=0x571e68570ff0) succeeds as GPKG.
GDAL: GDALDriver::Create(CSV,/vsistdout/,0,0,0,Unknown,(nil))
OGR2OGR: Using WriteArrowBatch()
GPKG: finalize 0x571e68570558
name
Maša
Šimun
GDAL: GDALClose(/home/phidrho/Desktop/test.gpkg, this=0x571e68570ff0)
GDAL: GDALClose(/vsistdout/, this=0x571e685b8ce0)
GDAL: In GDALDestroy - unloading GDAL shared library.
I would like to close this bug report, as it is a matter of time for fix to be applied, and it depends on packaging, not anymore on development.
I'm only notifying packaging team: @jef-n, @kyngchaos, @PeterPetrik, @kalxas, @daniviga, @sebastic
What is the bug or the crash?
When I tried to search for attribute inside GPKG layer which contains letters used in Croatian alphabet (š,đ,č,ć,ž) via
Locator
with use ofaf
modulator - I did not get any results if entered different case (uppercase or lowercase) than what is stored, I tried to do the same thing for SHP layer and it works as expected (it's not case sensitive).I also noticed small UI bug - in form for Locator Configuration (Setting>Options>Locator>Configuration) there is a widget (QSpinBox) to set up
Maximum number of results
which accepts values up to 200, but value cannot be above 99.Both bugs are shown in video demo at youtube.
Steps to reproduce the issue
Download a sample project with GPKG and SHP variant of same layer - qgis_locator.zip
Use a
Locator
withaf
prefix with any of Croatian letters - š, đ, č, ć, ž, Š, Đ, Č, Ć, Žfor UI widget bug see end of video noted above
Versions
QGIS version 3.28.0-Firenze QGIS code revision ed3ad0430f3 Qt version 5.15.3 Python version 3.9.5 Compiled against GDAL/OGR 3.5.2 Running against GDAL/OGR 3.5.3 PROJ version 9.1.0 EPSG Registry database version v10.074 (2022-08-01) GEOS version 3.10.3-CAPI-1.16.1 SQLite version 3.39.4 PDAL version 2.4.3 PostgreSQL client version unknown SpatiaLite version 5.0.1 QWT version 6.1.6 QScintilla2 version 2.13.1 OS version Windows 10 Version 2009
Active Python plugins actions_for_relations 1.2.0 document_management_system 0.3.1 EquirectangularViewer 1.4 GroupStats 2.2.6 plugin_reloader 0.9.2 QGIS3-getWKT 1.4 QuickWKT 3.1 SelectByRelationship 0.3.3 db_manager 0.1.20 processing 2.12.99
Supported QGIS version
New profile
Additional context
No response