qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0

CSV trim-option does not trim spaces at the end of strings #57959

Open wvdbee opened 3 months ago

wvdbee commented 3 months ago

What is the bug or the crash?

The Trim fields option does not trim string fields.

Steps to reproduce the issue

Settings:

Some fields contain lots of trailing spaces:

[image]

Settings in the Add CSV layer window:

[image]

Alas, the spaces have not been trimmed:

[image]

Demo dataset: 50120NED_TypedDataSet_25062024_172055.zip

Versions

QGIS version: 3.36.3-Maidenhead
QGIS code revision: 2df96554
Qt version: 5.15.13
Python version: 3.12.3
GDAL/OGR version: 3.9.0
PROJ version: 9.4.0
EPSG Registry database version: v11.004 (2024-02-24)
GEOS version: 3.12.1-CAPI-1.18.1
SQLite version: 3.45.1
PDAL version: 2.6.3
PostgreSQL client version: 16.2
SpatiaLite version: 5.1.0
QWT version: 6.2.0
QScintilla2 version: 2.14.1
OS version: Windows 11 Version 2009

Active Python plugins:
AutomaticBackup-master 1.0
BGTImport 3.18
changeDataSource 3.1
create_layer_from_selected_features 1.2
geo_sim_processing 1.2.0
GroupStats 2.2.7
mmqgis 2021.9.10
pcraster_tools 0.3.0
pdokservicesplugin 5.0.1
precisioncursor4qgis-main 1.1.D
processing_saga_nextgen 1.0.0
qgis_resource_sharing 1.0.0
quick_map_services 0.19.34
SelectWithin 0.4
slyr_community 5.0.0
StreetView 3.2
topo_tijdreis 1.0
db_manager 0.1.20
MetaSearch 0.3.6
processing 2.12.99

Supported QGIS version

- [X] I'm running a supported QGIS version according to [the roadmap](https://www.qgis.org/en/site/getinvolved/development/roadmap.html#release-schedule).

New profile

- [X] I tried with a new [QGIS profile](https://docs.qgis.org/latest/en/docs/user_manual/introduction/qgis_configuration.html#working-with-user-profiles)

Additional context

No response
pigreco commented 3 months ago

Using the attached dataset, I can confirm the problem also occurs in QGIS 3.34.8 and 3.38.0.

OSGeo4W, Windows 11

aborruso commented 3 months ago

Hi @wvdbee, your CSV is not optimal in one respect: it uses double quotes even where they aren't needed.

If you have something like this:

[image]

ID;Perioden;Gemeentenaam_1
4683;2022JJ00;Amsterdam                               
4687;2022JJ00;Amsterdam                               
4691;2022JJ00;Amsterdam                               
4695;2022JJ00;Amsterdam                               
4699;2022JJ00;Amsterdam                               
4703;2022JJ00;Amsterdam                               
4707;2022JJ00;Amsterdam                               

the trimming seems to work:

[image: csv]

It should probably work in your case too, but perhaps the double quotes tell QGIS that those spaces are not "normal" padding but part of the values. I don't know how QGIS handles this internally, though; I just wanted to point out the quotation-mark issue.
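To make the quoting hypothesis concrete, here is a hypothetical Python illustration. It is not QGIS's provider code (I haven't checked how QGIS orders these steps); `trim_then_unquote` and `unquote_then_trim` are made-up names. If a reader trimmed each raw token before removing the quotes, the padding inside a quoted field would survive, whereas trimming after unquoting removes it.

```python
import csv
import io

# Hypothetical illustration only: NOT the QGIS delimited-text provider code.
# One field exactly as it appears between two delimiters in the file.
RAW_TOKEN = '"Amsterdam   "'


def trim_then_unquote(token: str) -> str:
    # Trimming the raw token first finds no outer whitespace (it starts and
    # ends with a quote character), so the padding inside the quotes survives.
    stripped = token.strip()
    if len(stripped) >= 2 and stripped[0] == stripped[-1] == '"':
        return stripped[1:-1]
    return stripped


def unquote_then_trim(token: str) -> str:
    # Let the csv module remove the quotes first, then trim the actual value.
    value = next(csv.reader(io.StringIO(token)))[0]
    return value.strip()


print(repr(trim_then_unquote(RAW_TOKEN)))  # padding kept
print(repr(unquote_then_trim(RAW_TOKEN)))  # padding removed
```

Only the order of the two steps differs, which is why quoted and unquoted fields could behave differently under such a design.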

Best regards

wvdbee commented 3 months ago

Hello @aborruso

Thank you for your insight, but I don't think you're entirely right. A couple of remarks:

First, RFC 4180 says: "Spaces are considered part of a field and should not be ignored."

This means that aaa,bbb,ccc is not equal to aaa,bbb ,ccc: the second column contains 3 characters in the first example and 4 in the second. So double quotes are not required for a field to contain spaces; spaces and quotes are not explicitly linked to each other.
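A quick check with Python's `csv` module, used here only as a stand-in for an RFC 4180-style parser (not what QGIS uses), shows the same behaviour: with the default `skipinitialspace=False`, the space is kept as part of the field.

```python
import csv
import io

# Python's csv module keeps spaces as part of a field by default
# (skipinitialspace=False), consistent with RFC 4180.
plain = next(csv.reader(io.StringIO("aaa,bbb,ccc")))
padded = next(csv.reader(io.StringIO("aaa,bbb ,ccc")))

print(plain)                          # ['aaa', 'bbb', 'ccc']
print(padded)                         # ['aaa', 'bbb ', 'ccc']
print(len(plain[1]), len(padded[1]))  # 3 4
print(plain == padded)                # False
```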

Second: the file is in fact a DSV rather than a CSV (delimiter-separated, not comma-separated). DSV is free-form in how it is formatted. Thankfully, QGIS supports all kinds of DSV, not just RFC 4180-compliant CSV. DSV files can contain all sorts of intended and unintended data, and there are no explicit formatting rules, I guess. So you cannot make assumptions about the meaning of quotes in DSV files. But even if you apply RFC 4180 formatting rules to a DSV file, the following two examples have exactly the same content: "aaa";"bbb ";"ccc" and aaa;bbb ;ccc
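As a sketch, again assuming Python's `csv` module as an approximation of RFC 4180 quoting rules with `;` as the delimiter, the quoted and unquoted lines parse to identical rows. `parse_row` is a hypothetical helper name.

```python
import csv
import io


def parse_row(line: str, delimiter: str = ";") -> list[str]:
    # Hypothetical helper: parse one delimited line using RFC 4180-style quoting.
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


quoted = parse_row('"aaa";"bbb ";"ccc"')
unquoted = parse_row("aaa;bbb ;ccc")

print(quoted)              # ['aaa', 'bbb ', 'ccc']
print(quoted == unquoted)  # True
```

Once the quotes are removed, the parser has no way to tell the two apart, so the trailing space in "bbb " is plain data in both cases.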

Third, and this is my main point: I interpret the Trim fields checkbox as a post-processing option: whenever a field has leading or trailing spaces, trim them. That means it is up to the user to decide whether those spaces are intended. Back to my example: in the RFC 4180-compliant CSV aaa,bbb ,ccc the space is also part of the data. Even then, it is up to the user to decide whether the space is intentional or unwanted, and the trim checkbox comes in quite useful for removing unwanted spaces. I think it should work for both quoted and unquoted fields.
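The post-processing interpretation can be sketched in Python as follows. `read_delimited` and its `trim_fields` flag are hypothetical names, not the QGIS API; the point is that stripping runs after the parser has removed the quotes, so quoted and unquoted fields are treated the same way.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ";",
                   trim_fields: bool = False) -> list[list[str]]:
    """Hypothetical reader: parse first, then optionally trim every field.

    Because trimming runs after the parser has stripped the quotes,
    quoted and unquoted fields are handled identically.
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    if trim_fields:
        return [[field.strip() for field in row] for row in rows]
    return [list(row) for row in rows]


# One quoted and one unquoted padded value, as in the attached dataset.
sample = 'ID;Gemeentenaam_1\n4683;"Amsterdam    "\n4687;Amsterdam    \n'
print(read_delimited(sample))
print(read_delimited(sample, trim_fields=True))
```

With `trim_fields=True` both rows end up as plain `Amsterdam`; without it, both keep their padding.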

By the way, the example data is not mine; it is a download from our national statistics agency. The spaces are unwanted and unintended; I guess it is a flaw in one of their download formats. And yes, I know I can preprocess the CSV/DSV myself. But then what is the purpose of the checkbox? :-)
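For anyone who needs the preprocessing workaround in the meantime, a minimal Python sketch follows. `clean_dsv` is a hypothetical helper; it assumes `;` as the delimiter and rewrites every field without leading or trailing spaces, letting `csv.QUOTE_MINIMAL` drop quotes that are no longer needed.

```python
import csv
import io
from typing import TextIO


def clean_dsv(src: TextIO, dst: TextIO, delimiter: str = ";") -> None:
    """Copy delimited text from src to dst, stripping spaces around every field."""
    reader = csv.reader(src, delimiter=delimiter)
    # QUOTE_MINIMAL only quotes fields that still need it (e.g. ones that
    # contain the delimiter); "\n" keeps line endings simple.
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n",
                        quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        writer.writerow([field.strip() for field in row])


src = io.StringIO('ID;Gemeentenaam_1\n4683;"Amsterdam    "\n')
dst = io.StringIO()
clean_dsv(src, dst)
print(dst.getvalue())
```

For real files, open both with `newline=""` (as the `csv` documentation recommends) and the file's actual encoding, which is an assumption to verify for this download.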

https://statline.rivm.nl/portal.html?_la=nl&_catalog=RIVM&tableId=50120NED&_theme=93

aborruso commented 3 months ago

Hi @wvdbee

> Thank you for your insight. But I think you're not completely right. Couple of remarks:
>
> RFC4180 says

I don't think I'm completely right, nor did I refer to RFC 4180. And I have nothing against the data from the national statistics agency.

I do not like CSV files with unnecessary double quotes, so I cleaned yours up, saw that it worked better, and reported that to you.

From here, a QGIS developer reading this thread has some useful elements for making a good decision.