qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0

Add Layer Delimited Text with coordinates in DMS does not work with ' and " #57480

Closed pigreco closed 5 months ago

pigreco commented 5 months ago

What is the bug or the crash?

I have this CSV file with longitude and latitude coordinates expressed in DMS (7°26'20.5"E, 45°8'41.3"N), but when I import the file with Add Delimited Text Layer, nothing is loaded and a message appears saying that 107 incorrectly formatted records were discarded from the sample data.

Where can I find documentation on how to properly format coordinates in DMS?

The same file is read correctly by cs2cs.

Steps to reproduce the issue

  1. Start QGIS 3.34.
  2. Open Add Delimited Text Layer.
  3. Load the CSV file.
  4. Select DMS coordinates.
  5. Set the CRS to EPSG:4326.
  6. A formatting error is reported.

Versions

OSGeo4W 64-bit, Windows 11 Pro

QGIS version: 3.34.6-Prizren
QGIS code revision: 623828f5
Qt version: 5.15.13
Python version: 3.12.3
GDAL/OGR version: 3.8.5
PROJ version: 9.4.0
EPSG Registry database version: v11.004 (2024-02-24)
GEOS version: 3.12.1-CAPI-1.18.1
SQLite version: 3.45.1
PDAL version: 2.6.3
PostgreSQL client version: 16.2
SpatiaLite version: 5.1.0
QWT version: 6.2.0
QScintilla2 version: 2.14.1
OS version: Windows 11 Version 2009

Active Python plugins: DataPlotly 4.1.0, eurostat_downloader 0.2.1, felt 2.0.1, kmltools 3.1.33, mapswipetool_plugin 1.2, nominatim_locator_filter 0.3.2, quick_map_services 0.19.34, valuetool 3.0.19, db_manager 0.1.20, grassprovider 2.12.99, MetaSearch 0.3.6, processing 2.12.99

Supported QGIS version

New profile

Additional context

CSV files: inputddmmsstastiera.csv

agiudiceandrea commented 5 months ago

The issue is not actually related to the parsing of the DMS coordinates, but to the parsing of the provided CSV file. In fact, the provided CSV file cannot be imported using "Add Delimited Text Layer" / "Data Source Manager - Delimited Text" with the "File Format"->"CSV (comma separated values)" option set, even when the "No geometry (attribute table only)" option is selected.

It looks like the provided CSV file doesn't strictly adhere to RFC 4180 https://www.ietf.org/rfc/rfc4180.txt "Common Format and MIME Type for Comma-Separated Values (CSV) Files" due to the presence of a double quote " (U+0022 QUOTATION MARK) char inside a field value that is not enclosed in double quotes.

The RFC-4180 states that:

  1. [...] If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

  2. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

  3. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote

Thus, e.g. the field value 7°26'20.5"E should be written in a CSV file as "7°26'20.5""E" in order to be correctly imported using the "Add Delimited Text Layer" functionality with the "File Format"->"CSV (comma separated values)" option set. Both MS Excel and LibreOffice Calc write the field value 7°26'20.5"E as "7°26'20.5""E" in a CSV file.
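The escaping rule above can be checked with a short sketch. It uses Python's standard `csv` module purely as an illustration of RFC 4180 style quoting; it is not the parser used by the QGIS delimitedtext provider, and the variable names are chosen here for the example.

```python
import csv
import io

# Two DMS values that use a literal double quote (U+0022) as the seconds mark.
row = ["7°26'20.5\"E", "45°8'41.3\"N"]

# csv.writer defaults to QUOTE_MINIMAL with doublequote=True: a field that
# contains the quote character is enclosed in quotes and the inner quote is doubled.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()
print(encoded, end="")  # "7°26'20.5""E","45°8'41.3""N"

# Reading the encoded line back recovers the original field values.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == row)  # True
```

This matches the `"7°26'20.5""E"` form that the spreadsheet applications mentioned above produce.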

Using the ″ (U+2033 DOUBLE PRIME) char instead of the " (U+0022 QUOTATION MARK) inside a field value of a CSV file removes the need to escape it and to enclose the field value in double quotes.

You can avoid the issue and import such a non-standard CSV file by, e.g., selecting "File Format"->"Custom delimiters", setting "Comma" as the delimiter and clearing the "Quote" character.
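As a rough analogy for what a comma delimiter with no quote character means, the sketch below reads one line of the non-standard file with Python's `csv` module and `QUOTE_NONE`. This only illustrates the idea of treating `"` as ordinary data; it does not reproduce the QGIS dialog or its parser, and the sample line is an assumption based on the values quoted in this issue.

```python
import csv
import io

# A line as it would appear in the non-standard file: bare double quotes,
# fields not enclosed in quotes.
raw = "7°26'20.5\"E,45°8'41.3\"N\n"

# QUOTE_NONE makes the reader treat the quote character as plain data,
# so the line is split on commas only.
reader = csv.reader(io.StringIO(raw), delimiter=",", quoting=csv.QUOTE_NONE)
fields = next(reader)
print(fields)
```

Each field keeps its `"` seconds mark intact, which is what the DMS parsing step needs afterwards.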

Anyway, since the OGR provider can import the same file as a table without errors (just dragging and dropping it in QGIS or using "Add Vector Layer"), I think there is room for improving the CSV parser of the delimitedtext provider with the "File Format"->"CSV (comma separated values)" option set in order to parse without errors, when possible, even CSV files that don't strictly conform to RFC-4180.

As a side note, it seems to me that actually the provided CSV file cannot be correctly parsed by cs2cs PROJ utility.

aborruso commented 5 months ago

Hi @agiudiceandrea , you are right, it cannot work with a wrong CSV.

I have created a TSV with two rows and no header row:

7°26'20.5"E 45°8'41.3"N
8°12'33.1"E 45°31'31.0"N

And I'm not able to use it in the Add Delimited Text Layer wizard: the values are read as strings.

image

I gave the example of TSV, because it is the default one in cs2cs:

cs2cs +proj=latlong +to +proj=latlong -f "%.6f" < inputddmmsstastiera.tsv

And I confirm that it works, and it returns me this

7.439028    45.144806 0.000000
8.209194    45.525278 0.000000

Probably QGIS wizard reads only prime and double prime.

agiudiceandrea commented 5 months ago

@aborruso, you need to specify the "X field" and "Y field" parameters.

https://github.com/qgis/QGIS/assets/16253859/8b207aee-a1ae-4710-8918-204fb5b5e689

aborruso commented 5 months ago

I'm not at my PC right now, but it doesn't seem to work.

I will try again in the next few hours.

Thank you

aborruso commented 5 months ago

> @aborruso, you need to specify the "X field" and "Y field" parameters.

You are right, I shouldn't write messages on Sunday morning as soon as I wake up.

> Where can I find documentation on how to properly format coordinates in DMS?

@pigreco using your CSV with these settings

image

you have it in the map view, without changing anything in the input

image

I don't know what could be improved in the documentation. Here you need to pay attention to the CSV parameters, because the file is not standard, and then, since these are DMS values, tick the DMS checkbox. But in some way we already have that.

What integration do you propose?

pigreco commented 5 months ago

Thank you for the quick response and the technical details.

I noticed that the CSV file I shared is read correctly using the Regular Expression Delimiter option with just a comma (,) as the expression.

https://github.com/qgis/QGIS/assets/7631137/0483c2c2-503c-43cc-afef-e108cbbaa455

I verified that the CSV file must use the prime and double prime characters to be read correctly by QGIS (char(8242) and char(8243)).

The expression to_dms($X,'x',1) returns prime and double prime.
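For reference, char(8242) and char(8243) are the code points U+2032 (PRIME) and U+2033 (DOUBLE PRIME). If one prefers to rewrite coordinate values with those characters, which, as noted earlier in this thread, need no CSV quoting, a minimal sketch could look like the following. `to_prime_marks` is a hypothetical helper, and it should be applied to individual coordinate values rather than to a whole CSV file, where it would also replace genuine quote characters.

```python
# Map the ASCII apostrophe and quotation mark to prime (U+2032, decimal 8242)
# and double prime (U+2033, decimal 8243).
ASCII_TO_PRIME = str.maketrans({"'": chr(8242), '"': chr(8243)})


def to_prime_marks(dms: str) -> str:
    """Return the DMS string with typographic minute and second marks."""
    return dms.translate(ASCII_TO_PRIME)


converted = to_prime_marks("7°26'20.5\"E")
print(converted)  # 7°26′20.5″E
```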

aborruso commented 5 months ago

> I verified that the CSV file must use the prime and double prime characters to be read correctly by QGIS (char(8242) and char(8243)).
>
> The expression to_dms($X,'x',1) returns prime and double prime.

You don't have to use prime and double prime. Just use a proper input file, or customize the import settings to specify what is not standard in it.

pigreco commented 5 months ago

> Just use a proper input file, or customize the import settings to specify what is not standard in it.

If I hadn't opened this issue, where could I have read the solutions proposed above?

aborruso commented 5 months ago

> If I hadn't opened this issue, where could I have read the solutions proposed above?

I, too, often realize something I hadn't noticed only after comparing notes with others. Here, if you don't realize you have a malformed file, you may think there is something wrong with the software or with the documentation. A space like this is a useful place for discussion.

DelazJ commented 5 months ago

So, good to close?