patjoly / geo-gpx

Perl module to create and parse GPX files

Waypoint name encoding utf-8 does not work #6

Open sebastic opened 2 months ago

sebastic commented 2 months ago

As reported by @flohoff in Debian Bug #1069657:

I was trying to create GPX waypoints with a UTF-8 name, which does not work:

perl -Mutf8 -MGeo::Gpx -e '$g=Geo::Gpx->new(); $g->waypoints_add({ lat => 0, lon => 0, name => "üöä" }); $g->save(filename => "foo.gpx");'
$ cat foo.gpx
<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" creator="Geo::Gpx" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd" xmlns="http://www.topografix.com/GPX/1/0">
<bounds maxlat="0" maxlon="0" minlat="0" minlon="0" />
<wpt lat="0" lon="0">
<name>&#xFC;&#xF6;&#xE4;</name>
</wpt>
</gpx>

The code seems to unconditionally use HTML::Entities->encode_entities

patjoly commented 3 weeks ago

Thanks for your comment.

This is the intended behaviour of the module. Many of the GPS devices using *.gpx files do not support Unicode so invoking HTML::Entities allows for the use of many accented characters in various languages.

When I open foo.gpx with GPS software (e.g. Garmin's Basecamp) and on my device, the waypoint turns up properly.

Of course, a drawback of that design is that searching for waypoints by name with the module's methods (e.g. waypoint_search()) can be trickier, but not breaking compatibility with devices is more important.

flohoff commented 3 weeks ago

The issue is that WHEN you open it with a UTF-8 capable client, it's broken. For example OsmAnd. The XML declaration advertises utf-8, but the file contains only US-ASCII plus HTML entity encoding.

Flo
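
For context on the encoding in foo.gpx: `&#xFC;` is an XML numeric character reference, and a conforming XML parser resolves it to the code point it names regardless of the encoding declared in the file. Whether a particular app does so is a separate question. A minimal sketch of that resolution using only core Perl follows; the helper name `decode_char_refs` is hypothetical, and a real client would rely on its XML parser instead:

```perl
use strict;
use warnings;

# Hypothetical helper: resolve hexadecimal and decimal numeric character
# references in a single pass, so already-decoded text is never re-scanned.
sub decode_char_refs {
    my ($text) = @_;
    $text =~ s/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

my $name = decode_char_refs('&#xFC;&#xF6;&#xE4;');

# Print the code points rather than the characters, so the output does not
# depend on the terminal encoding.
printf "U+%04X\n", ord($_) for split //, $name;
```

The three references from the test case resolve to U+00FC, U+00F6 and U+00E4, i.e. ü, ö and ä.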

patjoly commented 3 weeks ago

Can you provide a link to the application or web service you are using so I can try to reproduce this?

Thanks.

flohoff commented 3 weeks ago

There is a test case in the opening post. I switched away from Geo::Gpx because I could not work around the unconditional use of HTML::Entities->encode_entities, so I created the GPX myself using XML::TreePP.

Here is the script I was writing when I stumbled on the bug: https://github.com/flohoff/osm-fixme-to-gpx

I used to open the GPXes with OSM

https://play.google.com/store/apps/details?id=net.osmandAnd

patjoly commented 3 weeks ago

"In the original opening there is a testcase": see my reply from yesterday to the opening post.

Where in the OSM interface do you open *.gpx files? I use OSM but have never opened a file with it.

patjoly commented 3 weeks ago

... from https://www.openstreetmap.org/traces I can open a track from a .gpx file, but not waypoints alone. Importing fails when the .gpx file has only waypoints and no tracks. This is not an encoding issue: it fails even with plain ASCII characters in the name:

<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" creator="Geo::Gpx" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd" xmlns="http://www.topografix.com/GPX/1/0">
<bounds maxlat="45.405060" maxlon="-75.139486" minlat="0" minlon="0" />
<wpt lat="45.405060" lon="-75.139486">
<name>OSM_waypoint_test</name>
<time>2020-10-25T20:35:50Z</time>
</wpt>
</gpx>

flohoff commented 3 weeks ago

Not OSM/OpenStreetMap but OsmAnd. It is a mobile app that can show GPX tracks or use the GPX waypoints as markers.

See the link to the Android App Store.

patjoly commented 3 weeks ago

Yes I have it.

patjoly commented 3 weeks ago

I just added foo.gpx in OSMAnd, works fine. I will close this issue.

patjoly commented 2 weeks ago

I am re-opening this issue as I just discovered that the waypoint generated as per the foo.gpx example "works", but only by fluke. It does not contain the proper Unicode characters; they are actually the Latin-1 / ASCII ones. Also, the utf8 pragma should not need to be used.
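
The role of the utf8 pragma in the test case can be illustrated with a short sketch using the core Encode module. This is an illustration only, not necessarily what the module does internally. It assumes the name arrives as UTF-8 bytes (as a string literal does without `use utf8`), written here with explicit byte escapes so the example does not depend on how the script file is saved:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "ü" as the two UTF-8 bytes a source file contains when `use utf8` is absent.
my $bytes = "\xC3\xBC";

# Decoding turns the byte string into a one-character string (U+00FC).
my $chars = decode('UTF-8', $bytes);

# Turn every element of a string into a numeric character reference.
sub to_refs {
    my ($s) = @_;
    return join '', map { sprintf('&#x%X;', ord($_)) } split //, $s;
}

# The undecoded string yields one reference per byte; the decoded string
# yields the single reference for the real character.
print "undecoded: ", to_refs($bytes), "\n";   # undecoded: &#xC3;&#xBC;
print "decoded:   ", to_refs($chars), "\n";   # decoded:   &#xFC;
```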

I am working on a version of the module that would only convert the characters that the HTML::Entities module documents as the "unsafe characters", mainly the ampersand, double quotes, angle brackets, etc. An optional setting could be used to encode all entities if the user so wishes, e.g. $g->encode_all_entities( 1 ).
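
A minimal sketch of the two behaviours described above, calling HTML::Entities directly rather than Geo::Gpx. It assumes the encoding is done with `encode_entities_numeric` (the numeric references in foo.gpx suggest this, but the thread does not confirm it), and the set of unsafe characters passed here is only an example:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities_numeric);

binmode STDOUT, ':encoding(UTF-8)';

my $name = "caf\x{E9} <A&B>";   # "café <A&B>" as a character string

# No second argument: characters above ASCII and the markup characters are
# all replaced by numeric references.
my $all = encode_entities_numeric($name);

# Explicit set of unsafe characters: only these are replaced, é is untouched.
my $minimal = encode_entities_numeric($name, '<>&"');

print "$all\n";       # caf&#xE9; &#x3C;A&#x26;B&#x3E;
print "$minimal\n";   # café &#x3C;A&#x26;B&#x3E;
```

With the restricted set, the file carries the accented characters as real UTF-8 text, which matches the encoding the XML declaration advertises.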

I tested a version with characters that don't intersect with the Latin-1 set (e.g. Greek letters) and the points displayed properly in OsmAnd. I still need to check with other apps, e.g. Garmin's Basecamp.

A new version should be forthcoming soon on CPAN and here, but I want to build the tests correctly first as the default behaviour would change.

patjoly commented 3 days ago

Version 1.11, uploaded to CPAN and tagged here, should now solve this issue. By default the module now encodes only the angle brackets, ampersand and double quotes. Users who prefer a different set of characters can specify it with a new option in save() and xml(), e.g. unsafe_char => '<>&"\'öü'