steve8x8 / geotoad

Geocaching query tool written in Ruby

"Emoji" (UTF-16) causes several problems #262

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The following query finds just 1 cache and then crashes under ruby 1.9 (it 
works under ruby 1.8):

geotoad/geotoad.rb -v --distanceMax=0.1km --format=gpx --output=test.gpx "N52 59.855 E8 55.203"

geotoad is the latest version from the SVN, revision 1293.

Tested with these ruby versions:

ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
ruby 1.9.3p194 (2012-04-20 revision 35410) [arm-linux-eabihf]
ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]

These are the last lines of the debug output (omitting full output as it can be 
reproduced with the query above):

D: Generating comment text for Weyher Sumpf
/...anonymized.../geotoad/lib/output.rb:580:in `gsub!': invalid byte sequence in UTF-8 (ArgumentError)
        from /...anonymized.../geotoad/lib/output.rb:580:in `makeText'
        from /...anonymized.../geotoad/lib/output.rb:759:in `block in createTextCommentLogs'
        from /...anonymized.../geotoad/lib/output.rb:754:in `each'
        from /...anonymized.../geotoad/lib/output.rb:754:in `createTextCommentLogs'
        from /...anonymized.../geotoad/lib/output.rb:1179:in `block in generateOutput'
        from /...anonymized.../geotoad/lib/output.rb:1168:in `each'
        from /...anonymized.../geotoad/lib/output.rb:1168:in `generateOutput'
        from /...anonymized.../geotoad/lib/output.rb:414:in `prepare'
        from geotoad/geotoad.rb:855:in `block in saveFile'
        from geotoad/geotoad.rb:796:in `each'
        from geotoad/geotoad.rb:796:in `saveFile'
        from geotoad/geotoad.rb:910:in `<main>'

Original issue reported on code.google.com by magic...@gmail.com on 11 Mar 2013 at 1:14
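
For readers unfamiliar with this class of error: in Ruby 1.9 and later, regexp-based methods such as `gsub!` raise `ArgumentError` when the string is tagged as UTF-8 but contains bytes that are not valid UTF-8. The following is only a minimal sketch. The byte sequence is an assumption chosen for illustration (the thread does not show the offending bytes), `safe_gsub` is a hypothetical helper and not part of geotoad, and `String#scrub` needs Ruby 2.1 or later.

```ruby
# Assumed input: ED A0 BD is what a lone surrogate (U+D83D) looks like when
# written naively as UTF-8. Ruby treats it as an invalid byte sequence.
broken = "Found it \xED\xA0\xBD"

puts broken.valid_encoding?

begin
  broken.gsub(/Found/, 'Saw')
rescue ArgumentError => e
  puts "gsub failed: #{e.message}"
end

# Hypothetical helper: replace invalid bytes with '?' before substituting,
# so the regexp engine never sees a broken string.
def safe_gsub(text, pattern, replacement)
  text = text.scrub('?') unless text.valid_encoding?
  text.gsub(pattern, replacement)
end

puts safe_gsub(broken, /Found/, 'Saw')
```

Checking `valid_encoding?` first keeps well-formed strings untouched; only broken input is altered.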

GoogleCodeExporter commented 9 years ago
Confirmed with Debian Wheezy's ruby (1.9.3p194) although the error looks 
slightly different:

/usr/lib/ruby/1.9.1/cgi/util.rb:76:in `chr': invalid codepoint 0xD83D in UTF-8 (RangeError)
    from /usr/lib/ruby/1.9.1/cgi/util.rb:76:in `block in unescapeHTML'
    from /usr/lib/ruby/1.9.1/cgi/util.rb:55:in `gsub'
    from /usr/lib/ruby/1.9.1/cgi/util.rb:55:in `unescapeHTML'
    from .../geotoad.trunk/lib/output.rb:522:in `makeText'

(GC2RTZR, Found log by "weserscarl" - the smiley seems to be encoded as a pair 
of numeric entities, the first being 0xD83D, in the HTML file?!?)

No idea yet how to fix this...

Original comment by Steve8x8 on 11 Mar 2013 at 1:40
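
The `RangeError` above can be reproduced in isolation. 0xD83D lies in the UTF-16 surrogate range (0xD800..0xDFFF), which is not a valid code point on its own, so `Integer#chr` refuses to encode it as UTF-8. A minimal sketch; `surrogate?` and `safe_chr` are hypothetical helper names, not geotoad code.

```ruby
# True if n is a UTF-16 surrogate code unit rather than a real code point.
def surrogate?(n)
  (0xD800..0xDFFF).cover?(n)
end

# Hypothetical helper: like Integer#chr, but returns nil instead of raising
# when n cannot be encoded as UTF-8.
def safe_chr(n)
  n.chr(Encoding::UTF_8)
rescue RangeError
  nil
end

begin
  0xD83D.chr(Encoding::UTF_8)
rescue RangeError => e
  puts e.message
end

puts surrogate?(0xD83D)
puts safe_chr(0xD83D).inspect
puts safe_chr(0x263A)
```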

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
What causes the problem has been known in Japan for more than a decade: 
so-called "emoji". They went into the Unicode standard with version 6.0, but 
obviously Ruby has been somewhat ignorant...
With the recent UTF-16 conversion of gc, they are now available to 
everyone, with the side effects seen.
I have done some tests, and hope that the problem only shows up with the "high 
range" codes (Ø..;
...;) for which a conversion rule exists.
Please apply (patch -p1, as usual) the appended diff, and try to reproduce the 
problem... (I suspect running a query for Japanese caches will come up with 
even more such "emoji" characters - and perhaps some more issues...)
In any case, an error during unescapeHTML will no longer result in a crash, I 
hope.

Um... what is your $LANG setting, BTW?

Original comment by Steve8x8 on 11 Mar 2013 at 3:57
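
The conversion rule for such "high range" codes is the standard UTF-16 surrogate-pair formula: a high surrogate (0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF) combines into one code point above U+FFFF. The sketch below applies it to decimal numeric entities. It is not the attached patch; the function names are hypothetical, and the sample low surrogate (56842) is an assumption, since the thread only shows the high half (0xD83D = 55357).

```ruby
HIGH_SURROGATES = (0xD800..0xDBFF)
LOW_SURROGATES  = (0xDC00..0xDFFF)

# Standard UTF-16 rule: two 10-bit halves on top of an offset of 0x10000.
def combine_surrogates(high, low)
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# Hypothetical helper: within each run of adjacent decimal entities, replace
# every high/low surrogate pair by the real character. All other entities,
# including lone surrogates, are left untouched.
def decode_surrogate_entities(html)
  html.gsub(/(?:&#\d+;)+/) do |run|
    tokens = run.scan(/&#\d+;/)
    units  = tokens.map { |t| t[2..-2].to_i }
    out = +''
    i = 0
    while i < units.length
      high = units[i]
      low  = units[i + 1]
      if HIGH_SURROGATES.cover?(high) && low && LOW_SURROGATES.cover?(low)
        out << combine_surrogates(high, low).chr(Encoding::UTF_8)
        i += 2
      else
        out << tokens[i]
        i += 1
      end
    end
    out
  end
end

puts decode_surrogate_entities('TFTC &#55357;&#56842;')
```

Working on whole runs of entities avoids mis-pairing when an ordinary entity directly precedes a surrogate pair.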

GoogleCodeExporter commented 9 years ago
LANG=en_US.UTF-8

Where's the patch? I thought I had seen it earlier, but now that I was about to 
try it, it's gone.

Original comment by magic...@gmail.com on 11 Mar 2013 at 9:19

GoogleCodeExporter commented 9 years ago
Apologies - it was there but seems to have got lost when GoogleCode decided to 
show me a dysfunctional robot :(
Let's try again...

Original comment by Steve8x8 on 12 Mar 2013 at 8:55

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks, just tried the patch and confirmed it's working on my env.

Original comment by magic...@gmail.com on 12 Mar 2013 at 9:19

GoogleCodeExporter commented 9 years ago
That's good news. (There's a means to find out how many people are using ruby 
1.9 :) Ruby 1.8 will be supported for less than 4 months from now.)

What I'd like to know: how does the GPSr show the symbol? Does it show all 10 
visitor logs? (If yes, what's the brand/firmware version of yours? Got to dig 
out my old *reg*n and check.)
I'm a bit hesitant to tag this issue as "FixAvail"...

Original comment by Steve8x8 on 12 Mar 2013 at 7:28

GoogleCodeExporter commented 9 years ago
My Garmin Oregon 450 doesn't like the GPX file with the Emojis. After removing 
them, it worked though. I'm attaching both files.

Original comment by magic...@gmail.com on 13 Mar 2013 at 10:37

Attachments:

GoogleCodeExporter commented 9 years ago
I somehow expected this. Being a Basic Member only, I'm curious what a 
downloaded GPX file would look like for this cache (with the found log 
containing the weird stuff).
Basically there are three options: remove the Emoji strings, convert them the 
same way as for the text part, or mimic gc's gpx...

Original comment by Steve8x8 on 13 Mar 2013 at 4:04

GoogleCodeExporter commented 9 years ago
Here we go... :-)

Btw. I'll be offline the next couple of days, so don't be surprised if test 
requests go unanswered. Doesn't mean I've lost interest. ;-)

Original comment by magic...@gmail.com on 14 Mar 2013 at 1:32

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you for this insight ;)
Obviously GPX files are no longer using HTML encoding (it'd be interesting to 
know when this started), and the character in question is placed in the text in 
its pure UTF-8 (4-byte) representation.
Unless I learn how to translate UTF-16 surrogates into UTF-8 (not only 
codepoints) it's perhaps cheaper to just throw away the stuff (who needs a 
snowman on a GPSr screen?)
Patch to be applied over the latest (already partially patched) SVN version

Original comment by Steve8x8 on 14 Mar 2013 at 8:12
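
The difference between the two representations, and the "throw it away" option, can be illustrated briefly. A character above U+FFFF takes two 16-bit units in UTF-16 but a single 4-byte sequence in UTF-8, and removing such characters from a UTF-8 string is a one-line substitution. This is an illustrative sketch, not the attached patch: `strip_astral` is a hypothetical name, and U+1F60A is an assumed sample character.

```ruby
# Assumed sample: U+1F60A, a character outside the Basic Multilingual Plane.
smiley = "\u{1F60A}"

# Number of bytes in the UTF-8 form.
puts smiley.bytes.length

# The same character as UTF-16 code units (a surrogate pair).
units = smiley.encode('UTF-16BE').unpack('n*')
puts units.map { |u| format('0x%04X', u) }.join(' ')

# Hypothetical helper: drop every character above U+FFFF and keep the rest,
# including ordinary non-ASCII text.
def strip_astral(text)
  text.gsub(/[\u{10000}-\u{10FFFF}]/, '')
end

puts strip_astral("TFTC #{smiley}!")
```

Stripping is the cheaper choice when the target device cannot render such characters anyway; characters inside the Basic Multilingual Plane are not affected.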

Attachments:

GoogleCodeExporter commented 9 years ago
Issue 266 has been merged into this issue.

Original comment by Steve8x8 on 18 Apr 2013 at 1:11

GoogleCodeExporter commented 9 years ago
After merging Issue 266, adjust Summary

Original comment by Steve8x8 on 18 Apr 2013 at 1:12

GoogleCodeExporter commented 9 years ago
Issue 264 has been merged into this issue.

Original comment by Steve8x8 on 26 Apr 2013 at 7:13

GoogleCodeExporter commented 9 years ago
Issue 264 has been merged into this issue.

Original comment by Steve8x8 on 8 May 2013 at 11:42

GoogleCodeExporter commented 9 years ago
Merged into both 3.17.7 and 3.16.6, supposed to be fixed

Original comment by Steve8x8 on 8 May 2013 at 11:45

GoogleCodeExporter commented 9 years ago
Issue 266 has been merged into this issue.

Original comment by Steve8x8 on 19 Sep 2013 at 11:02