Closed GoogleCodeExporter closed 9 years ago
I'm curious what "ü" would look like on a GPSr's display... (OCM seems to
be the only application that has problems with GPX files from GeoToad, which
worries me a bit.)
Although I suspect there's a difference between the description part and the
log comments. Do umlaut characters in both parts result in OCM crashes?
Original comment by Steve8x8
on 24 Mar 2013 at 7:27
(Another quick Q:) Are you using Ruby 1.8 or 1.9(+)?
Original comment by Steve8x8
on 24 Mar 2013 at 7:29
I'm using Ruby 1.8 (Debian Lenny). I had never had problems with GeoToad and
OCM until the last few weeks. I've tried to replace all encoded ü entities with
a plain ü in my GPX files via a shell script, but that does not solve the
problem for all of my GPX files. As far as I can see, there are no problems with
umlaut characters in the text (äöü).
Original comment by GCNugget...@gmail.com
on 25 Mar 2013 at 6:40
Yet another remark.
cache.xsd from GroundSpeak (unchanged for quite some time) contains the lines
<!-- html is a boolean. If html=true the enclosed text contains html -->
<xs:attribute name="html" form="unqualified" type="xs:string" />
for both the short_description and the long_description element.
(Of course, this is not what I'd call "well documented"...)
In the past, they had been set to True in PQ results - this holds for a sample
I received back in 2010. Nowadays, this seems to have changed. (I'm curious
when this happened. Could this be related to the "Unicode" transition a few
weeks ago?)
So to me this reads "if html=true do not rely on the contents to conform to
XML".
Log entries are a different story. There's no "html" attribute but an "encoded"
one, and while there's no explanation, the line looks familiar:
<xs:attribute name="encoded" form="unqualified" type="xs:string" />
For "historical reasons" (note the lame excuse) this had been set to "False" in
versions as old as 3.7.5 (September 2004, the oldest one I could find). This
might indeed be wrong *if* the context is the same. No real-world device seems
to care.
So in short, GPX files as defined by GroundSpeak allow contents to come in
HTML, and they actually did (at least in the description part). If OCM can only
handle html=False, that's a pity, to say the least.
I'm not convinced GeoToad would convert to html=False in the foreseeable future
(but I'm short-sighted), and certainly not while Ruby 1.8 is still supported.
Windows' somewhat peculiar support for UTF-8 complicates things further.
As I'm rather unhappy with the current situation, I'd like to leave this bug
open to collect more information:
What would GPX files (and the corresponding xsd) have looked like over the past
13 years? When did "groundspeak:*_description html" and "groundspeak:text
encoded" actually change? Does anyone still have old PQs or individual cache
GPX files?
Original comment by Steve8x8
on 25 Mar 2013 at 7:00
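Since cache.xsd types both `html` and `encoded` as xs:string, although its comment calls `html` a boolean, a reader probably has to tolerate "True", "true", "False" and similar spellings. A minimal C++17 sketch of such a lenient parser follows; the name parse_gs_bool is hypothetical, and accepting "1"/"0" is an assumption borrowed from xs:boolean, not something the GroundSpeak schema states.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical lenient parser for the GroundSpeak "html" / "encoded"
// attributes, which cache.xsd types as xs:string. Accepts "true"/"false" in
// any letter case and, as an assumption borrowed from xs:boolean, "1"/"0".
// Anything else yields std::nullopt so the caller can pick a default.
std::optional<bool> parse_gs_bool(std::string value) {
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    // Trim leading and trailing whitespace.
    value.erase(value.begin(), std::find_if(value.begin(), value.end(), not_space));
    value.erase(std::find_if(value.rbegin(), value.rend(), not_space).base(), value.end());
    // Lower-case for a case-insensitive comparison.
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (value == "true" || value == "1") return true;
    if (value == "false" || value == "0") return false;
    return std::nullopt;
}
```

Returning std::nullopt instead of guessing leaves the fallback (for example "assume HTML") to the caller.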
Hm. Citing
http://richesse-gps.googlecode.com/svn/branches/2.0/richesseGPS/files/gpx.cpp:
struct CLogEntry {
    SYSTEMTIME Date;
    CString Text;
    BOOL Encoded; // TRUE if HTML
    CString Type; // TODO: use enum
    CString Finder;
    CLogEntry() {
        memset(&Date, 0, sizeof(Date));
        Encoded = TRUE;
    }
};
This seems to support my theory that HTML is allowed in log entries as well,
although I have yet to see proper documentation (if there is any) beyond
GS's xsd.
Next upload to my branch will
- use encoded="False" for empty or text-only logs (like the fake "info" one)
- set encoded="True" for "real" HTML logs
if ongoing tests are successful.
I'm afraid e.g. Garmin's parser implementation (in the Oregon x00 series)
doesn't care too much (it has worse problems). Seems to be a tricky business to
validate GS GPX (gpsbabel does a nice job but obviously doesn't catch
everything)...
Original comment by Steve8x8
on 26 Mar 2013 at 10:08
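The planned rule (encoded="False" for empty or text-only logs, encoded="True" for "real" HTML logs) needs some way to tell the two apart. A minimal C++ sketch, assuming a simple heuristic that treats anything resembling a tag ('<' followed by a letter, '/' or '!') as HTML; the names looks_like_html and encoded_attribute are hypothetical, and this is not GeoToad's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical heuristic (not GeoToad's actual code): a log counts as "real"
// HTML if it contains something tag-like, i.e. '<' followed by a letter,
// '/' or '!'. Empty and text-only logs do not.
bool looks_like_html(const std::string& text) {
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (text[i] != '<') continue;
        unsigned char next = static_cast<unsigned char>(text[i + 1]);
        if (std::isalpha(next) || next == '/' || next == '!') return true;
    }
    return false;
}

// Value to write into <groundspeak:text encoded="...">.
std::string encoded_attribute(const std::string& log_text) {
    return looks_like_html(log_text) ? "True" : "False";
}
```

A stray comparison such as "3 < 4" is not mistaken for markup, because the '<' is followed by a space.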
c:geo seems to be affected by this issue as well. It cannot import the same
GPX files; the error is something like "wrong format for GPX V1.1".
Original comment by GCNugget...@gmail.com
on 28 Mar 2013 at 10:00
I'm surprised - I'm a c:geo user myself and have never had issues. Actually, I
am currently doing a GPX import as part of my holiday preparations...
Can you reproduce the behaviour with the GPX output of a random single-WID
query? If not, it's probably time to bisect your problematic GPX file, and
isolate the problematic part.
Another observation: Log entries with line feeds instead of <br /> tags will
not be displayed properly on a Garmin x00, independent of the encoded=...
setting. More investigation required.
Original comment by Steve8x8
on 28 Mar 2013 at 5:34
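For the observation that raw line feeds are not displayed properly on a Garmin x00, one option is to convert them to <br /> tags before writing a log that is flagged as HTML. Whether this actually helps on the device is untested (the comment above says more investigation is required); the helper name newlines_to_br is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: replaces every line break in a log with "<br />" so
// that the line structure survives an HTML renderer. "\r\n", a lone "\r"
// and a lone "\n" each become exactly one "<br />".
std::string newlines_to_br(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;  // skip the LF of a CRLF
            out += "<br />";
        } else if (c == '\n') {
            out += "<br />";
        } else {
            out += c;
        }
    }
    return out;
}
```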
If it helps, I have attached a file which causes trouble.
Original comment by GCNugget...@gmail.com
on 1 Apr 2013 at 8:57
Attachments:
buchholz.gpx: GPX: XML parse error at line 7212 of 'buchholz.gpx' : reference
to invalid character number
<groundspeak:text encoded="False">Heute wieder zur FB nach Hamburg. Mit Claudia die ich gestern mit dem Cachevirus infiziert habe��. Sie hat hier gleich zugegriffen. 2x gecacht und 2x die gleiche Dose.</groundspeak:text>
See Issue 262 - could this be the problem?
What happens if you open the file in a text editor, and remove the "��"
Emoji stuff (which is supposed to be a "grinning face" according to my Unicode
tables)?
Original comment by Steve8x8
on 18 Apr 2013 at 1:20
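The "��" in the quoted log line is most likely what an emoji looks like when it has been written as two numeric character references to UTF-16 surrogates (a high and a low surrogate). XML 1.0 only allows references to characters in its Char production, which excludes U+D800..U+DFFF, hence "reference to invalid character number". A character outside the Basic Multilingual Plane has to be referenced by its full code point instead. The C++ sketch below combines a surrogate pair into that code point; the helper names are hypothetical, and the example assumes the emoji is U+1F600 GRINNING FACE (UTF-16: D83D DE00), matching the Unicode table lookup above.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Combines a UTF-16 high/low surrogate pair into the code point it encodes:
// cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00).
char32_t combine_surrogates(char32_t hi, char32_t lo) {
    if (hi < 0xD800 || hi > 0xDBFF) throw std::invalid_argument("not a high surrogate");
    if (lo < 0xDC00 || lo > 0xDFFF) throw std::invalid_argument("not a low surrogate");
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

// Formats a code point as a hexadecimal XML character reference, e.g. "&#x1F600;".
std::string to_hex_char_ref(char32_t cp) {
    std::ostringstream os;
    os << "&#x" << std::uppercase << std::hex << static_cast<std::uint32_t>(cp) << ';';
    return os.str();
}
```

With this, the pair &#55357;&#56832; (decimal for D83D DE00) would become &#x1F600; (equivalently &#128512;), which an XML parser accepts.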
Nothing changed. c:geo gives the same error, and OCM reports errors like
"Referenced character was not allowed in XML.", but I can't find the offending
character in gedit. Maybe OCM counts characters in a different way.
Original comment by GCNugget...@gmail.com
on 21 Apr 2013 at 12:15
Okay... gpsbabel only chokes on the first problem with the input file, but
there were two occurrences of "Emoji" characters in the GPX file.
I'm uploading a copy without those. Does it still refuse to be loaded by c:geo,
and by OCM?
Original comment by Steve8x8
on 23 Apr 2013 at 7:03
Attachments:
I have browsed the OCM sources for a place where html="True"/"False" would be
parsed, but didn't find any. Same for text encoded="...".
Although there's only a package available for Ubuntu, I managed to install ocm
1.0.13 on my Debian Wheezy laptop (merely by brute force), and imported
buchholz2.gpx - without any problem.
This leads me to the assumption that "Emoji" has been the culprit here as well,
like in Issue 262 (and, subsequently, 266).
An automatic replacement of all occurrences of "&amp;#" with "&#" would have
masked UTF-16 surrogates as well (both the hexadecimal ones, such as &#xD83?;,
and their decimal equivalents, starting with &#55???;), sure.
Please try to boil the problem down to a few individual cache IDs and point me
to them (or send the corresponding GPXes again).
gpsbabel has proven to properly detect remaining surrogates, so it's probably a
good idea to check files with gpsbabel first, and iterate over the errors it
flags.
Original comment by Steve8x8
on 23 Apr 2013 at 11:55
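Following the suggestion to check files first and iterate over the flagged errors, a crude pre-check can list every line containing a numeric character reference into the surrogate range. This is a hypothetical C++ helper based on std::regex (surrogate_ref_lines is not part of gpsbabel or GeoToad). It also flags references inside CDATA sections, where &#...; is plain text, and it does not look for raw, unescaped surrogate bytes, so gpsbabel remains the more thorough check.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pre-check (not part of gpsbabel or GeoToad): returns the
// 1-based numbers of all lines containing a numeric character reference
// into the UTF-16 surrogate range U+D800..U+DFFF, which XML 1.0 forbids
// ("reference to invalid character number"). References inside CDATA
// sections are reported too, although there they are plain text.
std::vector<std::size_t> surrogate_ref_lines(const std::string& xml) {
    // &#xHHHH; (hex, lowercase x as XML requires) or &#DDDD; (decimal).
    // Digit counts are capped so that std::stoul cannot overflow.
    static const std::regex ref(R"(&#(x[0-9A-Fa-f]{1,8}|[0-9]{1,9});)");
    std::vector<std::size_t> hits;
    std::istringstream in(xml);
    std::string line;
    for (std::size_t lineno = 1; std::getline(in, line); ++lineno) {
        for (std::sregex_iterator it(line.cbegin(), line.cend(), ref), end; it != end; ++it) {
            const std::string num = (*it)[1].str();
            const unsigned long cp = (num[0] == 'x')
                ? std::stoul(num.substr(1), nullptr, 16)
                : std::stoul(num, nullptr, 10);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                hits.push_back(lineno);
                break;  // one report per line is enough
            }
        }
    }
    return hits;
}
```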
I updated GeoToad to version 3.16.6, and there are no problems any more.
Thank you.
Original comment by GCNugget...@gmail.com
on 25 Apr 2013 at 9:11
Assuming that the whole issue was triggered by that UTF-16 stuff that had been
introduced in early March into GS pages, it's probably time to merge this into
Issue 262.
Original comment by Steve8x8
on 26 Apr 2013 at 7:13
I'm the developer of OCM.
"
cache.xsd from GroundSpeak (unchanged for quite some time) contains the lines
<!-- html is a boolean. If html=true the enclosed text contains html -->
<xs:attribute name="html" form="unqualified" type="xs:string" />
for both the short_description and the long_description element."
What the attribute means is that the text value of the attribute should be
interpreted as HTML, not that you can put arbitrary HTML between
<short_description>...</short_description>. You can simply use a
<![CDATA[ ... ]]> section to prevent XML parser issues with the actual content;
you don't actually have to double encode things. N.B. Groundspeak goes the
double-encoding route in their GPXes and OCM uses CDATA in its export; both are
valid XML files. For example:
<short_description html="false"><![CDATA[Hi <br> How are
you]]></short_description> means a GPX renderer should display the text "Hi
<br> How are you" exactly as is verbatim without turning the <br> into a line
break.
<short_description html="true"><![CDATA[Hi <br> How are you]]></short_description>
means you should display the text "Hi [new line] How are you".
<short_description html="true">Hi <br> How are you</short_description> is
invalid, since while you can have a <br> without a closing tag in HTML, it's
not valid in XML.
OCM doesn't bother looking at the flag, because html has always been true
historically, so it simply takes the contents and renders them in an internal
web browser. Groundspeak doesn't use HTML logs, but some of the opencaching
sites do in their GPX files. OCM doesn't need to count characters, as XML
parsing is built into C# and Java.
Original comment by kmcamp...@gmail.com
on 8 May 2013 at 1:34
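The html="false"/"true" distinction described above boils down to whether the decoded element text is escaped before it reaches an HTML view. A minimal C++ sketch, assuming the XML parser has already decoded the element text (CDATA or entities); to_display_html is a hypothetical name, and OCM itself, as stated above, ignores the flag and always renders the text as HTML.

```cpp
#include <string>

// Hypothetical sketch: prepares decoded short/long description text for an
// HTML view. With html="true" the text is passed on as markup; with
// html="false" it is escaped so that "<br>" appears verbatim instead of
// becoming a line break.
std::string to_display_html(const std::string& text, bool is_html) {
    if (is_html) return text;
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}
```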
er...should say "text value of the element", not attribute
Original comment by kmcamp...@gmail.com
on 8 May 2013 at 1:37
Thanks for the lesson, although I'm not sure what I should have learned now?
Apparently, the issue was caused not by improperly reading xsd files, but by
the introduction of Unicode (UTF-16 surrogates), and has vanished since.
Original comment by Steve8x8
on 8 May 2013 at 11:42
I wasn't trying to criticize you; it was just my reasoning for sending this
user to you, and a justification, since you seemed to imply that I was
interpreting the GPX incorrectly.
The GPS wouldn't render "&amp;#xD83D;"; it would have been turned into
"&#xD83D;" by the device after parsing. This is what I meant by double encoding.
You can skip encoding altogether if you use the CDATA marker instead, which I
find is just easier. It wouldn't matter what gc.com does to their HTML, because
it becomes transparent to the parser then.
i.e. instead of <short_description>&lt;br/&gt;&amp;#xD83D;</short_description>
you can do <short_description><![CDATA[<br/>&#xD83D;]]></short_description>; the
parser will treat everything between <![CDATA[ and ]]> as element text.
Original comment by kmcamp...@gmail.com
on 8 May 2013 at 2:03
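The CDATA route has one catch: the sequence "]]>" terminates a CDATA section, so text containing it has to be split across two adjacent sections. A minimal C++ sketch of that technique follows (wrap_cdata is a hypothetical name). Note that CDATA only saves escaping; it does not make characters outside XML's Char production legal, and a reference such as &#xD83D; inside CDATA is passed on as literal text for the HTML renderer to interpret.

```cpp
#include <cstddef>
#include <string>

// Wraps arbitrary element text in a CDATA section. "]]>" cannot appear
// inside CDATA, so each occurrence is split: "]]" ends one section and the
// ">" starts the next one.
std::string wrap_cdata(const std::string& text) {
    std::string out = "<![CDATA[";
    std::size_t start = 0, pos;
    while ((pos = text.find("]]>", start)) != std::string::npos) {
        out.append(text, start, pos + 2 - start);  // up to and including "]]"
        out += "]]><![CDATA[";
        start = pos + 2;  // continue at the '>'
    }
    out.append(text, start, std::string::npos);
    out += "]]>";
    return out;
}
```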
Anyway, issue solved, so no big deal.
Original comment by kmcamp...@gmail.com
on 8 May 2013 at 2:14
Original issue reported on code.google.com by
GCNugget...@gmail.com
on 24 Mar 2013 at 8:48