steve8x8 / geotoad

Geocaching query tool written in Ruby

Resulting GPX file not well-formed (due to wrong ampersand escape pattern) #239

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Issue a search that includes geocache GC1CNZN:
geotoad.rb --distanceMax=5 --includeDisabled --output='d:/temp/' 'bloomfield, nm'

What is the expected output? What do you see instead?
[If possible, include the last ~10--20 lines of verbose output.]

Search finished successfully.
Importing the resulting GPX file into ExpertGPS, for example, fails with the 
error message "The Import command could not be completed.  This XML file 
contains one or more errors." 
Importing into GSAK fails in a similar way.

One log entry of user shushyaz&foxy (do a search for "FOUND CACHE HERES MY 
LIST") causes the failure. If this is removed from the GPX file, import succeeds.

What version of the product are you using? On what operating system?
[Did you check you're using the latest version?]

Did an svn up on 04/09/2012 from the geotoad repo, running GeoToad (CURRENT) 
(i386-mingw32-1.9.3), ruby 1.9.2p180 (2011-02-18) [i386-mingw32], Windows 7.

Please provide any additional information below.

Tried more recent ruby version ruby 1.9.3p125 (2012-02-16) [i386-mingw32], same 
result.

Original issue reported on code.google.com by ch.ri...@googlemail.com on 10 Apr 2012 at 5:42

GoogleCodeExporter commented 9 years ago
Can you paste the relevant log entry? (The GC code might be helpful too, if 
there are not too many visitors.)

To avoid such situations, you may routinely pass the GPX output through 
GPSbabel which usually finds XML issues, too.

Does this happen with GeoToad release 3.14.9? What about ruby 1.8.7?

Original comment by Steve8x8 on 10 Apr 2012 at 1:05

GoogleCodeExporter commented 9 years ago
GC code is GC1CNZN, relevant log is below. Will experiment with other versions 
in the next few days.

    <groundspeak:log id="1393073474">
      <groundspeak:date>2010-03-13T07:00:00.000Z</groundspeak:date>
      <groundspeak:type>Found it</groundspeak:type>
      <groundspeak:finder id="3280307743">shushyaz&foxy</groundspeak:finder>
      <groundspeak:text encoded="False">FOUND CACHE HERES MY LIST<br />D-1;CC'S COOL CUT OFF CACHE;GC1HNVF;9-10-09<br />D-1.5;WOMBAT WALK;GC1BY3F;8-31-09<br />D-2;RABBIT FLATS WASH;GC1NMAJ;8-13-09<br />D-2.5;CACHING WITH MY UNCLE;GC15F2E;10-3-09<br />D-3;BOLDER BOULDERS;GC1H0YG;9-13-09<br />D-3.5;YUCCA CACHE;GC23HK4;2-27-10<br />D-4;ANOTHER HOODOO;GC1DDMF;8-30-09<br />THANKS FOR THE CHALLENGE</groundspeak:text>
    </groundspeak:log>

Original comment by ch.ri...@googlemail.com on 10 Apr 2012 at 7:27

GoogleCodeExporter commented 9 years ago
Any chance to retrieve the corresponding part of the cache page (cdpf.aspx_*) 
from the file cache directory? Thanks in advance

Original comment by Steve8x8 on 15 Apr 2012 at 6:14

GoogleCodeExporter commented 9 years ago
I have found that the "offending" log entry is still available among the last 
10, so I created a GPX file for it, and passed it through gpsbabel. No 
complaints, although gpsbabel seems to have a slightly different idea of what's 
allowed:

GeoToad output fragment:
    <groundspeak:log id="1393073474">
      <groundspeak:date>2010-03-13T07:00:00.000Z</groundspeak:date>
      <groundspeak:type>Found it</groundspeak:type>
      <groundspeak:finder id="3280307743">shushyaz&foxy</groundspeak:finder>
      <groundspeak:text encoded="False">FOUND CACHE HERES MY LIST<br />D-1;CC'S COOL CUT OFF CACHE;GC1HNVF;9-10-09<br />D-1.5;WOMBAT WALK;GC1BY3F;8-31-09<br />D-2;RABBIT FLATS WASH;GC1NMAJ;8-13-09<br />D-2.5;CACHING WITH MY UNCLE;GC15F2E;10-3-09<br />D-3;BOLDER BOULDERS;GC1H0YG;9-13-09<br />D-3.5;YUCCA CACHE;GC23HK4;2-27-10<br />D-4;ANOTHER HOODOO;GC1DDMF;8-30-09<br />THANKS FOR THE CHALLENGE</groundspeak:text>
    </groundspeak:log>

gpsbabel's version:
    <groundspeak:log id="1393073474">
      <groundspeak:date>2010-03-13T07:00:00.000Z</groundspeak:date>
      <groundspeak:type>Found it</groundspeak:type>
      <groundspeak:finder id="3280307743">shushyaz&foxy</groundspeak:finder>
      <groundspeak:text encoded="False">FOUND CACHE HERES MY LIST<br />D-1;CC'S COOL CUT OFF CACHE;GC1HNVF;9-10-09<br />D-1.5;WOMBAT WALK;GC1BY3F;8-31-09<br />D-2;RABBIT FLATS WASH;GC1NMAJ;8-13-09<br />D-2.5;CACHING WITH MY UNCLE;GC15F2E;10-3-09<br />D-3;BOLDER BOULDERS;GC1H0YG;9-13-09<br />D-3.5;YUCCA CACHE;GC23HK4;2-27-10<br />D-4;ANOTHER HOODOO;GC1DDMF;8-30-09<br />THANKS FOR THE CHALLENGE</groundspeak:text>
    </groundspeak:log>

Can you try to pass the GPX file through gpsbabel (something along the lines of 
"gpsbabel -i gpx -o gpx -f inputfile.gpx -F outputfile.gpx") before feeding it 
into your software? This might help identify the culprit...

Original comment by Steve8x8 on 15 Apr 2012 at 6:29

GoogleCodeExporter commented 9 years ago
Ok, I grabbed again and tried to pre-convert using GPSBabel. GPSBabel rejected 
the file: 

GPX: XML parse error at line 5351 of 'D:\temp\gt_bloomfield_nm-y5.0.gpx' : not 
well-formed (invalid token).

This is the line causing the error, now related to GC1E4FH:

<groundspeak:text encoded="False">FOUND CACHE HERES MY LIST<br />#0;20 CENTS 
CACHE OF NEW MEXICO;GC13QK8;10-3-09<br />#1;ONE WAY;GC1J83G;7-23-09<br />#2;TWO 
RIVERS;GC1VH7N;7-18-09<br />#3;5EV3N;GC22RX8;1-9-10<br />#4;GOLLUM FOURTH 
RIDDLE;GC1ZBGY;10-18-09<br />#5;5K FOR TUX;GC1QAMV;10-11-09<br 
/>#6;65&330;GC1FCR7;9-10-09<br />#7;ALMOST 37 108;GC1KEJ9;3-27-10<br />#8;805 
BUG ZAPPER;GC1HCEV;9-1-09<br />#9;2009 FOCOGEO SUMMER EVENT;GC1WJ89;8-22-09<br 
/>WILL I THOUGHT I HAD IT BUT I,LL HAVE TO TRY AGAIN. POSTING SO I DONT HAVE TO 
TYPE AGAIN.<br />FINALY GOT IT TODAY<br />THANKZ FOR THE CHALLENGE<br /><br 
/>[This entry was edited by shashyaz&foxy on Saturday, March 27, 2010 at 
10:03:14 PM.]</groundspeak:text>

Removing the ampersand in 65&330 fixed the problem. 
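
The same failure can probably be reproduced without GPSBabel. The following is a minimal sketch, assuming Ruby's REXML parser is available (it ships with standard Ruby installs); `xml_error` is a hypothetical helper and the fragments are shortened stand-ins for the log line above, not GeoToad code.

```ruby
require 'rexml/document'

# Hypothetical helper: returns nil when the XML parses cleanly,
# otherwise the parser's error message.
def xml_error(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  e.message
end

# Shortened stand-ins for the offending log text.
bad  = '<text>#6;65&330;GC1FCR7;9-10-09</text>'      # bare ampersand
good = '<text>#6;65&amp;330;GC1FCR7;9-10-09</text>'  # escaped ampersand

puts "bad fragment:  #{xml_error(bad) ? 'rejected' : 'accepted'}"
puts "good fragment: #{xml_error(good) ? 'rejected' : 'accepted'}"
```

Running a check like this over the whole GPX file before importing it would flag the problem at generation time rather than in ExpertGPS or GSAK.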

I do have the cdpf file of the cache, let me know if you need that.

Original comment by ch.ri...@googlemail.com on 15 Apr 2012 at 7:11

GoogleCodeExporter commented 9 years ago
"&330;" indeed doesn't make sense to me, but I cannot check where this comes 
from since GC1E4FH seems to be a PMO cache.

GC1FCR7 is called "65&330" - which in itself wouldn't create a problem if the 
user hadn't added a semicolon at the end of it.

Can you try the following patch...

--- o/output.rb 2012-03-28 10:42:58.000000000 +0200
+++ n/output.rb 2012-04-17 15:54:36.000000000 +0200
@@ -488,7 +488,7 @@
     text = CGI.escapeHTML(str)
     # CGI.escapeHTML will try to re-escape previously escaped entities.
     # Fix numerical entities such as Pateniemen l&#228;mp&#246;keskus
-    text.gsub!(/&amp;([\#\d][\d]+;)/, "&\\1")
+    text.gsub!(/&amp;(\#[\d]+;)/, "&\\1")
     # Fix hex entities too
     #text.gsub!(/&#x([0-9a-fA-F]+);/) { "&\##{$1.to_i(16)")
     text.gsub!(/&amp;(\#x[0-9a-fA-F][0-9a-fA-F]+;)/, "&\\1")

... and tell me whether that fixes the issue?
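
To see why the narrower pattern should help, here is a small sketch of the escaping step. It assumes the patterns in the patch match the escaped form `&amp;` (the page rendering appears to have decoded it to a bare `&`), and `make_xml` is a hypothetical stand-in for the surrounding method in output.rb, not its actual code.

```ruby
require 'cgi'

# Assumed forms of the two patterns from the patch.
OLD_PATTERN = /&amp;([\#\d][\d]+;)/  # also accepts a digit right after "&amp;"
NEW_PATTERN = /&amp;(\#[\d]+;)/      # requires "#", so only numeric entities match

# Hypothetical stand-in: escape the text, then undo the double escaping
# of entities that were already escaped in the input.
def make_xml(str, pattern)
  CGI.escapeHTML(str).gsub(pattern, "&\\1")
end

cache_name = "65&330;GC1FCR7"         # log text from this report
entities   = "l&#228;mp&#246;keskus"  # already-escaped numeric entities

puts make_xml(cache_name, OLD_PATTERN)  # "&amp;330;" is wrongly un-escaped
puts make_xml(cache_name, NEW_PATTERN)  # the escaped ampersand is kept
puts make_xml(entities, NEW_PATTERN)    # numeric entities are still restored
```

With the old character class `[\#\d]`, any ampersand followed by two or more digits and a semicolon is treated as an entity, which is exactly the shape of "65&330;".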

Original comment by Steve8x8 on 17 Apr 2012 at 2:03

GoogleCodeExporter commented 9 years ago
This fixed the problem: "65&330" now correctly translates to "65&amp;330".

Thanks!

Original comment by ch.ri...@googlemail.com on 17 Apr 2012 at 8:35

GoogleCodeExporter commented 9 years ago
Good to know that! (I haven't seen any negative effect of this patch in my own 
searches yet, and don't expect any, so this will go into trunk as well.)

Doing some investigation, I found that this bug had been there for quite some 
time. It somehow sneaked in between 3.10.2 and 3.10.3, and a closer look shows 
that commit r562 (dated 2010-01-29) must have been the culprit.
Surprising that nobody found it before...

Now if someone writes a comment containing "&#300;" what would happen?
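
One way to explore that question is the sketch below. It assumes the patched pattern has the form `/&amp;(\#[\d]+;)/`, and `escape_for_gpx` is a hypothetical stand-in for the escaping step, not GeoToad's actual method. It compares a comment where the user typed the six characters of a numeric reference with one containing the character U+012C itself.

```ruby
require 'cgi'

PATCHED = /&amp;(\#[\d]+;)/  # assumed form of the patched pattern

# Hypothetical stand-in for the escaping step.
def escape_for_gpx(str)
  CGI.escapeHTML(str).gsub(PATCHED, "&\\1")
end

typed_reference = "see &#300; here"  # user typed "&", "#", "3", "0", "0", ";"
real_character  = "see \u012C here"  # user typed the character U+012C itself

puts escape_for_gpx(typed_reference)  # reference is passed through unescaped
puts escape_for_gpx(real_character)   # character is left untouched
```

Under these assumptions the typed text is emitted as a real character reference, so a GPX reader would show the character instead of the literal text. The file stays well-formed, but the comment no longer reads exactly as written.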

Original comment by Steve8x8 on 18 Apr 2012 at 9:04

GoogleCodeExporter commented 9 years ago
Let's see... :)

Original comment by ch.ri...@googlemail.com on 19 Apr 2012 at 7:50

GoogleCodeExporter commented 9 years ago
Will be in 3.16.0

Original comment by Steve8x8 on 2 May 2012 at 1:22

GoogleCodeExporter commented 9 years ago
3.16.0 has been released

Original comment by Steve8x8 on 25 May 2012 at 12:28

GoogleCodeExporter commented 9 years ago
Pattern matching is still causing problems, this time with encrypted hints.
Try parsing GC2RC17

Original comment by ch.ri...@googlemail.com on 27 May 2012 at 3:54

GoogleCodeExporter commented 9 years ago
Opening http://www.geocaching.com/seek/cache_details.aspx?wp=GC2RC17 in the 
browser, I cannot see any hint at all - but there's the "Decrypt" button, so 
there must have been something, right?
Going to the print page 
http://www.geocaching.com/seek/cdpf.aspx?guid=3ffe9f71-5f4a-4b5c-bbb7-f63bb2f6f1f2&lc=10, 
I can also see the "Additional Hints" frame - which contains four 
line breaks only, which would result in a row of bars - which may look strange 
but is all GeoToad can make of it.
If you can see more information: How did you get it? (Shouldn't depend on the 
browser as the HTML is the same...)
I remember that GC has changed handling of HTML stuff inside hints a few weeks 
ago, might this be the reason you don't see the hint anymore (if there was any 
before)?

Temporarily re-opening...

Original comment by Steve8x8 on 29 May 2012 at 8:04

GoogleCodeExporter commented 9 years ago
What's the current state of this problem? No response for 8 weeks.

Original comment by Steve8x8 on 23 Jul 2012 at 8:43

GoogleCodeExporter commented 9 years ago
Closing (until another bug of the same type shows up)

Original comment by Steve8x8 on 10 Aug 2012 at 7:23