tkrajina / gpxpy

gpx-py is a python GPX parser. GPX (GPS eXchange Format) is an XML based file format for GPS tracks.
Apache License 2.0
987 stars 223 forks source link

Broken file raises GPXException, XMLSyntaxError, TypeError #10

Closed johnjohndoe closed 11 years ago

johnjohndoe commented 11 years ago

When I use the parser with a broken GPX file an exception is raised. Here is the code I use. My first assumption was to catch a GPXException.

368  file = open(filepath)
369  try:
370      gpx = gpxpy.parse(file)
371  except gpxpy.gpx.GPXException:
372      print "GPXException for %s." % filepath
373      return 1

However the main reason is somewhere else.

Parsing points in sometrack.gpx
ERROR:root:expected '>', line 3125, column 29
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/parser.py", line 209, in parse
    self.xml_parser = LXMLParser(self.xml)
  File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/parser.py", line 107, in __init__
    self.dom = mod_etree.XML(self.xml)
  File "lxml.etree.pyx", line 2734, in lxml.etree.XML (src/lxml/lxml.etree.c:54411)
  File "parser.pxi", line 1578, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82748)
  File "parser.pxi", line 1457, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:81546)
  File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:78216)
  File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74472)
  File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75363)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74696)
XMLSyntaxError: expected '>', line 3125, column 29

File "somescript.py", line 370, in extractpoints gpx = gpxpy.parse(file)
File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/__init__.py",
     line 28, in parse raise mod_gpx.GPXException('Error parsing {0}: {1}'
                       .format(xml_or_file[0 : 100], parser.get_error()))
TypeError: 'file' object has no attribute '__getitem__'

Please refer to this Stackoverflow post which already suggests a solution to the problem.

Here is a sample file that produces the syntax error.

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<gpx version="1.1" creator="nl.sogeti.android.gpstracker" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/gpx/1/1/gpx.xsd" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gpx10="http://www.topografix.com/GPX/1/0" xmlns:ogt10="http://gpstracker.android.sogeti.nl/GPX/1/0">
<metadata>
<time>2013-03-29T13:52:22Z</time>
</metadata>
<trk>
<name>Track 2013-03-29 14:52</name>
<trkseg>
<trkpt lat="52.48128721585312" lon="13.452307177252663">
<ele>76.0</ele>
<time>2013-03-29T14:09:34Z</time>
<extensions>
<gpx10:speed>5.1076788902282715</gpx10:speed>
<ogt10:accuracy>10.0</ogt10:accuracy>
<gpx10:course>216.4146728515625</gpx10:course></extensions>
</trkpt>
<trkpt lat="52.48077771999975" lon="13.452037039115906">
<ele>78.0</ele>
<time>2013-03-29T14:09:47Z</time>
<extensions>
<gpx10:speed>4.982673168182373</gpx10:speed>
<ogt10:accuracy>10.0</ogt10:accuracy>
<gpx10:course>131.78286743164063</gpx10:course></extensions>
</trkpt>
<trkpt lat="52.480716389960826" lon="13.45214590285023">
<ele>78.0</ele>
<time>2013-03-29T14:09:49Z</time>
<extensions>
<gpx10:speed>4.982266426086426</gpx10:speed>
<ogt10:accuracy>10.0</ogt10:accuracy>
<gpx10:course>129.60655212402344</gpx10:course></extensions>
</trkpt>
<trkpt lat="52.48065528656295" lon="13.452274408986215">
<ele>78.0</ele>
<time>2013-03-29T14:09:51Z</time
tkrajina commented 11 years ago

Hi @johnjohndoe thanks for your bug submission. I changed and commited a new version. The parse() method will now fail on invalid XML file and the code where your problem was manifesting is deleted. Also, I deleted the is_valid() method (it there is a exception on parse() -- no need to check is_valid() later).

Hope this is OK for you?

johnjohndoe commented 11 years ago

@tkrajina Looks good so far. One thing I would rewrite is how you handle exceptions. I would not handle all errors as generic exceptions but add a specific XMLSyntaxError except block before. This is the error you know of at the point in time.

except XMLSyntaxError as e:
    mod_logging.debug('Error in:\n%s\n-----------\n' % self.xml)
    mod_logging.exception(e)
    raise mod_gpx.GPXException('Error parsing XML: %s' % str(e))
except Exception as e:
    mod_logging.debug('Error in:\n%s\n-----------\n' % self.xml)
    mod_logging.exception(e)
    raise mod_gpx.GPXException('Unknown error: %s' % str(e))

If you can accept a crash on an unknown error you could even omit the generic except block.

tkrajina commented 11 years ago

What about this: https://github.com/tkrajina/gpxpy/commit/8accbce26c06cd9df0197902bb6aa302830c63b7 ?

The idea is to have two kinds of gpxpy exceptions, one for invalid XML files and the other for invalid valid-XML-invalid-GPX files. You can choose if to detect all exceptions with GPXException or XML-specific with GPXXMLSyntaxException. If GPXXMLSyntaxException -- you can retrieve the original (which can be lxml or minidom) exception with e.original_exception.

johnjohndoe commented 11 years ago

@tkrajina I disagree. I find it more complicated to handle multiple GPX*Exceptions as a user of the library. What would be the benefit of yet another wrapper for an exception caused by an internal module not part of the library? All the user needs is the original error message which you wrap into a GPXException. This is good enough imho.

johnjohndoe commented 11 years ago

@tkrajina Please consider this comment by abarnert with regards to exception handling.