tkrajina / gpxpy

gpx-py is a Python GPX parser. GPX (GPS eXchange Format) is an XML-based file format for GPS tracks.
Apache License 2.0

Memory usage when parsing large files #42

Closed by limbera 9 years ago

limbera commented 9 years ago

I am using gpxpy to parse a 10 MB GPX file. Creating the gpx object uses nearly 1.5 GB of main memory, which I think is quite a lot!

tkrajina commented 9 years ago

I absolutely agree, but there is not much I can do. Try parsing the same file without gpxpy and you will find that a big part of that memory is used by the XML parser. The overhead of the gpxpy objects doesn't help, of course, but big files aren't easy to fix.
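One way to check how much of the cost comes from DOM building alone is to parse a GPX document with a plain standard-library DOM parser and measure its peak allocation. The sketch below is only an illustration: it uses `xml.dom.minidom` as a stand-in for a DOM-building parser (the thread does not say which parser backend gpxpy uses), `make_gpx` and `peak_dom_memory` are hypothetical helpers, the input is synthetic, and `tracemalloc` only sees memory allocated through Python's allocator.

```python
import tracemalloc
import xml.dom.minidom


def make_gpx(n_points):
    """Build a synthetic GPX 1.1 document (UTF-8 bytes) with one track of n_points points."""
    points = "".join(
        f'<trkpt lat="{45 + i * 1e-5:.6f}" lon="{15 + i * 1e-5:.6f}">'
        f"<ele>{100 + i % 50}</ele><time>2014-01-01T00:00:00Z</time></trkpt>"
        for i in range(n_points)
    )
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">'
        f"<trk><trkseg>{points}</trkseg></trk></gpx>"
    )
    return doc.encode("utf-8")


def peak_dom_memory(data):
    """Return the peak bytes Python allocates while building a minidom DOM of data."""
    tracemalloc.start()  # the input bytes already exist, so they are not counted
    try:
        dom = xml.dom.minidom.parseString(data)
        _, peak = tracemalloc.get_traced_memory()
        dom.unlink()  # break parent/child reference cycles
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    data = make_gpx(20_000)
    peak = peak_dom_memory(data)
    print(f"input: {len(data) / 1e6:.1f} MB, DOM peak: {peak / 1e6:.1f} MB "
          f"({peak / len(data):.0f}x the input size)")
```

Comparing that ratio with the memory gpxpy needs for the same file gives a rough split between parser overhead and object-model overhead.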

The best solution is probably not to load the entire DOM into memory, but to read the file tag by tag and fill the object model as you go. At the moment, though, I don't have time for this. If you want to try, any help is welcome.
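A tag-by-tag approach could look roughly like this with the standard-library SAX API. This is a minimal sketch, not gpxpy's API: `Track`, `Segment`, `Point`, `GPXTrackHandler` and `parse_tracks` are hypothetical names, and only `<trk>`, `<trkseg>` and `<trkpt>` (with `<ele>` and `<time>`) are handled. No DOM is built; only the final object model stays in memory.

```python
import xml.sax
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Point:
    lat: float
    lon: float
    ele: Optional[float] = None
    time: Optional[str] = None


@dataclass
class Segment:
    points: List[Point] = field(default_factory=list)


@dataclass
class Track:
    segments: List[Segment] = field(default_factory=list)


class GPXTrackHandler(xml.sax.ContentHandler):
    """Fills Track/Segment/Point objects as SAX events arrive; no DOM is built."""

    def __init__(self):
        super().__init__()
        self.tracks: List[Track] = []
        self._point: Optional[Point] = None  # the <trkpt> currently being read
        self._text: List[str] = []  # character data of the innermost open element

    def startElement(self, name, attrs):
        tag = name.rsplit(":", 1)[-1]  # namespace processing is off, so strip any prefix
        self._text = []
        if tag == "trk":
            self.tracks.append(Track())
        elif tag == "trkseg" and self.tracks:
            self.tracks[-1].segments.append(Segment())
        elif tag == "trkpt" and self.tracks and self.tracks[-1].segments:
            self._point = Point(float(attrs["lat"]), float(attrs["lon"]))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._point is None:
            return  # this sketch only keeps data found inside a <trkpt>
        tag = name.rsplit(":", 1)[-1]
        text = "".join(self._text).strip()
        if tag == "ele" and text:
            self._point.ele = float(text)
        elif tag == "time" and text:
            self._point.time = text
        elif tag == "trkpt":
            self.tracks[-1].segments[-1].points.append(self._point)
            self._point = None


def parse_tracks(source):
    """Stream-parse a GPX file (path or binary file object) into a list of Track objects."""
    handler = GPXTrackHandler()
    xml.sax.parse(source, handler)
    return handler.tracks
```

The handler only ever holds the current point plus the lists it has already filled, so peak memory is driven by the size of the object model rather than by a full DOM.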

In an application where speed/CPU and memory usage were important, I simply rewrote everything in Go: https://github.com/tkrajina/gpxgo

limbera commented 9 years ago

Yep. I ended up writing a much simpler parser using xml.etree (iterparse). I might try to adapt some parts of your code to work this way!
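The parser mentioned above isn't shown in the thread. A minimal sketch of the same idea with `xml.etree.ElementTree.iterparse`, assuming only track points are needed, yields one tuple per `<trkpt>` and detaches each finished point from its parent so it can be freed. `iter_trackpoints` and `local_name` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix so GPX 1.0 and 1.1 files both match."""
    return tag.rsplit("}", 1)[-1]


def iter_trackpoints(source):
    """Yield (lat, lon, ele, time) per <trkpt>, detaching each point once it is read."""
    open_elems = []  # stack of elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        open_elems.pop()  # elem is complete; the top of the stack is now its parent
        if local_name(elem.tag) != "trkpt":
            continue
        children = {local_name(child.tag): (child.text or "").strip() for child in elem}
        ele = children.get("ele")
        yield (
            float(elem.get("lat")),
            float(elem.get("lon")),
            float(ele) if ele else None,
            children.get("time") or None,
        )
        if open_elems:
            open_elems[-1].remove(elem)  # drop the finished point so memory stays flat


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="45.0" lon="15.0"><ele>100.5</ele><time>2014-01-01T00:00:00Z</time></trkpt>
    <trkpt lat="45.1" lon="15.1"/>
  </trkseg></trk>
</gpx>"""
    for point in iter_trackpoints(io.BytesIO(sample)):
        print(point)
```

Unlike a full object model, this generator keeps roughly one point in memory at a time, which suits statistics or conversions that can be computed in a single pass.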

Cheers.