Closed rclark closed 8 years ago
Fwiw, toGeoJSON focuses heavily on robustness. It handles many variations of KML quite well, at the cost of raw performance and memory efficiency. If we want to handle really big KML (> 10 MB), we'll want something with a streaming parser / generator, which could mean finishing and using twogeojson.
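To make the streaming idea concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree.iterparse`. It is not how toGeoJSON or twogeojson work; the function name `iter_point_features` and the sample document are made up for illustration, and it only handles Point placemarks in the KML 2.2 namespace.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the input uses the standard KML 2.2 namespace.
KML_NS = "{http://www.opengis.net/kml/2.2}"


def iter_point_features(source):
    """Yield one GeoJSON Feature per Point Placemark without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != KML_NS + "Placemark":
            continue
        coords = elem.findtext(f"{KML_NS}Point/{KML_NS}coordinates")
        if coords:
            # KML coordinates are "lon,lat[,alt]"; altitude is dropped here.
            lon, lat = (float(v) for v in coords.strip().split(",")[:2])
            yield {
                "type": "Feature",
                "properties": {"name": elem.findtext(KML_NS + "name")},
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            }
        # Release the Placemark's children so memory does not grow with file size.
        elem.clear()


# Hypothetical sample input, just to exercise the generator.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>A</name><Point><coordinates>-122.08,37.42,0</coordinates></Point></Placemark>
<Placemark><name>no geometry</name></Placemark>
</Document></kml>"""

features = list(iter_point_features(io.BytesIO(SAMPLE)))
```

The point of the generator is that features can be written out one at a time, so peak memory depends on the largest single Placemark rather than on the whole file.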
@rclark - I think we should outline the other vector pre-processing we anticipate doing, to help inform the best method. For example, once we start converting entire files, we should also consider cleaning geometries at the same time: enforcing consistent winding order (necessary for the upcoming vtile spec) is one simple operation, and throwing out self-intersecting polygons is another.
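For a sense of how small the winding-order step is, here is a rough Python sketch based on the shoelace formula. The names `signed_area` and `enforce_winding` are hypothetical, and which direction the vtile spec actually requires is left as a parameter rather than assumed. It expects GeoJSON-style polygons: a list of closed rings, exterior first.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings when y points up."""
    total = 0.0
    for p, q in zip(ring, ring[1:]):
        # Index explicitly so positions with an altitude value still work.
        total += p[0] * q[1] - q[0] * p[1]
    return total / 2.0


def enforce_winding(polygon, exterior_ccw=True):
    """Return a copy of the polygon with the exterior ring wound one way and holes the other."""
    fixed = []
    for i, ring in enumerate(polygon):
        want_ccw = exterior_ccw if i == 0 else not exterior_ccw
        is_ccw = signed_area(ring) > 0
        # Reverse the ring only when its direction differs from the wanted one.
        fixed.append(list(ring) if is_ccw == want_ccw else list(reversed(ring)))
    return fixed


# A clockwise unit square gets flipped to counterclockwise.
clockwise_square = [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]
fixed_square = enforce_winding(clockwise_square)
```

Zero-area rings are treated as "not counterclockwise" here, so they may get reversed, which is harmless. Detecting self-intersections is a separate and harder problem that this sketch does not touch.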
These types of geometry optimizations sound to me like they might be easier to approach after we've converted incoming vector data to some consistent format. Do you think otherwise? Are there avenues for performing the conversions that might take care of these types of issues for us?
What I have in the back of my mind: upcoming Mapnik has support for fixing winding order, so one option would be to use Mapnik to convert any format to GeoJSON. Passing everything through Mapnik as an intermediary would give us consistent handling.
My only concern there is that we do occasionally see very high memory usage when passing some vector files through Mapnik. I'm not sure if OGR is any better, but I think this needs to be part of our considerations.
Currently Mapnik uses OGR internally to read KML. The high memory usage from previous KMLs was due to OGR's XML parsing plus the fact that rendering tiles requires parsing the KML multiple times per thread. If we use Mapnik to convert a KML in a pre-processing step, it only needs to be opened once rather than multiple times per thread, so memory usage would be roughly equal to running ogr2ogr on the command line, with the added benefit of geometry sanitization.
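For comparison, the command-line baseline here is a plain `ogr2ogr -f GeoJSON <output> <input>` call. A minimal Python wrapper might look like the sketch below; it assumes GDAL's `ogr2ogr` is installed and on PATH, and the helper names and file paths are hypothetical.

```python
import shutil
import subprocess


def kml_to_geojson_cmd(src, dst):
    """Build the ogr2ogr argument list; note that the destination comes before the source."""
    return ["ogr2ogr", "-f", "GeoJSON", dst, src]


def convert(src, dst):
    """Run the conversion once, raising if ogr2ogr is missing or exits non-zero."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    subprocess.run(kml_to_geojson_cmd(src, dst), check=True)


cmd = kml_to_geojson_cmd("input.kml", "output.geojson")
```

This does no geometry sanitization; it only shows the single-pass conversion whose memory use the Mapnik pre-processing step is being compared against.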
So, if you run into a KML that ogr2ogr cannot handle, that would be the point where we'd need a streaming KML parser (or a feature request to GDAL to fix its parser to require less memory).
KML is converted to GeoJSON bundles, detectable by bundle-fairy. This is complete!
Mapnik generally does a better job processing GeoJSON than KML -- most importantly it can do the processing with a far smaller memory footprint.
I'm thinking https://github.com/mapbox/togeojson is the right tool for this job.
Unless we want to go all the way and convert any vector file to GeoJSON -- this would mean KML, GPX, shapefile, TopoJSON, CSV. In that case it sounds like a job for OGR.
cc @springmeyer @GretaCB