mapbox / preprocessorcerer

Perform preprocessorcery and pick parts on particularly persnickety uploads
ISC License

Convert KML to GeoJSON #19

Closed: rclark closed this issue 8 years ago

rclark commented 9 years ago

Mapnik generally does a better job processing GeoJSON than KML -- most importantly it can do the processing with a far smaller memory footprint.

I'm thinking https://github.com/mapbox/togeojson is the right tool for this job.

That is, unless we want to go all the way and convert every vector format (KML, GPX, shapefile, TopoJSON, CSV) to GeoJSON, in which case I think it sounds like a job for ogr.
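If we did go the ogr route, the conversion could be driven from the preprocessor by shelling out to GDAL's `ogr2ogr`. A minimal sketch, assuming `ogr2ogr` is on the PATH; the helper names and the extension list are hypothetical. `-f GeoJSON` selects the output driver and `-t_srs EPSG:4326` reprojects to WGS84, which GeoJSON expects. Multi-layer sources such as GPX may need a layer name after the source, and CSV may need open options to locate its geometry columns; the latter is not handled here.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical set of input formats, taken from the list in the comment above.
SUPPORTED_EXTENSIONS = {".kml", ".gpx", ".shp", ".topojson", ".csv"}


def ogr2ogr_to_geojson_cmd(src: str, dst: str, layer: Optional[str] = None) -> List[str]:
    """Build (but do not run) an ogr2ogr argument list converting `src` to GeoJSON at `dst`."""
    ext = Path(src).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input format: {ext}")
    # ogr2ogr expects the destination datasource before the source datasource.
    cmd = ["ogr2ogr", "-f", "GeoJSON", "-t_srs", "EPSG:4326", dst, src]
    if layer is not None:
        # Optional layer name (e.g. "tracks" for a GPX file) follows the source.
        cmd.append(layer)
    return cmd


def convert_to_geojson(src: str, dst: str, layer: Optional[str] = None) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ogr2ogr fails."""
    subprocess.run(ogr2ogr_to_geojson_cmd(src, dst, layer), check=True)
```

Keeping command construction separate from execution makes the argument list easy to check without GDAL installed.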

cc @springmeyer @GretaCB

tmcw commented 9 years ago

Fwiw, toGeoJSON focuses heavily on robustness. It handles many variations of KML quite well, at the expense of raw performance and memory efficiency. If we want to handle really big KML files (> 10 MB), we'll want something built on a streaming parser/generator, for example by finishing twogeojson and using that.
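To illustrate what "streaming" means here, this is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`. It is not twogeojson or toGeoJSON. It yields one GeoJSON Feature per Placemark and detaches each Placemark from the tree once processed, so memory does not grow with the number of features. It assumes the KML 2.2 namespace and handles only `Point` geometries. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Any, Dict, Iterator, List

# Assumption: documents use the OGC KML 2.2 namespace.
KML_NS = "{http://www.opengis.net/kml/2.2}"


def _first_coordinate(text: str) -> List[float]:
    # KML coordinates are whitespace-separated "lon,lat[,alt]" tuples; keep lon/lat.
    first = text.split()[0]
    return [float(v) for v in first.split(",")[:2]]


def iter_point_features(source: IO[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield GeoJSON Point Features from KML Placemarks without keeping them all in memory."""
    stack = []  # ancestors of the element currently being parsed
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag != KML_NS + "Placemark":
            continue
        coords = elem.find(f".//{KML_NS}Point/{KML_NS}coordinates")
        if coords is not None and coords.text and coords.text.strip():
            yield {
                "type": "Feature",
                "properties": {"name": elem.findtext(KML_NS + "name")},
                "geometry": {"type": "Point", "coordinates": _first_coordinate(coords.text)},
            }
        # Detach the finished Placemark from its parent (Document or Folder)
        # so processed features can be garbage-collected.
        if stack:
            stack[-1].remove(elem)


def features_from_bytes(data: bytes) -> List[Dict[str, Any]]:
    """Convenience wrapper for in-memory KML; real uploads would pass an open file."""
    return list(iter_point_features(io.BytesIO(data)))
```

A real converter would also need LineString, Polygon, MultiGeometry, and ExtendedData, which is where the robustness that toGeoJSON offers comes in.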

springmeyer commented 9 years ago

@rclark - I think we should outline the other kinds of vector preprocessing we anticipate doing, to help inform the best method. For example, once we start converting entire files, we should also consider cleaning geometries at the same time: enforcing a consistent winding order (necessary for the upcoming vector tile spec) is one simple operation; throwing out self-intersecting polygons is another.
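Winding-order enforcement is cheap once geometries are in a consistent format. This Python sketch uses the shoelace formula. The signed area is positive for counterclockwise rings with a y-up axis, and any ring with the wrong orientation is reversed. By default it follows the GeoJSON (RFC 7946) right-hand rule: exterior counterclockwise, holes clockwise. Because the vector tile spec defines its own orientation in tile coordinates, the target orientation is a parameter. The helper names are hypothetical.

```python
from typing import List, Sequence

Ring = List[List[float]]


def signed_area(ring: Sequence[Sequence[float]]) -> float:
    """Shoelace formula; > 0 means counterclockwise with a y-up axis."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]  # ignore any altitude value
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def enforce_winding(polygon: List[Ring], exterior_ccw: bool = True) -> List[Ring]:
    """Return a copy of a polygon (exterior ring first, then holes) with consistent winding.

    With exterior_ccw=True the exterior is counterclockwise and holes are clockwise;
    pass False for the opposite convention.
    """
    fixed = []
    for i, ring in enumerate(polygon):
        want_ccw = exterior_ccw if i == 0 else not exterior_ccw
        is_ccw = signed_area(ring) > 0
        # Reversing a closed ring keeps it closed (first point == last point).
        fixed.append(list(ring) if is_ccw == want_ccw else list(reversed(ring)))
    return fixed
```

Dropping self-intersecting polygons needs a proper segment-intersection test, which is a larger job than this orientation fix.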

rclark commented 9 years ago

These types of geometry optimizations sound to me like they might be easier to approach after we've converted incoming vector data to some consistent format. Do you think otherwise? Are there avenues for performing the conversions that might take care of these types of issues for us?

springmeyer commented 9 years ago

What I have in the back of my mind: upcoming Mapnik has support for fixing winding order, so one option would be to use Mapnik to convert any format to GeoJSON. Just passing things through Mapnik as an intermediary would give us consistent handling.

rclark commented 9 years ago

My only concern there is that we do occasionally see very high memory usage when passing some vector files through mapnik. I'm not sure if ogr is any better, but I think this needs to be part of our considerations.

springmeyer commented 9 years ago

Currently Mapnik uses OGR internally to read KML. The high memory usage from previous KMLs was due to OGR's XML parsing, plus the fact that rendering tiles requires parsing the KML multiple times per thread. If we use Mapnik to convert a KML in a preprocessing step, it only needs to be opened once rather than multiple times per thread. Memory usage would then be roughly equal to running ogr2ogr on the command line, with the added benefit of geometry sanitization.

So if you run into a KML that ogr2ogr cannot handle, that would be the point where we'd need a streaming KML parser (or a feature request asking GDAL to make its parser use less memory).

mapsam commented 8 years ago

KML is converted to GeoJSON bundles, detectable by bundle-fairy. This is complete!