tilezen / vector-datasource

Tilezen vector tile service - OpenStreetMap data in several formats
https://www.nextzen.org/

Invalid geom in some tiles (GeoJSON) #698

Closed nvkelso closed 8 years ago

nvkelso commented 8 years ago

We've seen this a few times, and while Tangram mostly handles it okay, other tools barf on invalid geometries.

Seems like the data is being transformed during vector tile creation (I suspect it's fine in PostGIS) and that transform results in self-intersecting geoms.

@wboykinm reported this on Twitter last week. He's been testing Mapzen vector tiles with Turf.js and found recurring examples of geometry errors for many metro areas. Superficially, he suspects feature simplification is causing some of this.

Here's an example:

More about the app:

Bill is trying to erase US census geometry with the Mapzen water geom using turf.js, see https://github.com/wboykinm/tribes/blob/mapzen/processing/water/piranha.js.

PostGIS has ST_MakeValid(), but Turf.js doesn't have a similar function, so simple errors like self-intersecting polygons and non-noded intersections throw big unrecoverable errors that result in data loss.

Looks like Mapnik does some OGC validation in their tiles:

To find more invalid examples, Bill built a tool that tosses out self-intersecting polygons: https://www.npmjs.com/package/turf-bathwater. It'll also log the features that get thrown out.

rmarianski commented 8 years ago

It turns out that this is due to precision loss when we write the json out. @zerebubuth: what do you think we should do here?
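To illustrate the kind of precision loss described above, here is a minimal sketch in plain Python. The ring coordinates and the helper names (`orient`, `segments_cross`, `has_self_crossing`, `round_ring`) are made up for this example and are not taken from the Tilezen code. It only detects edges that properly cross, which is weaker than a full OGC validity check.

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    """True when segments ab and cd properly cross (shared endpoints do not count)."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def has_self_crossing(ring):
    """Check a closed ring (first point == last point) for crossing edges."""
    edges = list(zip(ring, ring[1:]))
    return any(segments_cross(a, b, c, d)
               for (a, b), (c, d) in combinations(edges, 2))

def round_ring(ring, ndigits):
    """Round every coordinate, as a precision-limited encoder would."""
    return [(round(x, ndigits), round(y, ndigits)) for x, y in ring]

# Hypothetical ring: a notch at (1, 1.2) sits just above the edge (0, 0.6)-(3, 1.6).
ring = [(0, 0.6), (3, 1.6), (3, 4), (1, 1.2), (0, 4), (0, 0.6)]

print(has_self_crossing(ring))                 # False: fine at full precision
print(has_self_crossing(round_ring(ring, 0)))  # True: rounding pushed the notch across the edge
```

Rounding to whole units is exaggerated for readability; the same effect can occur at any number of decimal places when a vertex lies closer to a neighbouring edge than the rounding step.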

wboykinm commented 8 years ago

@rmarianski Are you guys writing to a standardized decimal precision or is it zoom-dependent?

@zerebubuth I see you folks have already been down this road.

zerebubuth commented 8 years ago

The precision depends on the format and the zoom level. For GeoJSON (.json tiles), we change the number of digits of precision by zoom level. For TopoJSON, MVT and OpenScienceMap, the coordinates within the tile are integers, so are inherently bounded by zoom level.
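As a rough sketch of what zoom-dependent precision could look like, the snippet below picks the number of decimal places so that the rounding error in longitude stays under one pixel. The rule, the function name `digits_for_zoom`, and the 256-pixel tile size are assumptions for illustration only; the thread does not state the formula Tilezen actually uses.

```python
import math

def digits_for_zoom(zoom, tile_pixels=256):
    """Decimal places needed so rounding error in longitude stays below one pixel.

    Hypothetical rule: one tile at `zoom` spans 360 / 2**zoom degrees of longitude.
    """
    degrees_per_pixel = 360.0 / (tile_pixels * 2 ** zoom)
    return max(0, math.ceil(-math.log10(degrees_per_pixel)))

for zoom in (0, 8, 16):
    print(zoom, digits_for_zoom(zoom))
```

Latitude in Web Mercator does not scale linearly, so a real implementation would likely need a safety margin on top of this.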

This is, indeed, a well travelled path, and I'm not entirely sure what the best option might be - or whether one exists.

If we want only valid features in the tiles (which seems like a sensible thing to want), then we must do the validity checks after any encoding steps which would simplify or truncate precision (e.g. rounding to integers, truncating decimal places, dropping points due to movement tolerances). However, these steps are different for each output format, so we might end up dropping different features for each format, which could be confusing. Alternatively, we could encode all features in all formats and only output those features which are present in all formats - but this seems like a great deal of effort.

nvkelso commented 8 years ago

Related: http://www.angusj.com/delphi/clipper.php

wboykinm commented 8 years ago

missioncreep

rmarianski commented 8 years ago

Based on a previous discussion, it sounds like we are willing to accept that different formats can contain differences in terms of content.

In this particular case of geojson encoding though, do we want to:

  1. encode with truncated precisions based off zoom, make a decoding pass, and re-encode the invalid geometries with full precision?
  2. encode with truncated precisions based off zoom, make a decoding pass, and drop the invalid geometries?
  3. encode with truncated precisions based off zoom, try a decoding pass, and re-encode invalid geometries with buffer(0)?
  4. always encode all features with full precision? (I'm assuming this option will never generate invalid features. But can it still with default json encoding?)
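Options 1 and 2 above share the same shape: encode at reduced precision, decode, check, then either fall back or drop. A minimal sketch of that flow follows, under stated assumptions: `encode_polygon` and its `on_invalid` parameter are hypothetical names, and the validity check only looks for properly crossing edges, which is a stand-in for a real validity test rather than the check the project uses.

```python
import json
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def has_self_crossing(ring):
    """Simplified check: do any two edges of the closed ring properly cross?"""
    edges = list(zip(ring, ring[1:]))
    return any(orient(a, b, c) * orient(a, b, d) < 0
               and orient(c, d, a) * orient(c, d, b) < 0
               for (a, b), (c, d) in combinations(edges, 2))

def encode_polygon(ring, ndigits, on_invalid="full_precision"):
    """Return GeoJSON text for one polygon ring, or None if the feature is dropped."""
    truncated = [[round(x, ndigits), round(y, ndigits)] for x, y in ring]
    text = json.dumps({"type": "Polygon", "coordinates": [truncated]})
    decoded = json.loads(text)["coordinates"][0]  # the decoding pass
    if not has_self_crossing(decoded):
        return text                               # truncated output is usable
    if on_invalid == "drop":
        return None                               # option 2: drop the feature
    # option 1: re-encode this feature without truncation
    return json.dumps({"type": "Polygon", "coordinates": [ring]})

# Hypothetical ring that is fine at full precision but crosses itself when rounded to 0 digits.
ring = [[0, 0.6], [3, 1.6], [3, 4], [1, 1.2], [0, 4], [0, 0.6]]
print(encode_polygon(ring, 0, on_invalid="drop"))
print(encode_polygon(ring, 0))
```

On the question in option 4: in Python 3, `json.dumps` writes floats using their shortest round-tripping representation, so a full-precision encode followed by a decode returns the same float values. Whether other encoders in the pipeline behave the same way is not something this sketch can confirm.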

wboykinm commented 8 years ago

@rmarianski In hacking away at these geoms, I haven't actually had any luck with buffer(0) on the client side (turf.buffer, and the topology remains invalid). Do you mean to try that with PostGIS? And if so, is ST_Makevalid() too slow?

More broadly, I would encourage you folks to consider whether you even want to support a javascript, heavy-geoprocessing use case with your vector tiles. I'm within a few thrown errors of giving up on turf and moving my processing to PostGIS anyway, and you may not encounter this line of inquiry again for months/years.

wboykinm commented 8 years ago

@rmarianski UPDATE: I caved and tried using PostGIS to fix topology on this particularly problematic water polygon from this Mapzen VT, but neither ST_Makevalid() nor ST_Buffer(0) (or 100m for that matter) resolved the feature into a usable geometry. Workflow:

CREATE TABLE valid_water AS (SELECT ST_Buffer(wkb_geometry,0) AS the_geom FROM osm_water);
CREATE TABLE valid_water AS (SELECT ST_Buffer(wkb_geometry,0.001) AS the_geom FROM osm_water);
CREATE TABLE valid_water AS (SELECT ST_Makevalid(wkb_geometry) AS the_geom FROM osm_water);

wboykinm commented 8 years ago

@nvkelso Does the merged PR above address the original problem? I can try running turf over it again if it'd be helpful.

rmarianski commented 8 years ago

It should address it yes. Running turf on it again would be helpful to know if it actually helped.

nvkelso commented 8 years ago

To set expectations, the fix will probably address many geom validity problems (and hopefully yours!) but not all geom problems. That is a Hard Problem (tm).

On Aug 5, 2016, at 14:02, Robert Marianski notifications@github.com wrote:

It should address it yes. Running turf on it again would be helpful to know if it actually helped.


nvkelso commented 8 years ago

@rmarianski Anything more to test with this one before taking it to prod?

rmarianski commented 8 years ago

Should be good to go here.