mapbox / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License

Optimizations for global datasets #314

Closed jdaniel-GIS closed 7 years ago

jdaniel-GIS commented 8 years ago

I have some global shape file datasets I would like to convert. Leaflet stops zooming at zoom level 18, but I can't really process these datasets at any level above 12. I left it running over the weekend at -z 18, and it was still far from done and had already produced a 281G mbtiles file. Obviously that's not going to work.

Am I correct in understanding that it is generating a tile for the entire coverage of my dataset, which is the whole world in my case? These are just country borders, so there isn't that much actual data here. My json file is 694 MB, but that is just json. My shape file is only 295 MB.

Would it be possible to mask oceans and interiors and avoid generating tiles for them? If so, I can do this and submit a pull request. I just want to know if people think it is feasible before I dive in.

e-n-f commented 8 years ago

Polygons tend to make huge mbtiles files because the tileset has to refer to the polygon in every tile the polygon touches, even though all it has to include in the tile is a square to signify that the polygon covers that entire tile.

You can certainly avoid generating the interior tiles if you just change the input to claim that the data is a MultiLineString instead of a Polygon. But then you won't be able to render it as a polygon without it being visible that the polygon stops at the edge of the tiles that contain its border rather than continuing through the entire interior.
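A minimal sketch of that input change, assuming the data is already a GeoJSON FeatureCollection loaded into Python. The function names are hypothetical and not part of tippecanoe; the conversion only relies on the fact that a Polygon's rings and a MultiLineString's lines share the same coordinate nesting.

```python
def polygon_to_lines(geometry):
    """Relabel Polygon/MultiPolygon rings as a MultiLineString.

    Other geometry types (and null geometries) are returned unchanged.
    """
    if geometry is None:
        return None
    kind = geometry["type"]
    if kind == "Polygon":
        # A Polygon is a list of rings; each ring becomes one line.
        rings = geometry["coordinates"]
    elif kind == "MultiPolygon":
        # Flatten the rings of every member polygon into one list.
        rings = [ring for polygon in geometry["coordinates"] for ring in polygon]
    else:
        return geometry
    return {"type": "MultiLineString", "coordinates": rings}


def convert_collection(collection):
    """Return a new FeatureCollection with polygon outlines as lines."""
    features = []
    for feature in collection["features"]:
        converted = dict(feature)  # shallow copy keeps properties intact
        converted["geometry"] = polygon_to_lines(feature.get("geometry"))
        features.append(converted)
    return {"type": "FeatureCollection", "features": features}
```

The converted collection can then be written back out with `json.dump` and passed to tippecanoe in place of the original file.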

If you do need polygon interiors, the main thing I can suggest is to take advantage of overzooming: render your tileset to -z12, but in your style, keep styling it all the way down to zoom level 18 or beyond. At the zoom levels beyond where tiles are available, Mapbox tile serving will fall back to scaling up tiles from the lower zooms. Zoom level 12 tiles at 4096 units per tile are good enough for 8-foot resolution on the ground, which should look pretty decent for most purposes.
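The 8-foot figure can be checked with a little arithmetic. This is only a sketch: it assumes Web Mercator tiles, measures at the equator (ground resolution gets finer toward the poles), and uses the WGS84 equatorial circumference as the width of the world.

```python
EQUATOR_M = 40_075_016.686  # WGS84 equatorial circumference, in metres
FEET_PER_M = 3.28084


def unit_size_m(zoom, extent=4096):
    """Ground size of one tile coordinate unit at the equator, in metres."""
    tiles_across = 2 ** zoom
    return EQUATOR_M / (tiles_across * extent)


size_m = unit_size_m(12)
size_ft = size_m * FEET_PER_M
print(f"{size_m:.2f} m, {size_ft:.1f} ft")  # roughly 2.39 m, 7.8 ft
```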

jdaniel-GIS commented 8 years ago

I think my problem stems from my very specific, and apparently a bit unusual, use case. These vector pipeline tools are designed to render dense data like OSM. I have sparse data, but it is at global coverage for the most part. The vector display architecture is what I need to render it. But I have to optimize it for tile generation and storage efficiency.

I have two specific questions: 1) If I have a global polygon dataset (GAUL/GADM shape files, to be precise), is tippecanoe going to render tiles for ocean coverage, where there are no polygons? 2) Where in the code (since I just found the whole project on Thursday) would I need to look to add a geometric relationship check to decide if I want to generate a tile or not? I need to use a "touches" relationship. Since I have sparse polygon data, all I care about are the borders, not interiors or exteriors. I don't need to generate tiles for either.

Thanks for the tip about the styling. I will forward that on to another group here in my org where it might really help them. In my case, I don't have to worry about interiors at all. I have my own server. If I don't have a requested tile in the mbtiles file, I can just generate and return a valid, but empty, tile on the fly.

I will start digging through the code myself as I have time, but if anyone has any pointers, I would appreciate them.

e-n-f commented 8 years ago

Thanks for clarifying.

  1. Tippecanoe will not generate tiles that are not touched by any features, so there is no need to worry about ocean tiles being generated if you don't provide ocean polygons.
  2. Most feature exclusion happens through continue statements like this one, which excludes features that do not overlap the bounding box of the tile being generated. A later check excludes any features whose geometry was reduced to nothing during processing. Another check excludes layers that contain no features, and another excludes tiles that contain no layers.
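The chain of checks in point 2 can be sketched roughly as follows. This is an illustration in Python, not tippecanoe's actual C++ code: `build_tile`, `bboxes_overlap`, the feature dictionaries, and the `process` callback are all hypothetical stand-ins.

```python
def bboxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_tile(tile_bbox, layers, process):
    """Assemble one tile, or return None if nothing survives.

    layers:  dict of layer name -> list of {"bbox": ..., "geometry": ...}
    process: callable(geometry, tile_bbox) returning the clipped and
             simplified geometry, or an empty value if nothing is left
    """
    tile = {}
    for name, features in layers.items():
        kept = []
        for feature in features:
            if not bboxes_overlap(feature["bbox"], tile_bbox):
                continue  # feature does not overlap this tile
            geometry = process(feature["geometry"], tile_bbox)
            if not geometry:
                continue  # geometry was reduced to nothing
            kept.append({**feature, "geometry": geometry})
        if not kept:
            continue  # layer contains no features
        tile[name] = kept
    return tile or None  # tile contains no layers
```

A "touches"-style test would presumably slot in next to the bounding-box check, before the feature is processed.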

It should be possible to use overzooming even if you are serving your own tiles, but I am not sure about the details of how to set that up.

jdaniel-GIS commented 8 years ago

Thanks. That's very helpful.

Tippecanoe is using the Clipper library. I will have to kludge something up to get the same result as "touches". I had looked at Clipper before for another project that is currently on hiatus. I decided to use Boost geometry instead. It was a pain to code, but I got GDAL to (mostly) pass its geometry tests using Boost geometry instead of GEOS.

I am not going to try to use Boost geometry for this. But is that something that tippecanoe would be interested in using instead of Clipper? Or does Clipper have much better performance? If you wanted to use Boost geometry, I could do that.

I'm just wondering though. I think I have everything I need at this point. You've been very helpful. The tippecanoe project was just what I needed. I have a lot of odd problems and tippecanoe is the only viable solution for my particular situation.

e-n-f commented 8 years ago

Clipper is on its way out from Tippecanoe as soon as Wagyu is completely stable as a replacement for it. I'm not sure what considerations originally led to Mapbox tools standardizing on Clipper instead of Boost geometry.

Please be aware that the geometry that is being processed at each stage in Tippecanoe has generally already been clipped from its original version. If you need unclipped geometries, you will need to do something like the P_CLIPPING parts of the code that optionally preserve the full original geometry into each tile, even when it is much larger than the tile.

springmeyer commented 8 years ago

> I'm not sure what considerations originally led to Mapbox tools standardizing on Clipper instead of Boost geometry.

boost::geometry polygon support is designed to work with "simple" polygons only (this is documented at http://www.boost.org/doc/libs/1_61_0/libs/geometry/doc/html/geometry/reference/concepts/concept_polygon.html#geometry.reference.concepts.concept_polygon.rules). So, if you attempt to pass a polygon with self-intersections to boost::geometry::intersection it will throw an exception. In the Mapbox case we needed to support user input that might contain self-intersections and could not be easily fixed before clipping. With Clipper, and soon with Wagyu, we can clean these self-intersections and clip at once.

e-n-f commented 7 years ago

Closing since I don't think there is an additional action for me to take here.

jdaniel-GIS commented 7 years ago

Sorry, I got busy on lots of other stuff. I have it working well. Would you want a pull request for this? I don't know if this feature is something other people would be interested in. I added an argument to omit interior tiles. I also added a new table for the first omitted tile. The idea is that a clever tile server could query the omission table and return a filled polygon instead of a 404 if a tile, or any parent tile, is in the omission table. However, I haven't actually worked out how to do that last part. My tiles look great, but I do get lots of 404s, which don't really hurt anything. I was holding off on the pull request until I could actually demonstrate how to use that new table.
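One way the server-side lookup might work, as a rough sketch: walk from the requested tile up through its parents and see whether any of them is recorded as omitted. The omission table is modelled here as an in-memory set of (z, x, y) tuples, which is an assumption for illustration; the actual table layout is not described in this thread.

```python
def ancestors(z, x, y):
    """Yield the tile itself, then each parent tile up to zoom 0."""
    while z >= 0:
        yield (z, x, y)
        # The parent of a tile halves its column and row indices.
        z, x, y = z - 1, x >> 1, y >> 1


def is_omitted(omitted, z, x, y):
    """True if the tile or any of its ancestors is in the omission set."""
    return any(tile in omitted for tile in ancestors(z, x, y))


# Hypothetical omission set: one interior tile dropped at zoom 2.
omitted = {(2, 1, 1)}
print(is_omitted(omitted, 4, 5, 6))  # True: (4,5,6) -> (3,2,3) -> (2,1,1)
print(is_omitted(omitted, 4, 8, 6))  # False: no ancestor is in the set
```

A server could return a filled-polygon tile when `is_omitted` is true for a missing tile, and an empty tile or 404 otherwise.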

e-n-f commented 7 years ago

You're right, this really is the right thing to do with big polygons: If the only features in a tile are polygons that cover the entire tile, it should be possible to skip generating the tile at all and let the coverage just be overzoomed from the parent.

I don't think this actually works with Mapbox tile serving: if I create a world tileset with land to z4 and water to z5, and try to load a z5 tile that would contain only water, I get a 404 error instead of an overzoomed tile of z4 water, I guess because it's still within the overall maxzoom of the tileset. But it really should fall back to the parent, either on the client side or the server side, so cases like this can work nicely without having to fill vast areas with identical tiles.
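A fallback of that kind could look roughly like the sketch below, which finds the nearest ancestor that actually exists and reports which part of it the requested tile covers. The `available` set and the function name are hypothetical; this is not how Mapbox tile serving is implemented.

```python
def overzoom_source(available, z, x, y):
    """Find the closest existing ancestor of tile (z, x, y).

    Returns (ancestor, scale, col, row), where the requested tile is the
    cell at column `col`, row `row` of a scale-by-scale grid laid over the
    ancestor. Returns None if no ancestor exists at any zoom.
    """
    for dz in range(z + 1):
        ancestor = (z - dz, x >> dz, y >> dz)
        if ancestor in available:
            scale = 1 << dz
            return ancestor, scale, x % scale, y % scale
    return None


available = {(4, 3, 5)}
print(overzoom_source(available, 5, 7, 10))  # ((4, 3, 5), 2, 1, 0)
```

The client or server would then scale the ancestor's geometry up by `scale` and crop it to the indicated cell.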