mapbox / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License
2.72k stars 432 forks source link

Tiling from the top down #12

Closed e-n-f closed 9 years ago

e-n-f commented 9 years ago

@mourner pointed out that there are a lot of advantages to generating tiles from the top down: generating the z0 tile, splitting it into quarters, and continuing to split into quarters until the maximum zoom is reached.

In particular it avoids having to create a bottom-up index of features, which means also not having to worry about de-duplicating features at lower zooms that got indexed multiple times at a higher zoom, or about avoiding generating wasteful multiple indices for large features like lines that jump to null island and back.

I've been trying this out in https://github.com/mapbox/tippecanoe/tree/topdown

It works in general, and has turned up a couple of stupid bugs that happen when features get entirely clipped away, which I didn't notice before because when you have an index, it's relatively rare to clip a feature down to nothing.

It's slow, though, because it's wasting time reparsing and reclipping features in the siblings of the tiles they actually belong in. @mourner has a more clever clipping algorithm that partitions the parent tile instead of clipping each of its children, so I need to look into that.

Polygon coalescing also comes out looking less even because the dropped features aren't distributed uniformly, although that (and non-uniform dot dropping) could be addressed by a single spatial sort of all the features at the very beginning.

I also need to figure out what to do when you don't actually want the low zooms in the tileset, as in the case of the TIGER tiles where the full data doesn't kick in until z12. This is simple when working from the index, but still requires doing at least the partitioning steps at the low zooms if you start from the top.

mourner commented 9 years ago

Really interesting!

It's slow, though, because it's wasting time reparsing and reclipping features in the siblings of the tiles they actually belong in.

How much slower than the bottom-up branch? I'd love to see some numbers, or even better, a breakdown of how much time is spent in each subroutine like clipping or filtering out points — can you do a profile like that easily in C++?

springmeyer commented 9 years ago

can you do a profile like that easily in C++?

@mourner on the walk home I was mentioning running instruments from the command line. It is iprofiler that allows this on OS X. But to be able to run two traces and then compare them you need to set up profiling within instruments. More details at: https://github.com/springmeyer/profiling-guide/blob/master/README.md#instruments-on-os-x

e-n-f commented 9 years ago

Sorry to be so slow to respond @mourner and @springmeyer. I've been using gprof on Linux for profiing. Here's what top-down tippecanoe looks like, tiling the Los Angeles TIGER streets: https://gist.github.com/ericfischer/cd0a783851dc3e2380e9 28.177 seconds vs 17.517 with a bottom-up index.

mourner commented 9 years ago

@ericfischer very nice, thanks for sharing! You should definitely try my clipping approach.

mourner commented 9 years ago

@ericfischer did the improved clipping reduce times?

e-n-f commented 9 years ago

@mourner Yes, it's much faster now, although it still takes some extra time to recopy the geometries into the clipped child tiles with each pass. I'm going to go ahead and merge it soon (after making minzoom work again) since it makes some pathological geometries work much better.

mourner commented 9 years ago

@ericfischer awesome! How much does it take on the test case above?

it still takes some extra time to recopy the geometries into the clipped child tiles with each pass

Can you pass trivial accepts as feature pointers? In theory you only need to copy features that are clipped so that the geometry is modified.

e-n-f commented 9 years ago

@mourner Here's the profile as it stands now: https://gist.github.com/ericfischer/c83b9a7b3b9fc35604f7

The line simplification could take less time either by doing it all up front as you do or by deferring it entirely until after feature coalescing, but the final file size ends up bigger, so I'm still trying to figure out how to avoid that problem.

I thought about using pointers to unaltered geometries as you suggest, and it might well be better for speed. The downsides are that it requires keeping at least one level of serialized geometry live through the entire process instead of always letting the parent level be deleted at the end of its pass, and, if the geometry is bigger than memory, it might slow the process down with thrashing, since running through the geometries becomes random access instead of linear. But I haven't actually tested that yet to see the impact.

mourner commented 9 years ago

@ericfischer I can't seem to deduce the total running time from the profile... Is it 3.79s?

e-n-f commented 9 years ago

@mourner That's weird, and I don't understand why the cumulative time doesn't reflect the actual running time of 11.9 seconds, including 0.5 seconds of I/O. Maybe I shouldn't trust gprof.

mourner commented 9 years ago

@ericfischer still an awesome improvement! Try the upfront simplification, it made a good difference for geojson-vt.

e-n-f commented 9 years ago

Merged in 0b84f1315928720e4cb1cff22cd5a99af1e6d682. I'll open a separate ticket to optimize line simplification.