Closed e-n-f closed 9 years ago
Really interesting!
It's slow, though, because it's wasting time reparsing and reclipping features in the siblings of the tiles they actually belong in.
How much slower than the bottom-up branch? I'd love to see some numbers, or even better, a breakdown of how much time is spent in each subroutine like clipping or filtering out points — can you do a profile like that easily in C++?
can you do a profile like that easily in C++?
@mourner on the walk home I was mentioning running instruments from the command line. It is iprofiler
that allows this on OS X. But to be able to run two traces and then compare them you need to set up profiling within instruments. More details at:
https://github.com/springmeyer/profiling-guide/blob/master/README.md#instruments-on-os-x
Sorry to be so slow to respond @mourner and @springmeyer. I've been using gprof on Linux for profiing. Here's what top-down tippecanoe looks like, tiling the Los Angeles TIGER streets: https://gist.github.com/ericfischer/cd0a783851dc3e2380e9 28.177 seconds vs 17.517 with a bottom-up index.
@ericfischer very nice, thanks for sharing! You should definitely try my clipping approach.
@ericfischer did the improved clipping reduce times?
@mourner Yes, it's much faster now, although it still takes some extra time to recopy the geometries into the clipped child tiles with each pass. I'm going to go ahead and merge it soon (after making minzoom work again) since it makes some pathological geometries work much better.
@ericfischer awesome! How much does it take on the test case above?
it still takes some extra time to recopy the geometries into the clipped child tiles with each pass
Can you pass trivial accepts as feature pointers? In theory you only need to copy features that are clipped so that the geometry is modified.
@mourner Here's the profile as it stands now: https://gist.github.com/ericfischer/c83b9a7b3b9fc35604f7
The line simplification could take less time either by doing it all up front as you do or by deferring it entirely until after feature coalescing, but the final file size ends up bigger, so I'm still trying to figure out how to avoid that problem.
I thought about using pointers to unaltered geometries as you suggest, and it might well be better for speed. The downsides are that it requires keeping at least one level of serialized geometry live through the entire process instead of always letting the parent level be deleted at the end of its pass, and, if the geometry is bigger than memory, it might slow the process down with thrashing, since running through the geometries becomes random access instead of linear. But I haven't actually tested that yet to see the impact.
@ericfischer I can't seem to deduce the total running time from the profile... Is it 3.79s?
@mourner That's weird, and I don't understand why the cumulative time doesn't reflect the actual running time of 11.9 seconds, including 0.5 seconds of I/O. Maybe I shouldn't trust gprof.
@ericfischer still an awesome improvement! Try the upfront simplification, it made a good difference for geojson-vt.
Merged in 0b84f1315928720e4cb1cff22cd5a99af1e6d682. I'll open a separate ticket to optimize line simplification.
@mourner pointed out that there are a lot of advantages to generating tiles from the top down: generating the z0 tile, splitting it into quarters, and continuing to split into quarters until the maximum zoom is reached.
In particular it avoids having to create a bottom-up index of features, which means also not having to worry about de-duplicating features at lower zooms that got indexed multiple times at a higher zoom, or about avoiding generating wasteful multiple indices for large features like lines that jump to null island and back.
I've been trying this out in https://github.com/mapbox/tippecanoe/tree/topdown
It works in general, and has turned up a couple of stupid bugs that happen when features get entirely clipped away, which I didn't notice before because when you have an index, it's relatively rare to clip a feature down to nothing.
It's slow, though, because it's wasting time reparsing and reclipping features in the siblings of the tiles they actually belong in. @mourner has a more clever clipping algorithm that partitions the parent tile instead of clipping each of its children, so I need to look into that.
Polygon coalescing also comes out looking less even because the dropped features aren't distributed uniformly, although that (and non-uniform dot dropping) could be addressed by a single spatial sort of all the features at the very beginning.
I also need to figure out what to do when you don't actually want the low zooms in the tileset, as in the case of the TIGER tiles where the full data doesn't kick in until z12. This is simple when working from the index, but still requires doing at least the partitioning steps at the low zooms if you start from the top.