systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/
Other
1.5k stars 233 forks source link

faster Intersects queries #765

Closed cldellow closed 1 month ago

cldellow commented 2 months ago

This improves on previous work to short-circuit Intersects queries in the negative case, so as to avoid the expensive call into Boost's R-Tree. (Introduced in wildly-fast-and-wildly-broken form in PR #635, then fixed in #641.)

For one of my use cases, this decreases the build time from 7m47s to 1m30s, while also decreasing memory use.

There are 3 main changes:

  1. The old code was very naive. When loading shapes from a shapefile or GeoJSON file, it used a bounding box on the shape, and marked every z14 tile contained therein as possibly containing an intersecting shape.

The bounding box approach is a good approximation for many shapes but breaks down for large, irregular shapes that span many z14 tiles, like https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197

Instead, we now use 2 bits for each z14 tile. The first bit indicates whether the tile might contain an intersecting shape. The second bit indicates whether it definitely contains an intersecting shape. The second bit is lazily initialized with the more expensive intersects query only when that tile is queried.

This provides a big speedup, but at the cost of 2x the memory usage.

  1. Previously, the old code indexed at indexZoom, which is clamped to 14. The code now uses spatialIndexZoom, which is set to 15. indexZoom controls other indexes as well, where the marginal decrease in runtime doesn't scale 1:1 with increased memory usage. For this index, though, it seems like the extra 2x memory cost is repaid with a 50% decrease in runtime, which seems worth it.

  2. To mitigate the 4x memory increase (from 32MB to 128MB per indexed layer), the index is now sparse. We previously tracked a bitset for the entire globe. Now, we track the index as a set of bitsets, one for each z6 tile. If no z15 tiles in a given z6 tile need to be indexed, we don't allocate a bitset. If you're doing a small geographical extract, you'll benefit from this. Even if you're doing the globe, you'll probably benefit, since your shapefiles probably don't cover the oceans and Antarctica.

This gives up a bit of the runtime benefit, but decreases the memory usage.

Some future improvements (not urgent, just capturing them before I forget):

  1. Reduce memory: instead of allocating 2 bits per z15 tile, we could do 1 bit per z15 tile to track intersections, and, say, 1 bit per z13 tile to track whether we've done the lazy initialization. Then lazily init all the z15 tiles in the containing z13 tile on first access.

  2. With some bookkeeping, I think we could speed up the special case where there's a positive match of exactly 1 shape, if that 1 shape spans the entire z15 tile. This definitely occurs often for my use case of national parks - probably it occurs for other use cases like political boundaries.

  3. AreaIntersecting and CoveredBy could re-use these indexes to speed up their negative cases. And if (2) is done, they could re-use that work to speed up the positive case in the special case. I don't use these yet, but I might start soon.

systemed commented 1 month ago

Very neat! Thank you.

(I idly wonder if there's any mileage in using this approach elsewhere – the https://github.com/systemed/tilemaker/pull/323#issuecomment-950381397 scenario. But we might have squeezed all the gains out of that already.)

cldellow commented 1 month ago

True, I think this approach could have worked well in lieu of https://github.com/systemed/tilemaker/pull/607. I suspect we've got all the gains there - IIRC, it's no longer showing up in the CPU profiles, whereas before it was really obvious.