Intersects: faster queries for negative case

cldellow commented 7 months ago

This PR decreases the time for me to build my road layer for Colorado from 471 seconds to 11 seconds.

I'm not sure if this ought to be the new default, or if it should have a config knob, so hoping to use this PR for discussion.

I'm starting to really get my hands dirty with tilemaker, and have come across a performance hotspot with Intersects against a layer with many (relatively) small polygons.

In https://github.com/systemed/tilemaker/discussions/593#discussioncomment-7667947, you mentioned:

The Lua methods like FindIntersecting etc. will need some thought here. They're really useful for region-specific processing so I'd like to keep them, but they do require keeping those particular geometries in memory. Generally there won't be too many of them - rough country polygons etc. - so the impact on memory isn't huge.

I think my use differs from the norm a bit. I maintain two GeoJSON datasets that list national parks (50K items) and city parks (150K items) for North America.

So instead of ~200 country polygons that densely cover the entire planet, I have ~200,000 park polygons that sparsely cover patches of the planet.

When rendering road/path/POIs, I test whether or not they're in a park to write different attributes to my tileset.

I'm not very familiar with the R-Tree data structure. Some Googling makes me think that it's expensive to query it, even in the case of a negative result.

This PR makes negative queries faster for sparse layers. It maintains a bitmap of your z14 (or whatever your basezoom is) tiles, indicating which might possibly intersect with a shape in the layer.

This dramatically speeds up the sparse case, but at some cost:

memory: a z14 layer requires an extra 33MB to store the bitmap
runtime: when ingesting the shapes, we compute the shape's envelope and update the bitmap

For dense layers like countries, this will likely not speed things up at all. In fact, I'd expect a small slowdown. It will also use memory, varying by the amount of your basezoom.

For z14, it's 33MB. Not a big deal, IMO. If your basezoom was z16, though, it'd be 528MB. Probably a big deal, IMO.

That said -- we don't have to index at the basezoom level. We could index at the lower of the basezoom or z14.

Or we could introduce a knob in the layer config, like:

isIndexed: "sparse"

or

isIndexed: "dense"

that tries to capture whether or not it's worth creating the bitmap.

systemed commented 7 months ago

I think my use differs from the norm a bit. I maintain two GeoJSON datasets that list national parks (50K items) and city parks (150K items) for North America.

So instead of ~200 country polygons that densely cover the entire planet, I have ~200,000 park polygons that sparsely cover patches of the planet.

When rendering road/path/POIs, I test whether or not they're in a park to write different attributes to my tileset.

Funnily enough, my use case for cycle.travel is very similar! I have slightly different rendering rules for roads within urban area polygons, so that country roads are prominent but towns aren't too messy. For example, in

Screenshot 2024-01-06 at 17 30 17

the road heading east from Tidmarsh becomes narrow and uncased when it enters the urban area. This is done with a set of 180,000ish urban area polygons (for Europe) or 8,000 (for North America).

For z14, it's 33MB. Not a big deal, IMO. If your basezoom was z16, though, it'd be 528MB. Probably a big deal, IMO. That said -- we don't have to index at the basezoom level. We could index at the lower of the basezoom or z14. Or we could introduce a knob in the layer config, like:

Given that polygon indexing is a bit of a niche feature anyway, so most users won't see the impact, I think we can get away without adding the knob. But defaulting to min(basezoom,14) sounds sensible.

cldellow commented 7 months ago

Ha, yeah, that's exactly why I want it, too. Cities are such boring places, but maps seem to be full of them. :)

I've rebased against master + clamped the index zoom level to the lesser of z14 and the base zoom.

systemed commented 7 months ago

Merged - thank you!

systemed / tilemaker

Intersects: faster queries for negative case #635