systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/
Other
1.44k stars 229 forks source link

Clipping monster polygons #606

Closed cldellow closed 9 months ago

cldellow commented 9 months ago

This implements some ideas to mitigate the impact of clipping monster multipolygons.

In particular, for any multipolygon that is fully clipped out (i.e. we're in its hole) or clips to the full extent of a tile at z6..z8, we'll track the z9 tiles that it would fully cover or be excluded from.

Later, when processing a tile at z7 or higher, we'll check this tracking information to short-circuit doing an expensive clip operation.

Processing the Hudson Bay extract 1:

master:

real    22m27.753s
user    336m50.481s
sys 0m25.314s

this branch:

real    3m46.838s
user    45m33.959s
sys 0m12.161s

I think the general idea here is sound...but I also think I don't know much about geometry or how it interacts with vector tiles, so a critical eye & mind would be extra appreciated.

Eyeballing the resulting tiles, it seems to look OK. This is not very rigorous, though. :) I wonder if there is tooling that will diff an mbtiles or pmtiles file? In theory, the files ought to be equivalent.

cldellow commented 9 months ago

(i.e., we're in its hole)

Ah, I suppose another cause of this (and a more likely one?) is that we're just outside of the polygon, not in one of its holes. I think this is possible because we use a bounding box when deciding which tiles the polygon affects.

I shouldn't be trusted with geometry stuff.

systemed commented 9 months ago

Impressive speed-up!

Big, complex multipolygons are the root of all evil (or at least, most CPU time). Clipping/simplifying them, while still producing a valid geometry out the other end, is one of the hardest things tilemaker has to do.

As you've seen, in #479 (based on #323) we moved to using a spatial index (rtree) for large polygons, which meant we don't have to store references to OutputObjects in every single z14 tile covered by the polygon. Antarctica is pretty much the canonical case for this.

The rtree simply stores the bbox of the large polygon, however. This leads to situations like https://github.com/systemed/tilemaker/pull/323#issuecomment-950381397 where we start pulling in a lot of tiles that aren't actually affected by the geometry.

My instinct (and nothing more than that!) is that this is essentially the same issue as here. So it might make sense to integrate this into the LargeObjects code - i.e. extend the boxRtree structure in tile_data.h to include the bitsets.

cldellow commented 9 months ago

Thanks for the pointer to those tickets!

Yes, if you squint, this PR seems in the vein of what you mention in https://github.com/systemed/tilemaker/pull/479#issuecomment-1481408713:

store a tile bitmap with each rtree (as a vector, though this of course will increase memory consumption)

The r-tree currently answers the question "does this object possibly output to a given tile?"

I guess this would widen the r-tree's capacities to "does this object definitely output to a given tile?" and, even further "when it does output to a given tile, does it cover it completely?"

Doing the pruning earlier is attractive. This also looks like it'd be in code near fillCoveredTiles, which I've been wondering about. I occasionally see it pop up in stack traces when my computer is only using 1 core at 100% -- Antarctica and the Hudson Bay extract both have this problem when reading the biggest of their relations.

I'd want to measure the memory impact -- this PR doesn't yet try to do something intelligent with memory, but it should be straight-forward to turn this PR's maps into a least-recently-used cache so that bitmaps from past z6 tiles get evicted naturally. The large object index, by contrast, is global and long-lived.

I've put this PR as draft while I investigate.

I also noticed two other things:

  1. the PR has a bug - z9/145/160 incorrectly excludes the bay
  2. some rough prototyping makes me think we can get another 70% decrease in writing if we extend this idea to the z10..z13 tiles, too
cldellow commented 9 months ago

Closing in favour of https://github.com/systemed/tilemaker/pull/607