systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

Remove output object ref #595

Closed cldellow closed 9 months ago

cldellow commented 9 months ago

Today, we:

This PR moves us to:

It also restores support for the include_ids flag. When this flag is present, we'll pay the extra cost to track the OSM ID. When absent, we won't.
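Roughly, the shape of that is a parallel ID array that's only populated when the flag is set -- a minimal sketch with hypothetical names, not the PR's actual types:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: the output object itself no longer carries the OSM ID.
struct OutputObject {
	uint32_t attributeIndex;  // index into the shared attribute store
	uint8_t layer;
	uint8_t minZoom;
};

struct ObjectStore {
	std::vector<OutputObject> objects;
	// Parallel array, populated only when include_ids is set; when the
	// flag is absent this stays empty and costs nothing per object.
	std::vector<uint64_t> osmIds;
	bool includeIds = false;

	void add(const OutputObject& obj, uint64_t osmId) {
		objects.push_back(obj);
		if (includeIds)
			osmIds.push_back(osmId);  // same index as objects
	}
};
```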

Runtime for GB seems mostly the same for me, perhaps a few seconds faster.

There's a modest memory saving of about 200 MB.

systemed commented 9 months ago

This looks really good. I'll try and do some proper testing in the next few days (and try to get my head round the source!) but it looks really promising.

systemed commented 9 months ago

I've not looked into it in huge depth but there might be a missing geometry issue. Current master, Great Britain extract, around the Bristol Channel:

[Screenshot 2023-11-29 at 21:33:16]

This PR:

[Screenshot 2023-11-29 at 21:33:09]

Note that the M4 and M5 motorways have gone partly AWOL in the second screenshot. For some reason it seems to particularly affect motorways. I'll do a bit more investigation but thought I'd flag it up first.

cldellow commented 9 months ago

Eep, thanks, good spot.

Yes, I've broken something pretty severely. master for z9/251/170 has 190 transportation features, but this PR has 360. The overall tile size is about the same, so I've probably broken sorting and/or merging -- will look into it.

Separately, this PR is also segfaulting on data that previously worked. Antarctica has https://www.openstreetmap.org/node/11109077988, which has a latp of -1801029268 and a lon of 1762166670. That gives a z14 tile index of x=16211, y=16388 -- but y=16388 is not a valid coordinate for a z14 tile (valid indices run 0..16383).

In the old code, we would have stashed this node into the tile index at its non-existent coordinate and then ignored it forever. The new code instead does an out-of-bounds write and segfaults.

Any thoughts on the right resolution here? We could drop it (as the old code did), or just clamp to the maximum valid coordinate.
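For illustration, the clamping option would be something like this (a sketch with a hypothetical helper, not actual tilemaker code):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: clamp a computed tile coordinate into the valid
// range for the zoom level, so a node just past the projection's edge
// can't index out of bounds. The Antarctica node's y=16388 at z14 would
// clamp to 16383.
uint32_t clampTileCoord(uint32_t coord, uint32_t zoom) {
	const uint32_t maxCoord = (1u << zoom) - 1;  // 16383 at z14
	return std::min(coord, maxCoord);
}
```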

cldellow commented 9 months ago

Trying out different commits and measuring how many features end up in the transportation layer for z9/251/170:

The latest changes from master included https://github.com/systemed/tilemaker/commit/fc927be300147c04863fca9a50c2cd120c4fe5a9, which affected the attributes written for the transportation layer.

From this, I conclude PR #595 is probably not the culprit - it didn't change the number of features in the tile.

I'm thinking that PR #583 must be to blame -- that it introduced some latent bug which was triggered by the changes in https://github.com/systemed/tilemaker/commit/fc927be300147c04863fca9a50c2cd120c4fe5a9.

https://gist.github.com/cldellow/21568363603aaaffd4aea30c035fb560 is a log of master's CheckNextObjectAndMerge function as it evaluates the OSM ID for the M5.

You can see that it runs twice - presumably, one is for the transportation layer and one is for the transportation_name layer. Both times it merges a bunch of things together.

In the remove-outputobject-ref branch, it fails to merge the first time (presumably the transportation layer), but the second merge has the same output.

I think that is further evidence in support of the attribute change being suspect. Will do some more poking around.
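For context, my mental model of the merge precondition is roughly the following (hypothetical names; the real logic lives in CheckNextObjectAndMerge):

```cpp
#include <cstdint>

// Hypothetical sketch, not the actual CheckNextObjectAndMerge code.
struct OutputObject {
	uint8_t layer;
	uint8_t geomType;
	uint32_t attributeIndex;  // index into the shared attribute store
};

// Two adjacent output objects are merge candidates only when everything
// except their geometry matches. If the attribute store hands back
// different indices for what should be identical attribute sets, this
// returns false and the features stay un-merged -- consistent with the
// 190-vs-360 feature counts above.
bool canMerge(const OutputObject& a, const OutputObject& b) {
	return a.layer == b.layer &&
	       a.geomType == b.geomType &&
	       a.attributeIndex == b.attributeIndex;
}
```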

cldellow commented 9 months ago

Yeah, the attribute store change had a bug -- pushed a fix.

I also pushed a fix to maintain the old behaviour for invalid Antarctica nodes.

systemed commented 9 months ago

Running this on Europe (with --store) seems dramatically faster - 2hr50 vs earlier 4hr30 or so. Memory consumption is similar but down a bit. I'll do a bit more testing but it's looking good!

Ok if I merge this into refactor_geometries assuming I don't find any show-stoppers?

cldellow commented 9 months ago

> Running this on Europe (with --store) seems dramatically faster - 2hr50 vs earlier 4hr30 or so.

Oh, interesting. I expected the sorted vector to be faster than the old TileIndex approach for the "get all tiles with objects at zoom z" and "get all output objects for the tile at z, x, y" operations, but that's quite something. I wonder why it didn't show up as much for GB.

This PR does move the minZoom pruning earlier. It now happens as part of "get all output objects for the tile at z, x, y", whereas before it happened in tile_worker, so we'd pay the cost of copying the output objects around, resizing and sorting vectors. Perhaps Europe has more things that get pruned due to zoom than GB does.
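To sketch the shape of that lookup (hypothetical types, not the PR's actual code): with entries kept in a vector sorted by tile key, the per-tile query becomes a binary search plus a contiguous scan, and the minZoom check happens during the scan:

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical sketch: entries sorted by (x, y) tile key at one zoom level.
struct TiledObject {
	uint32_t x, y;      // tile coordinates
	uint8_t minZoom;    // first zoom at which this object appears
	uint32_t objectId;  // index into the object store
};

bool operator<(const TiledObject& a, const TiledObject& b) {
	return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// "Get all output objects for the tile at z/x/y": binary-search for the
// contiguous run with this tile key, pruning by minZoom during the scan
// instead of copying everything and filtering afterwards.
std::vector<uint32_t> objectsForTile(const std::vector<TiledObject>& sorted,
                                     uint32_t x, uint32_t y, uint8_t zoom) {
	TiledObject key{x, y, 0, 0};
	auto range = std::equal_range(sorted.begin(), sorted.end(), key);
	std::vector<uint32_t> out;
	for (auto it = range.first; it != range.second; ++it)
		if (it->minZoom <= zoom)
			out.push_back(it->objectId);
	return out;
}
```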

> Ok if I merge this into refactor_geometries assuming I don't find any show-stoppers?

Yup.

systemed commented 9 months ago

I've merged this into refactor_geometries and also brought in the latest fixes from master.

Running it over the whole planet (-60°N to 75°N) with --store, it took 15h27 with around 56GB memory usage. I haven't run a planet benchmark for a while but this is a massive improvement on the 2021 figures - execution time and RAM usage both better than halved.

There are a couple of little logging tweaks I'd like to make, and the Athens-area geometry is still proving troublesome 😠 but otherwise I think this can all go into master ASAP.

cldellow commented 9 months ago

That's fantastic! Thanks for verifying the planet works + getting a new baseline on the resources needed.

systemed commented 9 months ago

> Running this on Europe (with --store) seems dramatically faster - 2hr50 vs earlier 4hr30 or so.

I think I've figured out why - the reduced memory usage is enough for the whole dataset to fit in memory (on my 144GB machine). So the speed-up is probably because, even with --store, it doesn't actually need to access the disk much.

systemed commented 9 months ago

> Running it over the whole planet (-60°N to 75°N) with --store, it took 15h27 with around 56GB memory usage. I haven't run a planet benchmark for a while but this is a massive improvement on the 2021 figures - execution time and RAM usage both better than halved.

Halved again :)

Execution time, at least - I didn't monitor RAM. But with master+#608, wall clock is down to 7h48.

cldellow commented 9 months ago

Some more timings, on a Hetzner CCX63 (modern EPYC Milan CPU, 48 cores, 192GB RAM, 960GB SSD). This is with the changes in https://github.com/systemed/tilemaker/compare/master...cldellow:tilemaker:perf-tweaks-2, which avoid some lock contention issues that show up with more than 16 cores.
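The usual shape of that kind of fix is to shard one hot mutex into many; a rough sketch (hypothetical, not the actual branch changes):

```cpp
#include <array>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of lock sharding: rather than one global mutex
// that 48 threads queue on, hash each key onto one of N mutexes so
// writes for unrelated keys rarely contend.
class ShardedMutex {
public:
	std::mutex& lockFor(uint64_t key) {
		return shards_[key % shards_.size()];
	}
private:
	std::array<std::mutex, 64> shards_;
};

// Usage:
//   ShardedMutex m;
//   std::lock_guard<std::mutex> guard(m.lockFor(nodeId));
```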

This is just OSM, no shapefiles.

planet-latest (with --store, not clamped to the -60..75 latitude range):

real    65m36.727s
user    2075m8.181s
sys 75m47.390s

north-america (without --store, with --no-compress-ways and --no-compress-nodes):

real    15m53.374s
user    460m6.072s
sys 36m10.536s

I screwed up the command for NA and wrote to individual files rather than an SQLite DB, which I suspect penalized me a bit - lots of time in uninterruptible I/O states showed up while it was running.

A few things stood out when running the planet, I'll see if I can fix them up.

systemed commented 9 months ago

That's superb.

(I am now getting serious Hetzner box envy.)