systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

be able to render the planet with 32gb of RAM #618

Closed: cldellow closed this 7 months ago

cldellow commented 8 months ago

This PR lets Tilemaker build the planet on smaller machines.

On a Vultr 16-core, 32GB, 500GB SSD machine:

$ time tilemaker --store /tmp/store --input planet-latest.osm.pbf --output tiles.mbtiles --shard-stores
real    195m7.819s
user    2473m52.322s
sys     73m13.116s

Runtime for non-memory constrained boxes isn't affected, e.g. on a Hetzner 48-core, 192 GB machine:

$ time tilemaker --store /tmp/store --input planet-latest.osm.pbf --output tiles.mbtiles
real    65m20.082s
user    2570m33.530s
sys     41m15.420s

On a $ basis, if you're renting a machine to do the work, it's cheaper to use a bigger box. But for folks who need to use what they already have, this may be a useful PR.
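To put the two timings above side by side, the `time` values can be converted to seconds and used to compute the wall-clock ratio and the average number of busy cores (user time divided by real time). This is only an illustrative sketch: the helper names are made up for this example and are not part of tilemaker, and the figures are simply copied from the two runs quoted above.

```python
import re


def parse_bash_time(text: str) -> float:
    """Convert a bash `time` value such as '195m7.819s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs quoted above.
small_box = {"real": "195m7.819s", "user": "2473m52.322s"}  # 16-core, 32GB, --shard-stores
big_box = {"real": "65m20.082s", "user": "2570m33.530s"}    # 48-core, 192GB


def busy_cores(run: dict) -> float:
    """Average number of cores kept busy: user CPU time / wall-clock time."""
    return parse_bash_time(run["user"]) / parse_bash_time(run["real"])


def wall_clock_ratio(slow: dict, fast: dict) -> float:
    """How many times longer the slow run took in wall-clock terms."""
    return parse_bash_time(slow["real"]) / parse_bash_time(fast["real"])


if __name__ == "__main__":
    print(f"wall-clock ratio: {wall_clock_ratio(small_box, big_box):.2f}x")
    print(f"busy cores (small box): {busy_cores(small_box):.1f} of 16")
    print(f"busy cores (big box):   {busy_cores(big_box):.1f} of 48")
```

On these numbers the smaller machine takes roughly three times as long in wall-clock terms, while the total user CPU time differs by only about 4% between the two runs.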

The changes are a mix of using less memory, spilling more things to disk, and thrashing less when things are backed by disk.

Using less memory:

Spill more things to disk:

Thrash less:

Potential future improvements:

These are mostly smaller issues that can be happily ignored forever, just wanted to write them down so I can forget about them.

systemed commented 8 months ago

That's really impressive - thank you!

I haven't had the chance to go through all the source yet but the results look very impressive - I ran my usual Europe extract through it (with shapefiles), and memory consumption was 8GB when reading the .pbf, going up to 9GB when generating tiles. Total time 2hr13. I'll have a go at the planet tomorrow.

systemed commented 8 months ago

Using the old (mid-2021) planet I've run previous tests with, and including shapefiles, memory consumption was 18.2GB - which is amazing. Total time 5hr39. (Before this PR it was 5hr12 and 40.2GB.)

Comparing with Europe, that suggests a very rough estimated RAM requirement of one-third the .osm.pbf size.
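The rule of thumb above can be written down as a one-line estimate. This is a minimal sketch of that heuristic only: the one-third factor rests on the two data points mentioned in this thread (Europe and an older planet file, both with shapefiles), and the function names and the 30 GiB example size are hypothetical.

```python
import os

# Very rough factor taken from the comment above; not a guarantee.
RAM_FRACTION_OF_PBF = 1 / 3


def estimate_ram_bytes(pbf_size_bytes: int) -> float:
    """Estimate peak RAM as roughly one-third of the .osm.pbf size."""
    return pbf_size_bytes * RAM_FRACTION_OF_PBF


def estimate_ram_for_file(path: str) -> float:
    """Same estimate, reading the size of an existing .osm.pbf file."""
    return estimate_ram_bytes(os.path.getsize(path))


if __name__ == "__main__":
    example_size = 30 * 1024**3  # arbitrary 30 GiB input, for illustration only
    print(f"estimated RAM: {estimate_ram_bytes(example_size) / 1024**3:.1f} GiB")
```

Treat the result as a starting point for choosing a machine rather than a limit; actual consumption will depend on the flags used and on the Lua profile.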

systemed commented 8 months ago

Played with this a bit more today and still impressed. Also thanks for the copious comments which help me to understand what's going on!

I think the only suggestion I'd make is that we now have a fairly broad array of performance options (--no-compress-nodes, --no-compress-ways, --materialize-geometries, --shard-stores, plus of course --store and --compact have performance implications). I suspect most users won't understand which to pick.

I guess there are three common scenarios:

  1. Small extract (do everything in memory)
  2. Planet or large extract on expansive hardware (use store and optimise for run-time)
  3. Constrained hardware (use store and optimise for RAM consumption)

These could perhaps be represented by the following run-time options:

  1. (no flags specified)
  2. --store /path/to/ssd --fast (equivalent of --materialize-geometries on, --shard-stores off)
  3. --store /path/to/ssd (equivalent of --materialize-geometries off, --shard-stores on)

We can then simply tell people "if you have lots of memory and are working with a big extract, use the --fast option".

We can still retain the granular controls, but maybe put them in a separate "performance tuning" option group.
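The three scenarios and their proposed flags could be sketched as a small command builder. This only mirrors the proposal as worded in this comment; the `Scenario` enum and `build_command` helper are hypothetical names for illustration, and the eventual behaviour of `--fast` may differ from what is proposed here.

```python
import shlex
from enum import Enum
from typing import List


class Scenario(Enum):
    SMALL_EXTRACT = 1  # do everything in memory
    LARGE_FAST = 2     # planet or large extract, optimise for run-time
    CONSTRAINED = 3    # constrained hardware, optimise for RAM consumption


def build_command(scenario: Scenario, input_pbf: str, output: str,
                  store_dir: str = "/path/to/ssd") -> List[str]:
    """Build a tilemaker argument list for one of the proposed scenarios."""
    cmd = ["tilemaker", "--input", input_pbf, "--output", output]
    if scenario is Scenario.SMALL_EXTRACT:
        return cmd  # scenario 1: no performance flags
    cmd += ["--store", store_dir]  # scenarios 2 and 3 both use the store
    if scenario is Scenario.LARGE_FAST:
        cmd.append("--fast")  # scenario 2: trade memory for speed
    return cmd


if __name__ == "__main__":
    for scenario in Scenario:
        args = build_command(scenario, "planet-latest.osm.pbf", "tiles.mbtiles")
        print(f"{scenario.name}: {shlex.join(args)}")
```

Keeping the choice down to "store or not" and "fast or not" is the point of the proposal: the granular flags stay available but are not needed for the common cases.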

cldellow commented 7 months ago

Yes, good call on the flags and de-emphasizing the individual knobs. I'll make that change.

cldellow commented 7 months ago

Hopefully you ignored the noise of my commits during Christmas! :) Please don't feel any urgency to do anything with this or the other PRs I'll open this week -- this is just my version of tinkering with trains in the basement over the holidays.

Since my last comment:

I did some benchmarking [1] and observed that the logic should maybe be:

The --help after this commit:

tilemaker v2.4.0
Convert OpenStreetMap .pbf files into vector tiles

Available options:
  --help                       show help message
  --input arg                  source .osm.pbf file
  --output arg                 target directory or .mbtiles/.pmtiles file
  --bbox arg                   bounding box to use if input file does not have 
                               a bbox header set, example: 
                               minlon,minlat,maxlon,maxlat
  --merge                      merge with existing .mbtiles (overwrites 
                               otherwise)
  --config arg (=config.json)  config JSON file
  --process arg (=process.lua) tag-processing Lua file
  --verbose                    verbose error output
  --skip-integrity             don't enforce way/node integrity
  --log-tile-timings           log how long each tile takes

Performance options:
  --store arg                  temporary storage for node/ways/relations data
  --fast                       prefer speed at the expense of memory
  --compact                    use faster data structure for node lookups
                               NOTE: This requires the input to be renumbered 
                               (osmium renumber)
  --no-compress-nodes          store nodes uncompressed
  --no-compress-ways           store ways uncompressed
  --lazy-geometries            generate geometries from the OSM stores; uses 
                               less memory
  --materialize-geometries     materialize geometries; uses more memory
  --shard-stores               use an alternate reading/writing strategy for 
                               low-memory machines
  --threads arg (=0)           number of threads (automatically detected if 0)

[1]: Details in https://github.com/systemed/tilemaker/pull/618/commits/657da1ab92fcf65de3f5adafcceddc064ef5e73d - it wasn't quite this branch, it was this branch + protobuf + lua-interop

systemed commented 7 months ago

All working really well! Ready to merge, do you think?

Running this PR with Great Britain on my usual box:

/usr/bin/time -v tilemaker --input /media/data1/planet/great-britain-latest.osm.pbf --output ~/tm_debug/gb5.mbtiles
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:59.99
    Maximum resident set size (kbytes): 12275684

/usr/bin/time -v tilemaker --input /media/data1/planet/great-britain-latest.osm.pbf --output ~/tm_debug/gb4.mbtiles --lazy-geometries
    Elapsed (wall clock) time (h:mm:ss or m:ss): 5:16.00
    Maximum resident set size (kbytes): 9155756

It's a big memory saving (25%) for a small time penalty (5%) - so maybe we should default to --lazy-geometries, both for in-memory and --store. But I realise one could probably bikeshed this all day. :)
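For reference, the percentages quoted above can be reproduced from the two `/usr/bin/time -v` outputs. This is a small sketch with made-up helper names; the input values are copied from the runs above.

```python
def parse_elapsed(text: str) -> float:
    """Convert GNU time's 'h:mm:ss' or 'm:ss' elapsed field to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Values copied from the two Great Britain runs above.
default_run = {"elapsed": "4:59.99", "max_rss_kb": 12275684}
lazy_run = {"elapsed": "5:16.00", "max_rss_kb": 9155756}  # --lazy-geometries

time_change = percent_change(parse_elapsed(default_run["elapsed"]),
                             parse_elapsed(lazy_run["elapsed"]))
memory_change = percent_change(default_run["max_rss_kb"], lazy_run["max_rss_kb"])

print(f"time:   {time_change:+.1f}%")
print(f"memory: {memory_change:+.1f}%")
```

This works out to a little over 5% more wall-clock time for a little over 25% less peak memory, matching the rounded figures in the comment.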

cldellow commented 7 months ago

Yup, merge away.

I have no strong views on the defaults--let me know if you'd like them changed.

systemed commented 7 months ago

Merged. Thank you again - this is going to make a massive difference to users.

I'll do some experimenting with the defaults before we release 3.0 but it's not crazily urgent.