Exciting!
Depending on your timeline, there are two other things that seem promising
The motivation for AttributePair/AttributeSet - I think we can save ~600MB of RAM in a planet build. Between that and (1), I think we might get away with fewer, bigger splits when doing the planet on a constrained-memory build, so it's worth doing.
The idea is: we currently refer to AttributePairs and AttributeSets compactly, e.g. by a `uint32_t` that references their index in a `deque<AttributePair>`, rather than by a full pointer.
When reading a new attribute, we need to know if we've seen it already, and if so, what its index is. To do this, we have a `boost::container::flat_map<AttributePair*, uint32_t>`.
It has a custom less-than comparator so that the pointer is compared by its logical contents, not its memory address. This lets us ask "is this new AttributePair, which may be stored at a different memory location than a previously-seen version of it, already logically present in the map?"
But there is some waste: we effectively store the pointer in the map twice. The first copy is the key, which is literally a pointer. The second copy is the index value -- the `uint32_t` identifies the offset in the deque, which, if you squint, is a pointer to an AttributePair. Deques don't invalidate memory locations when they grow, so the pointer obtained by indexing to that location will be valid throughout the life of the program.
Thus, where today the flat map stores a `vector<pair<AttributePair*, uint32_t>>`, I think if we write a small wrapper, we ought to be able to get away with a custom container that stores a `vector<uint32_t>` and supports the same operations. I believe there are ~40M AttributeSets and ~30M AttributePairs, so I expect a saving of roughly 70M entries * 8 bytes => ~560MB. Our task is simpler than a general-purpose map's, since we're only interested in insert + find -- iteration, deletion, etc. aren't needed.
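For illustration, here's a minimal sketch of the kind of wrapper I mean. The `PairIndex` name and the stand-in `AttributePair` fields are invented for the example (the real struct and comparator live in tilemaker), but it shows how a sorted `vector<uint32_t>` of deque indices can still support insert + find:

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for tilemaker's real AttributePair; only the ordering matters here.
struct AttributePair {
    std::string key, value;
    bool operator<(const AttributePair& o) const {
        return std::tie(key, value) < std::tie(o.key, o.value);
    }
};

// Deduplicating index over a deque<AttributePair>: stores only sorted deque
// indices (4 bytes each) and recovers the key by dereferencing the deque,
// rather than keeping the pointer + index pair that the flat_map stores today.
class PairIndex {
public:
    explicit PairIndex(const std::deque<AttributePair>& pairs) : pairs_(pairs) {}

    // Returns the index of a logically equal pair, or UINT32_MAX if unseen.
    uint32_t find(const AttributePair& candidate) const {
        auto it = lowerBound(candidate);
        if (it != sorted_.end() && !(candidate < pairs_[*it]))
            return *it;
        return UINT32_MAX;
    }

    // Caller appends the pair to the deque first, then records its index here.
    void insert(uint32_t index) {
        sorted_.insert(lowerBound(pairs_[index]), index);
    }

private:
    std::vector<uint32_t>::const_iterator lowerBound(const AttributePair& candidate) const {
        return std::lower_bound(sorted_.begin(), sorted_.end(), candidate,
            [this](uint32_t idx, const AttributePair& c) { return pairs_[idx] < c; });
    }

    const std::deque<AttributePair>& pairs_;
    std::vector<uint32_t> sorted_; // indices into pairs_, ordered by logical content
};
```

The same wrapper would work for AttributeSets; the only requirements are that the deque outlives the index and that entries are only ever appended.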
Both sound good. I'm not in an inordinate hurry to do the release so happy to wait for those!
Incidentally, there might be some possibilities for memory saving in the .pmtiles branch I've just merged in - it creates a big map/vector (depending on size) of the location of each tile within the file, for writing out into the pmtiles directories at the end of the process. I wondered whether this could potentially be mmaped after #618 - writing by z6 tile means that we'll be filling up nearby entries most of the time.
Oh, good thought, `denseIndex` does look like a candidate for being mmap-able. I'm guessing the savings would be ~640MB? When I build the planet without shapefiles, I get ~80M tiles, so I'm treating that as the upper bound on the number of tiles with "interesting" things in them, and `TileOffset` is an 8-byte struct (80M * 8 bytes => ~640MB).
It might also be possible to approach it from a different angle -- maybe they could be flushed from memory to the pmtiles archive earlier? If I understand the pmtiles spec, I think things can be scattered throughout the archive willy-nilly when non-clustered mode is used. It might complicate the bookkeeping, and be a little tricky to do without imposing a lot of locking overhead.
I think it'll be more than that - there are many thousands of sea tiles when building with shapefiles, and we need an index for each of them.
Each pmtiles leaf directory is a series of file offsets for contiguously numbered tiles (using pmtiles' Hilbert tile numbering). The leaf directories are all together in the .pmtiles archive.
What this means in practice:
With that in mind I suspect mmaping `denseIndex` is possibly easiest - but I haven't tried it!
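To make that concrete, here's a rough sketch of what an mmap-backed `denseIndex` could look like, assuming POSIX `mmap` and a hypothetical 8-byte `TileOffset` layout (the real struct and any existing tilemaker storage helpers may differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical 8-byte entry: where a tile's data lives within the archive.
// The real TileOffset layout is whatever tilemaker already defines.
struct TileOffset {
    uint64_t offset : 40; // byte offset into the .pmtiles archive
    uint64_t length : 24; // length of the tile's data
};

// Maps a file-backed array of TileOffset; cold pages can be evicted by the OS
// and paged back in when the leaf directories are serialized at the end.
TileOffset* mapDenseIndex(const char* path, size_t numTiles) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return nullptr;
    size_t bytes = numTiles * sizeof(TileOffset);
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping keeps its own reference to the file
    return p == MAP_FAILED ? nullptr : static_cast<TileOffset*>(p);
}
```

Since tiles are written a z6 subtree at a time, writes should cluster in nearby pages, so the resident working set stays small even though the full array is ~640MB.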
Ah, right. I misunderstood how `isSparse` worked.
And re-reading the PMTiles spec, I see that I was hallucinating, and it's not valid to intersperse tile data with leaf directories. Let's pretend I didn't comment. :)
I've created a v3 branch: so far it has master + #626, #629, #636 and a bit of tidying. I think I've merged #629 correctly but you might want to check.
I did a smoke test, seems good to me.
A bit of benchmarking for v3 on my current machine:
default:
--fast:
--lazy-geometries:
So when running without --store, I'm inclined to default to --lazy-geometries (for the significant memory saving), but turn it off with --fast.
default:
--fast:
I haven't yet tried with --materialize-geometries on the planet.
Today: built tilemaker from the v3 branch on Ubuntu 22.04
Last planet (74327MB)
real 84m9.543s
user 3596m55.728s
sys 109m37.462s
Release done!
That's amazing - thank you for running that as a benchmark.
:+1: No more excuses for me to avoid working on my hobby map project now, I guess. :)
Thanks, @systemed, for your patience in answering my questions and reviewing PRs over the past few months. I really appreciate it.
The in memoriam you added for Wouter van Kleunen is very thoughtful. I stumbled across many of his contributions and discussions here and in the boost geometry repos, learning something from him each time.
Not at all - you've made massive improvements, so thank you!
A lot of his code was really inspired - particularly the really intense geometry stuff such as intersection-aware simplification and the dissolve/correct algorithm. I hope he's in a better place.
Just wanted to say thanks for all the hard work. Generated the North America tile set in 50 minutes on my 32GB machine, and it never used more than 1GB of swap, whereas v2 was using several tens of GB. Just throwing Europe through it now.
We're not far off being able to put together a 3.0 release. 🎉
Changes merged into master:
Breaking changes to merge into v3 branch:
@cldellow - does this sound good to you or is there more you'd like to include?