systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

AttributeStore memory tweaks #583

Closed: cldellow closed this 9 months ago

cldellow commented 9 months ago

This fixes a bug where:

obj:Attribute('name', 'some value')
obj:Attribute('name', 'some other value')

sometimes resulted in the first value being saved, sometimes the second. Now the last value always wins.
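The intended "last value wins" behaviour can be sketched in C++, tilemaker's implementation language. This is a minimal illustration under stated assumptions, not tilemaker's code: the class `PendingAttributes` and its methods are hypothetical, and it assumes a plain linear list per object. A linear scan is reasonable here because, as noted below, objects usually carry only a handful of attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-object attribute list (not tilemaker's code).
// Setting a key that is already present overwrites its value, so the
// most recent call always wins.
class PendingAttributes {
public:
    void set(const std::string& key, const std::string& value) {
        for (auto& entry : entries_) {
            if (entry.first == key) {
                entry.second = value;  // replace, don't keep the old value
                return;
            }
        }
        entries_.emplace_back(key, value);
    }

    // Returns nullptr if the key was never set.
    const std::string* get(const std::string& key) const {
        for (const auto& entry : entries_)
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::pair<std::string, std::string>> entries_;
};
```

Calling `set("name", "some value")` and then `set("name", "some other value")` leaves a single `name` entry holding the second value.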

There are also some memory reductions: processing the Great Britain extract needs about 2,300 MB less memory. The PR does a few things:

  1. Rather than store a whole std::string in AttributePair to identify the key (name:latin, kind, iata, etc.), it identifies the key by a uint16_t. This limits Tilemaker to 65,536 (64K) distinct keys, which is probably fine: Shortbread uses about 50, for reference.
  2. Rather than store a whole AttributePair in an AttributeSet, it references pairs by a vector of uint32_t IDs. This limits Tilemaker to about 4 billion (2^32) AttributePairs. That might become an issue, but it's easily extended if needed. Tilemaker is already limited elsewhere to 1B AttributeSets, so if one limit becomes a problem, the other probably will too.
  3. I lied: AttributeSet only sometimes uses an actual vector of uint32_ts. Most of the time a vector is overkill, as items only have a handful of attributes. Instead, AttributeSet typically reuses the 24 bytes that a vector would occupy to store an array of 4 shorts and 4 ints† (4 × 2 + 4 × 4 = 24 bytes). Only if this array is exhausted does it switch to a vector.
  4. vector_tile::Tile_Value turns out to be quite big: it's 96 bytes. I replaced it with a union of std::string, float and bool. I didn't notice any negative impact on PBF generation at the end from constructing Tile_Values on a per-layer basis instead.
  5. Recognize when a profile calls Layer more than once with the same geometry, and avoid processing or storing that geometry a second time; for example, when a profile writes a river's geometry once for the river itself and again for its name.
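Items 1 and 2 amount to interning: each distinct key string gets a 16-bit ID, and each distinct key/value pair gets a 32-bit ID, so a set of attributes only needs to hold small integers. The following is a minimal sketch of that idea, not the PR's code; `KeyStore`, `PairStore` and `Pair` are hypothetical names, and values are simplified to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key table: each distinct key string ("name", "kind", ...)
// is stored once and identified by a uint16_t, so at most 65,536 keys fit.
class KeyStore {
public:
    uint16_t key(const std::string& k) {
        auto it = ids_.find(k);
        if (it != ids_.end()) return it->second;
        if (names_.size() >= kMaxKeys)
            throw std::length_error("more than 65,536 distinct keys");
        auto id = static_cast<uint16_t>(names_.size());
        names_.push_back(k);
        ids_.emplace(k, id);
        return id;
    }
    const std::string& name(uint16_t id) const { return names_.at(id); }

private:
    static constexpr std::size_t kMaxKeys = 65536;
    std::unordered_map<std::string, uint16_t> ids_;
    std::vector<std::string> names_;
};

// Hypothetical pair record: a 2-byte key ID instead of a whole std::string key.
struct Pair {
    uint16_t key;
    std::string value;  // simplified: real values may also be numbers or booleans
};

// Hypothetical pair table: each distinct (key, value) is identified by a uint32_t.
class PairStore {
public:
    uint32_t intern(uint16_t key, const std::string& value) {
        auto lookup = std::make_pair(key, value);
        auto it = ids_.find(lookup);
        if (it != ids_.end()) return it->second;
        if (pairs_.size() >= kMaxPairs)
            throw std::length_error("more than 2^32 distinct pairs");
        auto id = static_cast<uint32_t>(pairs_.size());
        pairs_.push_back(Pair{key, value});
        ids_.emplace(std::move(lookup), id);
        return id;
    }
    const Pair& get(uint32_t id) const { return pairs_.at(id); }

private:
    static constexpr uint64_t kMaxPairs = uint64_t{1} << 32;
    std::map<std::pair<uint16_t, std::string>, uint32_t> ids_;
    std::vector<Pair> pairs_;
};
```

The limits in the PR description fall straight out of the ID widths: a uint16_t can name 65,536 keys and a uint32_t about 4.3 billion pairs.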
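Item 4 replaces a large protobuf message with a value type that holds exactly one of a string, a float or a bool. The PR uses a union; this sketch swaps in `std::variant` (C++17), a tagged union that tracks the active member and constructs and destroys the `std::string` safely. `AttributeValue` and `describe` are hypothetical names. The exact size of the variant is implementation-specific, but on common 64-bit standard libraries it is well under the 96 bytes quoted for Tile_Value.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical compact attribute value: exactly one of string, float or bool.
// std::variant is used here in place of the PR's raw union.
using AttributeValue = std::variant<std::string, float, bool>;

// Hypothetical helper: render the active alternative for debugging.
std::string describe(const AttributeValue& v) {
    if (const auto* s = std::get_if<std::string>(&v)) return "string:" + *s;
    if (const auto* f = std::get_if<float>(&v)) return "float:" + std::to_string(*f);
    return std::get<bool>(v) ? "bool:true" : "bool:false";
}
```

When building a value from a string literal, wrap it in `std::string` explicitly: with some pre-C++20 standard libraries, a bare `"Thames"` can select the `bool` alternative.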
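Item 5 can be pictured as caching the stored geometry for the object currently being processed, so a second Layer call refers to the same stored ID. This is a guess at the shape of the idea under simplifying assumptions (one way, one geometry type, deduplication scoped to a single OSM object), not the PR's mechanism; `WayProcessor`, `OutputObject` and the layer names `waterway` and `water_name` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;  // (lon, lat)
using Linestring = std::vector<Point>;

// An output feature refers to a stored geometry by ID instead of owning a copy.
struct OutputObject {
    std::string layer;
    uint32_t geometryId;
};

// Hypothetical per-way processor (not tilemaker's API). The first call to
// layer() builds and stores the geometry; later calls for the same way
// reuse the stored ID instead of processing and storing it again.
class WayProcessor {
public:
    WayProcessor(Linestring nodes, std::vector<Linestring>& store,
                 std::vector<OutputObject>& out)
        : nodes_(std::move(nodes)), store_(store), out_(out) {}

    void layer(const std::string& name) {
        if (!geometryId_) {
            store_.push_back(nodes_);  // geometry work happens only once
            geometryId_ = static_cast<uint32_t>(store_.size() - 1);
        }
        out_.push_back(OutputObject{name, *geometryId_});
    }

private:
    Linestring nodes_;
    std::vector<Linestring>& store_;
    std::vector<OutputObject>& out_;
    std::optional<uint32_t> geometryId_;
};
```

A river written to a `waterway` layer and again to a `water_name` layer then produces two output objects but only one stored geometry.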

† The system tries to assign IDs that would fit in a short to those AttributePairs that it thinks will be very popular, like rank=1, amenity=toilets, indoor=false and so on. This lets most AttributeSets store their pairs without having to allocate a vector.
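Item 3 and the footnote describe a small-buffer optimisation. The sketch below shows one way to lay it out: a 24-byte inline buffer of 4 uint16_t and 4 uint32_t slots shares storage with a `std::vector` through a union, and the set switches to the vector only when the buffer is full. It is an illustration under assumptions, not tilemaker's AttributeSet; the class name is hypothetical, the separate `usingVector_` flag is a simplification, and reserving ID 0 as "empty slot" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Hypothetical AttributeSet-like container (not tilemaker's code).
// Assumption of this sketch: pair ID 0 is reserved to mean "empty slot".
class SmallAttributeSet {
public:
    SmallAttributeSet() : inline_{} {}
    ~SmallAttributeSet() {
        if (usingVector_) vec_.~Vec();
    }
    SmallAttributeSet(const SmallAttributeSet&) = delete;
    SmallAttributeSet& operator=(const SmallAttributeSet&) = delete;

    void add(uint32_t pairId) {
        assert(pairId != 0 && "ID 0 is reserved for empty slots");
        if (!usingVector_) {
            // Popular pairs with small IDs try the compact 16-bit slots first.
            if (pairId <= UINT16_MAX) {
                for (uint16_t& slot : inline_.shorts)
                    if (slot == 0) { slot = static_cast<uint16_t>(pairId); return; }
            }
            for (uint32_t& slot : inline_.ints)
                if (slot == 0) { slot = pairId; return; }
            spillToVector();  // inline buffer exhausted
        }
        vec_.push_back(pairId);
    }

    std::vector<uint32_t> ids() const {
        if (usingVector_) return vec_;
        std::vector<uint32_t> out;
        for (uint16_t s : inline_.shorts) if (s != 0) out.push_back(s);
        for (uint32_t i : inline_.ints) if (i != 0) out.push_back(i);
        return out;
    }

    bool usesVector() const { return usingVector_; }

private:
    using Vec = std::vector<uint32_t>;
    struct Inline {
        uint16_t shorts[4];  // 8 bytes
        uint32_t ints[4];    // 16 bytes
    };
    static_assert(sizeof(Inline) == 24, "4 shorts + 4 ints = 24 bytes");

    void spillToVector() {
        Vec existing = ids();  // copy inline IDs out before reusing the storage
        new (&vec_) Vec(std::move(existing));
        usingVector_ = true;
    }

    union {
        Inline inline_;
        Vec vec_;
    };
    bool usingVector_ = false;
};
```

The 16-bit slots only pay off if the most popular pairs receive the smallest IDs, which is why the PR tries to hand short-sized IDs to pairs like amenity=toilets.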

systemed commented 9 months ago

This looks really good - thank you. I'll have a proper look at it this weekend. I like the optimisation for popular pairs - very clever.

systemed commented 9 months ago

Great Britain: [image not preserved]

Germany: [image not preserved]

That's an amazing improvement. Thank you!