systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

generalize node_keys; add way_keys #629

Closed: cldellow closed this 7 months ago

cldellow commented 7 months ago

This PR generalizes the idea of node_keys, adds way_keys, and fixes https://github.com/systemed/tilemaker/issues/402.

I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in https://github.com/systemed/tilemaker/issues/190 and, elsewhere, in https://github.com/onthegomap/planetiler/issues/99

If you feel it complicates the maintainer story too much, please reject.

The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways.

For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM.

Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF.

Sample use cases:

-- Building a map without building polygons. A leading ~ negates the
-- filter: it excludes ways whose only tags are matched by it.
way_keys = {"~building"}
-- Building a railway map
way_keys = {"railway"}
-- Building a map of major roads
way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}
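
To make the three filter forms above concrete (`key`, `key=value`, and a leading `~` for negation), here is a minimal sketch in plain Lua of how such a list could be evaluated against a way's tags. It is not tilemaker's implementation: `parse_filter`, `tag_matches` and `keep_way` are hypothetical names, and the sketch assumes a list is either entirely positive or entirely negated.

```lua
-- Hypothetical helpers illustrating the filter semantics described above.
-- This is NOT tilemaker's code; all names here are made up for the sketch.

-- Parse one filter string into { negated, key, value }.
-- "railway"          -> key only
-- "highway=motorway" -> key and value
-- "~building"        -> negated key
local function parse_filter(s)
  local negated = s:sub(1, 1) == "~"
  if negated then s = s:sub(2) end
  local key, value = s:match("^([^=]+)=(.*)$")
  if not key then key = s end
  return { negated = negated, key = key, value = value }
end

-- True if the tag (k, v) is matched by the parsed filter f.
local function tag_matches(f, k, v)
  return f.key == k and (f.value == nil or f.value == v)
end

-- Decide whether a way with the given tags should be kept.
-- Assumption: the list is either all positive or all negated.
local function keep_way(tags, filter_strings)
  local filters = {}
  for i, s in ipairs(filter_strings) do filters[i] = parse_filter(s) end
  local negated = #filters > 0 and filters[1].negated

  for k, v in pairs(tags) do
    local matched = false
    for _, f in ipairs(filters) do
      if tag_matches(f, k, v) then matched = true; break end
    end
    -- Positive list: any matching tag keeps the way.
    -- Negated list: any tag NOT matched by the list keeps the way.
    if matched ~= negated then return true end
  end
  return false
end

print(keep_way({ railway = "rail" }, { "railway" }))               -- true
print(keep_way({ building = "yes" }, { "~building" }))             -- false
print(keep_way({ building = "yes", name = "X" }, { "~building" })) -- true
```

Under this reading, a way tagged only `building=yes` is dropped by `{"~building"}`, while a building that also carries another tag is kept, matching the "only tags" wording in the comment above.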

Nodes used in ways which are used in relations (as identified by relation_scan_function) will always be indexed, regardless of node_keys and way_keys settings that might exclude them.
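
As an illustration of that interaction, the fragment below sketches a profile in which way_keys would normally exclude railway ways, but a relation_scan_function accepts train route relations. It assumes the argument-free Lua interface from lua-interop-3 and that `Accept()` is how a relation is marked as wanted; the choice of tags is only an example.

```lua
-- Sketch of a profile fragment; tag choices are illustrative only.
-- Assumes the lua-interop-3 style interface (no arguments, global
-- Find/Accept), which may differ from the version you are running.

-- Only index nodes for ways that carry a highway tag...
way_keys = {"highway"}

-- ...but accept train route relations. Per the note above, nodes of
-- ways that are members of accepted relations are still indexed, even
-- though those ways do not match way_keys.
function relation_scan_function()
  if Find("type") == "route" and Find("route") == "train" then
    Accept()
  end
end
```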

A concrete example, given a Lua script like:

function way_function()
  if Find("railway") ~= "" then
    Layer("lines", false)
  end
end

it takes 13GB of RAM and 100 seconds to process North America.

If you add:

way_keys = {"railway"}

it takes ~2.5GB of RAM and 47 seconds.

Notes:

  1. This is based on lua-interop-3, because it touches files that lua-interop-3 also changes. I can rebase against master after lua-interop-3 is merged.

  2. The names node_keys and way_keys are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be node_filters and way_filters or similar?

  3. Maybe the value for node_keys in the OMT profile should be expressed in terms of a negation, e.g. node_keys = {"~created_by"}? This would avoid issues like https://github.com/systemed/tilemaker/issues/337 -- probably not very critical.

  4. This also adds a SIGUSR1 handler during OSM processing, which prints the ID of the object currently being processed. This is helpful for tracking down slow geometries.

systemed commented 7 months ago

Absolutely happy with this - it's a good power-user feature and a handy optimisation for those using custom profiles. Thanks!

> But if breaking changes are OK, maybe these should be node_filters and way_filters or similar?

Given that we're breaking the Lua interface anyway then yes, let's go for _filters.

cldellow commented 7 months ago

Merged into v3 branch: 9080cdb10e6afe98f3b3e36391dd2ca356d0af4e