systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/
Other
1.42k stars 228 forks source link

extend NextRelation/FindInRelation to nodes #632

Closed cldellow closed 7 months ago

cldellow commented 7 months ago

This also makes NextRelation return a tuple of ID, role rather than just ID.

According to https://www.lua.org/pil/5.1.html, I think that's not a breaking change.

Motivation: When I naively try to create labels from OSM entities with the place tag, I occasionally get duplicates due to relations.

For example, the city of Guelph has relation 7486148, which has node 36576733 as its label member.

They both have place=city, so my Lua script faithfully spits out two labels.

This PR allows me to suppress the label from the node, which is a start.

Ideally, I'd actually prefer to use the node, as it'll likely have a nicely human-curated location. For that, I'd need the relation to be able to interrogate its members.

Would you be open to me extending this PR to add NextMember, FindInMember and RestartMembers functions that mirror the ones used by nodes/ways ?

systemed commented 7 months ago

Thanks! Reading roles, and extending support to nodes, is a big win. Providing NextMember etc. sounds good too.

Relations are generally a mess, and each type is basically sui generis. As a general rule I try to keep tag-specific logic out of the C++ code, and expect users to handle that in the Lua.

The only (reluctant) exceptions are multipolygon and boundary relations, and their constituent inner/outer roles, which are effectively geometries expressed in metadata (yuk) and therefore we have to give them special handling. (In an ideal world we'd have an OSM <Area> datatype rather than this kludge but there you go.) tilemaker transparently handles multipolygon relations as ways, and boundary relations as relations but with the multipolygon geometry correctly generated.

So I'm a little anxious about hard-coding admin_centre and label - they are of course already a part of the hard-coded boundary relation but I'd like to keep that to the minimum possible. If NextMember etc. would allow us to do that via Lua, that would be good. One other option might be to add support to LayerAsCentroid, so you could write something like this

LayerAsCentroid("places", "label", "admin_centre")

which would write a node to the places layer, with the geometry taken from (a) a label member, (b) failing that, an admin_centre member, (c) failing that, an automatically generated centroid.

cldellow commented 7 months ago

Great feedback. I'll rework this:

Something that gives me pause: I am separately considering implementing support for polylabel. I think this would address issues #392 and #447 (and perhaps lessen the urgency of #467).

I hadn't thought about how that would surface to the user, though. Did you have anything in mind?

One option in line with what you'd proposed here would be:

LayerAsCentroid("places", "label", "admin_centre", "polylabel")

But I wonder if the user would almost always want polylabel as the fallback? So maybe LayerAsCentroid should just default to that. But then it's a bit bizarre that we have centroid in the name, so perhaps LayerAsCentroid should be renamed to LayerAsPoint. But then that's a breaking change, and it'd be worth trying to make it happen before pushing out v3.

systemed commented 7 months ago

Polylabel support would be excellent. Maybe something like

LayerAsPoint("places", strategy, "admin_centre", "label" [...])

where the 2nd parameter onwards are optional, and strategy is an enum for the calculation algorithm: 0 for polylabel, 1 for centroid, 2 for Boost.Geometry's point_on_surface, and so on. Lua doesn't actually have enums of course, but we could potentially define constants somewhere.

cldellow commented 7 months ago

That ended up being pretty straightforward. No more duplicate labels, and I can now use the label node when present -- awesome.

cldellow commented 7 months ago

D'oh, I guess this could be considered a breaking change after all: https://github.com/systemed/tilemaker/pull/632/commits/154d3ab73e7a83f564e6cc60ea157fe0e29ce050

systemed commented 7 months ago

Looks good! I'm not too worried about breaking undocumented behaviour even if we do erroneously use that behaviour ourselves (oops) - and I think we're on the verge of 3.0 anyway.

Passing thought - is there any benefit in using some sort of table, or PooledString maybe, for relation roles rather than std::string? A quick check with taginfo suggests there's only 129 roles in widespread use (https://gist.github.com/systemed/29ea4c8d797a20dcdffee8ba907d62ea). (taginfo doesn't show roles for relations used <100 times.)

cldellow commented 7 months ago

Passing thought - is there any benefit in using some sort of table, or PooledString maybe, for relation roles rather than std::string?

I guess there are two concerns:

  1. Memory used by the std::string in relationsForWays and relationsForNodes
  2. Runtime impact of Lua interop

Re 1 - these don't spill to disk and are each taking 24 bytes. PooledString would get us down to 16 bytes, so save 8 bytes. Or we could do some custom mapping that could fit in a short for 2 bytes. So if we did the most aggressive thing, we could save 22 bytes per relation member.

The UK has 3.9M relation members in a 1.6GB PBF. If I scale that to a 72GB planet file, that suggests 175M relation members, so the savings could be as much as 3.8GB.

Yeah, that's probably worth doing. I'll have a think on how to do it without introducing a lot of locking.

Re 2 - for the most popular roles, I think if we saved a reference to a string in the Lua stack, we could avoid the expense of Lua's defensive copies. Relation roles are a good candidate for this, as they're a small set used many times. I expect this to be relatively small compared to other Lua interop costs we're paying, so wouldn't do anything about it immediately.

systemed commented 7 months ago

Yes - it was memory I was mostly thinking of, in particular for route relations. Way/node ID, plus an index into a vector of strings, should fit in a 64-bit struct.

cldellow commented 7 months ago

Updated to reduce memory use for tracking roles, and to add polylabel support.

The parks and forests in the Appalachians of America are a little better now. Note the locations of Cherokee National Forest and Pisgah National Forest. (This is a truncated extract - on the full map, Pisgah would likely be in a different location.)

Algorithm = Centroid: Selection_591

Algorithm = Polylabel: Selection_592

This switches polylabel to be the default.

There are some areas for improvement:

These areas can all be improved -- I'm a little hesitant to do it while there are other Lua PRs outstanding that also have perf implications and will have code conflicts with this set of changes.

Maybe we should start merging the Lua things into a shared branch, or start merging them to master?

cldellow commented 7 months ago

Reading through https://github.com/mapbox/polylabel/issues, I think there's also some risk of infinite loops if we don't pick the precision parameter as a function of the input geometry (and/or if we don't use centroid as a fallback for certain degenerate geometries).

It seemed to work for the areas I tried. I think before cutting a new release, I'd want to run tilemaker with all the Lua changes on the planet in order to confirm there are no regressions in runtime or memory usage.

systemed commented 7 months ago

Excellent - that’s a great improvement in the screenshot. I’m AFK today/tomorrow but will run it on the planet at the weekend unless you beat me to it!

systemed commented 7 months ago

I've just run this (only) over the planet and it works really well. No hangs and I couldn't see any impact on runtime speed. (Planet completed in 4h38m53 with no runtime options other than --store; that's about an hour faster than before #623.)

Label placement is pretty good. Here's Rutland Water, which is U-shaped. Before, it's over the peninsula:

Screenshot 2024-01-06 at 17 18 31

After, it's in the water as it should be:

Screenshot 2024-01-06 at 17 18 43

We don't get all the country labels we'd like at lower zoom levels (particularly the USA) but this is because the points are near tile boundaries - so more of a style/rendering issue than a tile generation issue. At some point in the 3.X series I'd like to ship a new, better style (possibly to go with Shortbread rather than OMT) but that's for the future!

cldellow commented 7 months ago

Great, thanks for running it on the planet!

We don't get all the country labels we'd like at lower zoom levels (particularly the USA) but this is because the points are near tile boundaries - so more of a style/rendering issue than a tile generation issue.

I see this, too, using OSM Bright:

Selection_597

For OSM Bright, at least, I think the issue is unrelated to polylabel.

It looks like the OMT Lua script only writes to the place layer in the node_function: https://github.com/systemed/tilemaker/blob/740505030086e77c62ee9d0e47e61abcf56df80f/resources/process-openmaptiles.lua#L170-L175

OSM Bright only draws a rank 1 country label if there is an iso_a2 attribute: https://github.com/openmaptiles/osm-bright-gl-style/blob/8af4769692d0f9219d0936711609d580b34bf365/style.json#L2391-L2402

iso_a2 comes from ISO3166-1:alpha2 in OSM.

For Canada, both its relation and its label node have such an attribute.

For the US, only its relation has that attribute, not its label node.

Hey, that's exactly the sort of thing that this PR can help with. I've pushed a commit that uses the new support for FindInRelation in nodes to propagate it from the relation if needed, and now USA shows up in OSM Bright.

systemed commented 7 months ago

Well hunted! Think this is ready to go?

cldellow commented 7 months ago

Yup!

systemed commented 7 months ago

Superb - thank you!