systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

Prioritization of citynames #517

Open thebetterbits opened 11 months ago

thebetterbits commented 11 months ago

Hi,

At first I thought we had made a mistake, because smaller cities sometimes show up before bigger ones, but then I saw the same thing on your website. The map shows "City of Westminster" before it shows London, and "Lichfield" before Birmingham, etc. Is this something that can be fixed? :-)

Thank you in advance.

cldellow commented 8 months ago

I also ran into this. I'll share my understanding and my approach to solving this for my maps, but take with a grain of salt as I'm very new to this. Sorry for the wall of text!

Background

There are multiple ways you can solve the problem of choosing which labels to prioritize. It requires cooperation between your processing pipeline that produces the mbtiles file, your stylesheet, and your rendering engine.

Some confusion might come from the fact that Tilemaker itself is agnostic about this problem, but it does ship an OpenMapTiles schema-compatible processing pipeline that many people use when producing their mbtiles file.

OpenMapTiles support for prioritizing cities

The OpenMapTiles schema defines a place layer. Within that layer are contained your countries, states, cities and islands. Each item in the layer can have a rank attribute, defined in the schema as follows:

Countries, states and the most important cities all have a rank to boost their importance on the map. The rank field for countries and states ranges from 1 to 6 while the rank field for cities ranges from 1 to 10 for the most important cities and continues from 10 serially based on the local importance of the city (derived from population and city class). You can use the rank to limit density of labels or improve the text hierarchy. The rank value is a combination of the Natural Earth scalerank, labelrank and datarank values for countries and states and for cities consists out of a shifted Natural Earth scalerank combined with a local rank within a grid for cities that do not have a Natural Earth scalerank.
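To make the "local rank within a grid" idea concrete, here is a minimal Python sketch. It is not how OpenMapTiles actually computes rank: the cell size, the field names and the sample places are all made up for illustration, and it ignores the Natural Earth part and the offset from 10 entirely.

```python
from collections import defaultdict
from math import floor


def local_grid_ranks(places, cell_deg=1.0):
    """Rank places by population within each lon/lat grid cell (1 = largest).

    `cell_deg` and the dict keys are assumptions made for this sketch,
    not values taken from the OpenMapTiles schema.
    """
    cells = defaultdict(list)
    for p in places:
        key = (floor(p["lon"] / cell_deg), floor(p["lat"] / cell_deg))
        cells[key].append(p)

    ranks = {}
    for members in cells.values():
        # Largest population first; places without a population sort last.
        members.sort(key=lambda p: p.get("population", 0), reverse=True)
        for i, p in enumerate(members, start=1):
            ranks[p["name"]] = i
    return ranks


# Hypothetical places: A and B share a cell, C is alone in another one.
places = [
    {"name": "A", "lon": 0.2, "lat": 0.3, "population": 500},
    {"name": "B", "lon": 0.7, "lat": 0.8, "population": 9000},
    {"name": "C", "lon": 5.1, "lat": 5.2, "population": 100},
]
ranks = local_grid_ranks(places)
print(ranks)
```

The point of ranking per cell rather than globally is that a small town in an otherwise empty area can still get a top local rank and therefore a label.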

At this point, it's worth noting that the OpenMapTiles schema is comprehensive and complex. Supporting it 100% is a big task. Accordingly, there are parts that Tilemaker's OpenMapTiles schema-compatible pipeline does not support until someone comes along and creates a PR adding support for it.

In particular, city ranks appear to be one such spot. You can see the relevant code here, and note that it only sets ranks for countries, not for cities.

Even assuming that Tilemaker's OpenMapTiles pipeline supported ranks, you would also need a style file that respected them. The popular open source style files, such as OSM Bright, do not. Instead, they prioritize based on the place tag: villages are below towns, which are below cities, which are below cities that are capitals.

How I prioritize places

I take a different approach. I do not think population alone is enough to prioritize city labels. For example, near the Great Smoky Mountains there are two places, Gatlinburg and Maryville.

In my opinion (or at least, for my purposes), Gatlinburg is the more important place to label on a map. But the population alone would not tell you that. In fact, it seems like OSM taggers agree, and have tagged it as a city (the largest settlement or settlements within a territory), and Maryville as a town (an important urban centre that is larger than a village, smaller than a city, and not a suburb). But that tagging seems very surprising to me. I suspect it has more to do with making the map "feel" right than with strict accordance with the tagging guidelines.

Instead, to be robust against the whims of taggers or the artificial precision of population, I'm using a different approach: Wikipedia pageviews. This is also flawed, of course. :)

OSM objects can be linked to Wikipedia articles, and Wikipedia publishes pageview data. For example, here is the pageview data for Gatlinburg and Maryville. You can see that Gatlinburg is about twice as popular as Maryville.
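For anyone who wants to pull these numbers themselves, here is a rough Python sketch built around the Wikimedia REST pageviews endpoint. This is not my actual pipeline code. The helper names, the "user" agent filter, the monthly granularity and the date range are all choices made for this sketch, and the response at the bottom is a made-up stand-in rather than real data.

```python
from urllib.parse import quote

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build a per-article pageviews URL (start/end are YYYYMMDD strings)."""
    title = quote(article.replace(" ", "_"), safe="")
    return f"{API}/{project}/all-access/user/{title}/monthly/{start}/{end}"


def total_views(response):
    """Sum the `views` field over the `items` list of a decoded JSON response."""
    return sum(item["views"] for item in response.get("items", []))


url = pageviews_url("Gatlinburg, Tennessee", "20230101", "20231231")
print(url)

# Made-up response with the same shape as the API output, so the sketch
# runs without network access. The numbers are NOT real pageviews.
fake_response = {"items": [{"views": 10}, {"views": 32}]}
print(total_views(fake_response))
```

Fetching the URL (for example with `urllib.request`) and decoding the JSON is left out on purpose, so that nothing here depends on the network.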

My processing pipeline embeds the pageview data into my mbtiles file.

Then my style file tells my renderer to prioritize places with higher pageviews:

      "layout": {
        "text-field": "{name}",
        "text-font": ["Noto Sans Regular"],
        "text-size": {"stops": [[7, 11], [10, 14]]},
        "symbol-sort-key": ["get", "views"]
      }
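One thing to double-check if you copy this: as far as I understand the Mapbox/MapLibre style specification, `symbol-sort-key` sorts in ascending order, so features with lower keys get placed first. If `views` is a raw count where bigger means more popular, you would want to negate it. The Python sketch below just generates such a layout block; the function name and the `coalesce` fallback to 0 are my own assumptions, and if your pipeline already stores an inverted value the plain `["get", "views"]` form is fine.

```python
import json


def place_label_layout(attr="views", higher_first=True):
    """Return a symbol-layer layout dict that sorts labels by `attr`.

    Assumes `attr` is a raw count (higher = more important). Lower sort
    keys are placed first, so the value is negated when higher_first is True.
    """
    value = ["coalesce", ["get", attr], 0]  # treat a missing attribute as 0
    sort_key = ["-", value] if higher_first else value
    return {
        "text-field": "{name}",
        "text-font": ["Noto Sans Regular"],
        "text-size": {"stops": [[7, 11], [10, 14]]},
        "symbol-sort-key": sort_key,
    }


layout = place_label_layout()
print(json.dumps(layout, indent=2))
```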

Conclusion

Why does City of Westminster show up before London? I think the answer is bad luck. They are both tagged as city, and so are considered to be on a level playing field. The order in which they get emitted to the mbtiles file is likely what controls which one loses when there is not enough room to draw both labels.

Hopefully that's enough context that you can figure out how to move forward for your own projects.

To me, it's not super clear what a general fix would be. I can think of two paths:

Path 1: Someone submits a PR to teach Tilemaker's OpenMapTiles pipeline about rank for cities. This would need people to agree on how the rank should be computed (is the Natural Earth data easy to integrate? would it be better to drive it from something entirely self-contained in the OSM dataset?), and you'd need people to use a stylesheet that prioritized based on rank.

Path 2: Someone could submit a PR that uses Tilemaker's ZOrder function to control the order in which places are written to the mbtiles file. This would likely improve the tie-breaking for places of the same kind (like this issue's examples of cities). You'd have to decide what to drive the ZOrder from -- if it was population, you'd also need to go and improve OSM's data to have population fields for all the cities you care about.
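To show why emission order matters for the tie-break, here is a toy Python model of greedy label placement. It is only a sketch: real renderers do 2-D collision detection, this is 1-D, it does not use Tilemaker's ZOrder API at all, and the labels, positions and importance scores are invented.

```python
def place_labels(labels, min_gap=1.0):
    """Greedy 1-D placement: keep a label only if it is at least `min_gap`
    away from every label already kept. Earlier labels win collisions.

    Each label is a (name, x, importance) tuple; importance is ignored here
    and only matters through the order the caller passes labels in.
    """
    placed = []
    for name, x, _importance in labels:
        if all(abs(x - px) >= min_gap for _, px in placed):
            placed.append((name, x))
    return [name for name, _ in placed]


# Hypothetical labels: "Small" and "Big" are too close to both be drawn.
labels = [("Small", 0.0, 10), ("Big", 0.5, 1000), ("Far", 5.0, 50)]

as_emitted = place_labels(labels)
by_importance = place_labels(sorted(labels, key=lambda l: l[2], reverse=True))
print(as_emitted)
print(by_importance)
```

In emission order "Small" happens to come first and blocks "Big". Once the input is sorted by whatever you decide to drive the ordering from, "Big" wins instead.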

systemed commented 6 months ago

It might be interesting to integrate Wikidata QRank, not just for cities/towns but for other features such as mountain peaks: https://github.com/brawer/wikidata-qrank
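As a rough idea of what such an integration could look like, here is a minimal Python sketch that loads a QRank table and looks up a feature by its OSM `wikidata` tag. It assumes the QRank download is a CSV with an `Entity,QRank` header; the function names are hypothetical, and the sample rows use invented IDs and scores rather than real QRank values.

```python
import csv
import io


def load_qrank(fileobj):
    """Read a QRank-style CSV (assumed header: Entity,QRank) into a dict."""
    reader = csv.DictReader(fileobj)
    return {row["Entity"]: int(row["QRank"]) for row in reader}


def qrank_for(tags, qrank, default=0):
    """Look up a feature's score via its `wikidata` tag, else `default`."""
    return qrank.get(tags.get("wikidata", ""), default)


# Invented sample data in the assumed format.
sample = io.StringIO("Entity,QRank\nQ1,100\nQ2,5\n")
qrank = load_qrank(sample)

print(qrank_for({"name": "Somewhere", "wikidata": "Q1"}, qrank))
print(qrank_for({"name": "No tag"}, qrank))
```

The real file is large, so a pipeline would probably want to load only the entities that actually occur in the extract being processed.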

cldellow commented 6 months ago

Ha, yes. QRank renders my wall of text unnecessary - I wish I had learned of it sooner. :)

It works pretty well. There are some oddities: Index, Washington and Hanna, Alberta are very minor towns, yet QRank would have you believe that they are as prominent as, say, former-Winter-Olympics-host-city Vancouver, BC. I suspect this is a conflation error that happens when there are disambiguation pages on Wikipedia, as there are for Index and Hannah (which Hanna redirects to).

If someone wants to integrate QRank into tilemaker's OpenMapTiles profile, my code is available:

dieterdreist commented 6 months ago

Great project. I also found some oddities, e.g. in the Tübingen / Reutlingen comparison. These are two medium-sized towns about 10 km apart, with populations of 92,000 and 117,000 respectively, and Tübingen rightfully „wins“. There is one odd spike where Reutlingen goes higher than Tübingen, while in the rest of the years it is always much lower. Many maps (especially automated web maps, as opposed to older hand-selected maps) show Reutlingen much more prominently, as it is beyond the 100,000 population threshold: https://www.openstreetmap.org/#map=11/48.4936/9.1224. This is despite Tübingen having a university from the 15th century and being an admin level 5 capital, as opposed to Reutlingen (level 6).