Open tbodt opened 1 month ago
I see planetiler has a default list of languages: https://github.com/onthegomap/planetiler/blob/169627dea9b024f4b64f53039c302778c1c273bf/planetiler-core/src/main/java/com/onthegomap/planetiler/Planetiler.java#L106. This unfortunately doesn't include the language I want to use. I can imagine including all the languages would mean significantly bigger tile files...
Luckily it's not related to planetiler and is very simple to achieve.
Download the JSON from the style you prefer, for example this JSON: https://tiles.openfreemap.org/styles/liberty
Do a search and replace for "name_en" with "name_de" or whichever you prefer. Upload the JSON somewhere and simply point your style to this JSON instead of the default ones.
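The search and replace can also be done in code rather than by hand. A minimal sketch, assuming the style is already loaded as a plain object; `replaceNameField` and `sampleStyle` are made-up names for illustration, not part of any library:

```javascript
// Hypothetical helper: swap the language suffix in every "name_xx" reference.
// It works on the serialized JSON, exactly like a text editor's search and replace.
function replaceNameField(styleJson, fromLang, toLang) {
  const text = JSON.stringify(styleJson)
  return JSON.parse(text.split(`name_${fromLang}`).join(`name_${toLang}`))
}

// Tiny stand-in for a real style such as the Liberty JSON linked above.
const sampleStyle = {
  layers: [
    {
      id: 'place_city',
      layout: { 'text-field': ['coalesce', ['get', 'name_en'], ['get', 'name']] },
    },
  ],
}

const germanStyle = replaceNameField(sampleStyle, 'en', 'de')
console.log(JSON.stringify(germanStyle.layers[0].layout['text-field']))
```

In a real page the input would be the downloaded style JSON, and the returned object could be handed to the map instead of the default style URL.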
A dynamic JS snippet which does this is the following:
    // Assumes `style` (the parsed style JSON), `langCode` (e.g. 'de') and a
    // deep-equality helper `isEqual` (e.g. from lodash) are in scope.
    for (const layer of style.layers) {
      if (!layer.layout) continue
      const textField = layer.layout['text-field']
      if (!textField) continue
      // highway numbers, etc.
      if (isEqual(textField, ['to-string', ['get', 'ref']])) continue
      const id = layer.id
      // Labels along lines stay on one line; point labels are stacked.
      let separator
      if (id.includes('line') || id.includes('highway')) {
        separator = ' '
      } else {
        separator = '\n'
      }
      // Preferred name fields, in fallback order.
      const parts = [
        ['get', `name_${langCode}`],
        ['get', `name:${langCode}`],
        ['get', 'name'],
      ]
      layer.layout['text-field'] = [
        'case',
        ['has', 'name:nonlatin'],
        ['concat', ['get', 'name:latin'], separator, ['get', 'name:nonlatin']],
        ['coalesce', ...parts],
      ]
    }
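The snippet above relies on `style`, `langCode` and `isEqual` being defined elsewhere. A minimal dependency-free sketch of those pieces, assuming a JSON-based comparison is good enough for style expressions; `labelLayerIds` and the sample layers are made up for illustration:

```javascript
// Assumption: style expressions are plain JSON arrays, so comparing their
// serialized form is enough here. lodash's isEqual would also work.
const isEqual = (a, b) => JSON.stringify(a) === JSON.stringify(b)

// Hypothetical inputs; a real style would come from the downloaded JSON.
const langCode = 'de'
const style = {
  layers: [
    { id: 'background', type: 'background' },
    { id: 'highway_shield', layout: { 'text-field': ['to-string', ['get', 'ref']] } },
    { id: 'place_city', layout: { 'text-field': ['get', 'name'] } },
  ],
}

// Returns the ids of the layers the loop above would rewrite:
// layers with a text-field that is not a plain "ref" label.
function labelLayerIds(styleJson) {
  return styleJson.layers
    .filter((layer) => layer.layout && layer.layout['text-field'])
    .filter((layer) => !isEqual(layer.layout['text-field'], ['to-string', ['get', 'ref']]))
    .map((layer) => layer.id)
}

console.log(langCode, labelLayerIds(style))
```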
This works decently well, but there are two problems
This is basically the official solution. Actually, the official solution is way more basic; I've spent a lot of time polishing it until I got to the version I posted. Have a look at the official example: https://maplibre.org/maplibre-gl-js/docs/examples/language-switch/
About fiddling with JSON, basically all Mapbox/MapLibre styles are just that, JSON. I might set up an nginx function for this, but at the end of the day it'll just be a JSON with strings.
Now, about the mismatch between OSM and OpenMapTiles, it's outside the scope of this project. You can see exactly what's in the data by going to the inspector mode of Maputnik: https://maputnik.github.io/editor?style=https://tiles.openfreemap.org/styles/bright
Thanks for the link to maputnik, it's definitely easier to use than manually deserializing pbfs.
What I don't understand, though, is the difference between OSM and OpenMapTiles, and why they would get out of sync. AIUI this project, OpenFreeMap, runs Planetiler, and then Planetiler fetches its data from OSM to generate all the tiles. Where does OpenMapTiles come in? What actually is OpenMapTiles?
Wrote a document for debugging international names: https://github.com/hyperknot/openfreemap/blob/main/docs/debugging_names.md
Turns out the list of languages in OpenMapTiles is equivalent to the list of languages in OpenFreeMap, so that explains where it's coming from. What I don't understand is, why does planetiler use OpenMapTiles and not OSM directly?
Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?
So for your questions:
Why does planetiler use OpenMapTiles and not OSM directly?
Because OSM is just a database dump, it's not usable on its own. You need a schema that describes how the geometries and attributes end up in the vector tiles. One such schema is OpenMapTiles. Another is https://shortbread-tiles.org/
Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?
I don't know the answer for this, you should ask it in the Planetiler repo.
Makes sense. Going to figure out what would need to change in Planetiler.
I've opened a ticket to render a full planet with the other two OpenMapTiles implementations, so we can compare them. This way we can see whether something is Planetiler-specific or is present in the other two implementations as well. https://github.com/hyperknot/openfreemap/issues/25
My assessment is:
I'm currently looking into the feasibility of these Planetiler changes, hopefully I'll be able to do a full planet run to test the difference in output size.
Sounds great, thank you for digging into this!
These are the command line options I'm using for planetiler on a 128 GB machine. It takes about 5 hours to run: https://github.com/hyperknot/openfreemap/blob/f8f46a37ef9b8a2c19c1361843051e81bd544594/modules/tile_gen/tile_gen_lib/planetiler.py#L35-L55
Looks like I can do a run on my 8GB machine in about 20 hours. Surprisingly this is fast enough for me for the moment since I have other things to do.
So, the planetiler output without the language filter is about 2GB larger.
-rw-r--r-- 1 tbodt staff 93248712704 Oct 3 11:26 data/planet-all-langs.mbtiles
-rw-r--r-- 1 tbodt staff 91267948544 Oct 1 04:15 data/planet-osm-langs.mbtiles
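For reference, a quick sketch of the arithmetic behind that figure, using only the two byte counts from the listing above (the variable names are just for illustration):

```javascript
// Byte counts copied from the `ls -l` output above.
const allLangsBytes = 93248712704 // planet-all-langs.mbtiles
const defaultLangsBytes = 91267948544 // planet-osm-langs.mbtiles

const diffBytes = allLangsBytes - defaultLangsBytes
const growthPercent = (diffBytes / defaultLangsBytes) * 100

console.log(`${(diffBytes / 1e9).toFixed(2)} GB larger (${growthPercent.toFixed(2)}%)`)
// prints "1.98 GB larger (2.17%)"
```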
I think this is worth it, what do you think? I'll work on the pull request for planetiler soon.
That's great work, thank you! I think you should definitely open a PR in Planetiler and let them decide.
But before that, I'd make a very clear example of before-after for a few items, to understand what's missing from the old version.
In https://github.com/onthegomap/planetiler/issues/1043 they've said that it would make sense to have a flag, but they wouldn't make it the default since the default should match OMT as closely as possible. The question for you is whether you would set this flag for OFM.
I can post some before/afters here for you if you like.
Yes please. I mean especially include languages which you believe should be included. I'm not sure the right choice is to have hundreds of languages if no one would use them.
Also, could you compare the total size of the 4-6 tiles that are normally loaded for some popular view, say London or New York? I'm afraid the 2 GB of growth isn't distributed evenly: rather than every tile being 2% bigger, some popular areas could be 10-15% bigger. But that's just a guess on my side.
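One way to pick the tiles for such a comparison is the standard slippy-map formula, which turns a longitude/latitude into tile x/y at a given zoom. A minimal sketch; the London coordinates and the zoom level are my assumptions, and `lonLatToTile` / `toTmsRow` are made-up helper names:

```javascript
// Standard Web Mercator (slippy map) tile calculation.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom
  const latRad = (lat * Math.PI) / 180
  const x = Math.floor(((lon + 180) / 360) * n)
  // asinh(tan(lat)) is the same as ln(tan(lat) + sec(lat))
  const y = Math.floor(((1 - Math.asinh(Math.tan(latRad)) / Math.PI) / 2) * n)
  return { x, y }
}

// MBTiles stores rows in TMS order, so the y axis is flipped
// relative to the XYZ scheme used in tile URLs.
function toTmsRow(y, zoom) {
  return 2 ** zoom - 1 - y
}

// Assumed sample point: central London at zoom 10.
const london = lonLatToTile(-0.1278, 51.5074, 10)
console.log(london, toTmsRow(london.y, 10))
```

With the x and TMS row in hand, the matching rows can be looked up in both files by `zoom_level`, `tile_column` and `tile_row` (the column names from the MBTiles spec) and their `tile_data` lengths compared.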
I believe we should include every language indiscriminately - if a language has data, that means someone cared enough about it to type in labels.
Before making a decision, I'd be curious what other map platforms have chosen to do here.
Maptiler offers these languages:
A list of maps with localization is at https://wiki.openstreetmap.org/wiki/Map_internationalization. Of those my language "tok" is only supported by Wikimedia maps, which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.
Although there are thousands of unique name: tags in OpenStreetMap (OSM), many of them do not match the correct language code definitions.
https://taginfo.openstreetmap.org/search?q=name%3A#keys
Wikimedia recognizes around 710 language codes ( https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all ), so ideally, this number of languages should be included to ensure all known languages are represented. By clicking on the "WDQS" query, you can download the language codes in CSV format, which can be used after some cleaning.
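The cleaning step could look roughly like this: keep only `name:*` keys whose suffix looks like a language code and also appears in the downloaded list. This is a sketch under assumptions; the pattern for what counts as a plausible code, the `pickLanguageKeys` name and the sample data are all mine:

```javascript
// Assumption: a plausible code is 2-3 lowercase letters, optionally followed by
// hyphen- or underscore-separated subtags (e.g. "zh-Hant").
const LANG_KEY = /^name:([a-z]{2,3}(?:[-_][A-Za-z0-9]+)*)$/

// Keeps the keys whose suffix matches the pattern and is in the known-code set.
function pickLanguageKeys(keys, knownCodes) {
  const result = []
  for (const key of keys) {
    const match = LANG_KEY.exec(key)
    if (match && knownCodes.has(match[1])) result.push(key)
  }
  return result
}

// Hypothetical inputs: in practice the codes would come from the cleaned CSV
// and the keys from taginfo.
const knownCodes = new Set(['en', 'de', 'tok', 'zh-Hant'])
const osmKeys = [
  'name:en',
  'name:tok',
  'name:left',
  'name:etymology:wikidata',
  'name:zh-Hant',
  'name:pronunciation',
]

console.log(pickLanguageKeys(osmKeys, knownCodes))
```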
@tbodt :
my language "tok" is only supported by Wikimedia maps, which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.
name:tok ( https://en.wikipedia.org/wiki/Toki_Pona ) has only 323 objects in OpenStreetMap: https://taginfo.openstreetmap.org/keys/name%3Atok#overview
The hard part is that creating a perfect map with 'Sitelen Pona' labels won't be easy due to the ~120 hieroglyphic characters.
Transliteration isn't straightforward, as many non-Latin scripts (Arabic, Chinese, Japanese, Hebrew, etc.) first need to be converted into Latin characters before the Toki Pona transliteration can work.
Happily, transliteration to sitelen pona is not really necessary. Latin script is more commonly used for toki pona anyway, so I would be happy with that on the map. If toki pona with sitelen pona was an option it would be its own language code with its own labels defined, just like how there are multiple language codes for Japanese. Automatically transliterating everything is not really worth doing.
Thank you for the research @ImreSamu. I thought about this and would like to go forward with the following decision:
@tbodt if you submit a PR to planetiler to add individual languages and not all, then I'm happy to include "tok" in OpenFreeMap.
For your proposal of including all languages, why don't you convert your render to PMTiles and host that on Cloudflare? I mean, you can make an EveryLanguageMap or similar, I think it'd be a very interesting project!
Ultimately I'm asking to add every language here instead of creating my own because it's much easier for you to do than for me, regarding the cost of serving and generation: a few % extra for you, an entire new project to maintain for me. I don't really understand why not. Yes, I'm here to make a Toki Pona map, but in the process it started to look like it would be just as easy to fix this for every language instead of just mine, and I like the idea of not leaving anyone out.
That said, the idea of adding individual languages suggests a good idea for designing the planetiler flags, I'll see what I can implement there.
I understand your point, but there are two big reasons why I think it's not a good idea for this project:
Your map could be a perfect candidate for PMTiles + Cloudflare R2 hosting, you can literally host it for free. And about generation, you've just made a full planet run, if you don't want to update it frequently then you are basically done! I really mean you can set up your EveryLanguageMap in like a few hours and it'd be a super nice open source project.
Thanks for the explanation. Indeed the cost is doing map updates - to keep the map up to date essentially requires scheduling and monitoring reruns forever.
I do plan to look at which tiles grow the most, will let you know what I find.
By the way, the OSMUS Tile Service (which powers OSM Americana, among other things) passes a very long list of languages into Planetiler (unfortunately not including tok, but feel free to open an issue about that). It would be great to not have to pass in language codes explicitly.
What would it take to support rendering the map in any language which has translations in OSM?