hyperknot / openfreemap

Free and open-source map hosting solution with custom styles for websites and apps, using OpenStreetMap data
https://openfreemap.org/

Map translation #22

Open tbodt opened 1 month ago

tbodt commented 1 month ago

What would it take to support rendering the map in any language which has translations in OSM?

tbodt commented 1 month ago

I see planetiler has a default list of languages: https://github.com/onthegomap/planetiler/blob/169627dea9b024f4b64f53039c302778c1c273bf/planetiler-core/src/main/java/com/onthegomap/planetiler/Planetiler.java#L106. This unfortunately doesn't include the language I want to use. I can imagine including all the languages would mean significantly bigger tile files...

hyperknot commented 1 month ago

Luckily it's not related to planetiler and is very simple to achieve.

  1. Download the JSON from the style you prefer, for example this JSON: https://tiles.openfreemap.org/styles/liberty

  2. Do a search and replace for "name_en" with "name_de" or whichever you prefer. Upload the JSON somewhere and simply point your style to this JSON instead of the default ones.
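The two steps above can be sketched in a few lines of JavaScript. This is only a minimal sketch: `replaceLabelLanguage` and `loadLocalizedStyle` are hypothetical helper names, and it assumes the style refers to labels as `name_en`, as described above.

```javascript
// Hypothetical helper: returns a copy of the style with every
// "name_<fromCode>" string replaced by "name_<toCode>".
function replaceLabelLanguage(style, fromCode, toCode) {
  const json = JSON.stringify(style)
  const replaced = json.split(`name_${fromCode}`).join(`name_${toCode}`)
  return JSON.parse(replaced)
}

// Usage sketch: download the style, then swap the language.
// Assumes a global fetch (browsers, Node 18+).
async function loadLocalizedStyle(langCode) {
  const res = await fetch('https://tiles.openfreemap.org/styles/liberty')
  const style = await res.json()
  return replaceLabelLanguage(style, 'en', langCode)
}
```

The returned object can be passed wherever the style URL would otherwise go, so uploading the modified JSON is optional when the replacement runs in the client.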

A dynamic JS snippet which does this is the following:

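// Assumptions: `style` is the parsed style JSON, `langCode` is a language
// code such as 'de', and `isEqual` is a deep-equality helper (lodash's here).
import isEqual from 'lodash/isEqual'

for (const layer of style.layers) {
  if (!layer.layout) continue

  const textField = layer.layout['text-field']
  if (!textField) continue

  // skip highway numbers, etc.
  if (isEqual(textField, ['to-string', ['get', 'ref']])) continue

  const id = layer.id

  // labels that follow a line stay on one line, point labels get a line break
  let separator
  if (id.includes('line') || id.includes('highway')) {
    separator = ' '
  } else {
    separator = '\n'
  }

  // first available name wins: requested language, then the default name
  const parts = [
    ['get', `name_${langCode}`],
    ['get', `name:${langCode}`],
    ['get', 'name'],
  ]

  layer.layout['text-field'] = [
    'case',
    ['has', 'name:nonlatin'],
    ['concat', ['get', 'name:latin'], separator, ['get', 'name:nonlatin']],
    ['coalesce', ...parts],
  ]
}

tbodt commented 1 month ago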

This works decently well, but there are two problems

hyperknot commented 1 month ago

This is basically the official solution. Actually, the official solution is far more basic; I spent a lot of time polishing it until I got to the version I posted. Have a look at the official example: https://maplibre.org/maplibre-gl-js/docs/examples/language-switch/

As for fiddling with JSON: basically all Mapbox/MapLibre styles are just that, JSON. I might set up an nginx function for this, but at the end of the day it'll just be a JSON with strings.

Now, about the mismatch between OSM and OpenMapTiles: it's outside the scope of this project. You can see exactly what's in the data by going to the inspector mode of Maputnik: https://maputnik.github.io/editor?style=https://tiles.openfreemap.org/styles/bright

tbodt commented 1 month ago

Thanks for the link to maputnik, it's definitely easier to use than manually deserializing pbfs.

What I don't understand, though, is the difference between OSM and OpenMapTiles, and why they would get out of sync. AIUI this project, OpenFreeMap, runs Planetiler, and then Planetiler fetches its data from OSM to generate all the tiles. Where does OpenMapTiles come in? What actually is OpenMapTiles?

hyperknot commented 1 month ago

Wrote a document for debugging international names: https://github.com/hyperknot/openfreemap/blob/main/docs/debugging_names.md

tbodt commented 1 month ago

Turns out the list of languages in OpenMapTiles is equivalent to the list of languages in OpenFreeMap, so that explains where it's coming from. What I don't understand is, why does planetiler use OpenMapTiles and not OSM directly?

tbodt commented 1 month ago

Oh there it is. https://github.com/openmaptiles/planetiler-openmaptiles/blob/5a07ce4ca7593622207cdf8f8330d52d31de8150/src/main/java/org/openmaptiles/generated/OpenMapTilesSchema.java#L62

tbodt commented 1 month ago

Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?

hyperknot commented 1 month ago

So for your questions:

Why does planetiler use OpenMapTiles and not OSM directly?

Because OSM is just a database dump; it's not usable on its own. You need a schema which describes how the geometries and attributes are laid out in the vector tiles. One such schema is OpenMapTiles. Another is https://shortbread-tiles.org/

Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?

I don't know the answer to this; you should ask in the Planetiler repo.

tbodt commented 1 month ago

Makes sense. Going to figure out what would need to change in Planetiler.

hyperknot commented 1 month ago

I've opened a ticket to render a full planet with the other two OpenMapTiles implementations, so we can compare them. This way we can see if something is planetiler-specific or is present in the other two implementations as well. https://github.com/hyperknot/openfreemap/issues/25

tbodt commented 1 month ago

My assessment is:

I'm currently looking into the feasibility of these Planetiler changes, hopefully I'll be able to do a full planet run to test the difference in output size.

hyperknot commented 1 month ago

Sounds great, thank you for digging into this!

These are the command line options I'm using for planetiler on a 128 GB machine. It takes about 5 hours to run: https://github.com/hyperknot/openfreemap/blob/f8f46a37ef9b8a2c19c1361843051e81bd544594/modules/tile_gen/tile_gen_lib/planetiler.py#L35-L55

tbodt commented 1 month ago

Looks like I can do a run on my 8GB machine in about 20 hours. Surprisingly this is fast enough for me for the moment since I have other things to do.

tbodt commented 1 month ago

So, the planetiler output without the language filter is about 2GB larger.

-rw-r--r--  1 tbodt  staff  93248712704 Oct  3 11:26 data/planet-all-langs.mbtiles
-rw-r--r--  1 tbodt  staff  91267948544 Oct  1 04:15 data/planet-osm-langs.mbtiles

I think this is worth it, what do you think? I'll work on the pull request for planetiler soon.
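For reference, the difference between the two files listed above works out as follows. This is just arithmetic on the byte counts from the `ls` output; the variable names are illustrative.

```javascript
// Byte counts copied from the ls output above.
const allLangsBytes = 93248712704 // planet-all-langs.mbtiles
const osmLangsBytes = 91267948544 // planet-osm-langs.mbtiles

const diffBytes = allLangsBytes - osmLangsBytes
const diffGB = diffBytes / 1e9 // decimal gigabytes
const growthPercent = (diffBytes / osmLangsBytes) * 100

console.log(`${diffBytes} bytes, ${diffGB.toFixed(2)} GB, ${growthPercent.toFixed(2)}% larger`)
// prints "1980764160 bytes, 1.98 GB, 2.17% larger"
```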

hyperknot commented 1 month ago

That's great work, thank you! I think you should definitely open a PR in Planetiler and let them decide.

But before that, I'd make a very clear example of before-after for a few items, to understand what's missing from the old version.

tbodt commented 1 month ago

In https://github.com/onthegomap/planetiler/issues/1043 they've said that it would make sense to have a flag, but they wouldn't make it the default since the default should match OMT as closely as possible. The question for you is whether you would set this flag for OFM.

I can post some before/afters here for you if you like.

hyperknot commented 1 month ago

Yes please. I mean especially include languages which you believe should be included. I'm not sure the right choice is to have hundreds of languages if no one would use them.

Also, could you compare the size totals of the 4-6 tiles which are normally loaded for some popular view, say London or New York? I'm afraid that the 2 GB of size growth isn't distributed evenly: instead of every tile being 2% bigger, some popular areas might be 10-15% bigger. But it's just a guess on my side.
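One way to set up such a comparison, as a rough sketch: find the tile covering a given point with the standard slippy-map (XYZ) formula, then compare the summed byte sizes of the tiles in that view between the two tilesets. The function names are hypothetical, and reading the actual tile sizes out of the two `.mbtiles` files is left out here.

```javascript
// Standard slippy-map (XYZ) tile index for a lon/lat at a given zoom.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom
  const latRad = (lat * Math.PI) / 180
  const x = Math.floor(((lon + 180) / 360) * n)
  const mercator = Math.log(Math.tan(latRad) + 1 / Math.cos(latRad))
  const y = Math.floor(((1 - mercator / Math.PI) / 2) * n)
  return { z: zoom, x, y }
}

// Percentage growth of the total size of one view's tiles.
// Both arguments are arrays of tile sizes in bytes, in the same tile order.
function viewGrowthPercent(beforeSizes, afterSizes) {
  const sum = (sizes) => sizes.reduce((total, size) => total + size, 0)
  const before = sum(beforeSizes)
  return ((sum(afterSizes) - before) / before) * 100
}

// Central London at zoom 10
console.log(lonLatToTile(-0.1276, 51.5072, 10))
```

Running `viewGrowthPercent` for a handful of views (dense cities and empty regions alike) would show whether the growth is uniform or concentrated.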

tbodt commented 1 month ago

I believe we should include every language indiscriminately - if a language has data, that means someone cared enough about it to type in labels.

hyperknot commented 1 month ago

Before making a decision, I'd be curious what other map platforms' choice is on this.

hyperknot commented 1 month ago

Maptiler offers these languages:

  1. English
  2. Local
  3. Albanian
  4. Amharic
  5. Arabic
  6. Armenian
  7. Azerbaijani
  8. Basque
  9. Belarusian
  10. Bengali
  11. Bosnian
  12. Breton
  13. Bulgarian
  14. Catalan
  15. Chinese
  16. Corsican
  17. Croatian
  18. Czech
  19. Danish
  20. Dutch
  21. English (listed twice)
  22. Esperanto
  23. Estonian
  24. Finnish
  25. French
  26. Georgian
  27. German
  28. Greek
  29. Hebrew
  30. Hindi
  31. Hungarian
  32. Icelandic
  33. Indonesian
  34. Irish
  35. Italian
  36. Japanese
  37. Japanese (Kana)
  38. Japanese (Latin 2018)
  39. Japanese (Latin)
  40. Japanese Hiragana form
  41. Kannada
  42. Kazakh
  43. Korean
  44. Korean (Latin)
  45. Kurdish
  46. Latin
  47. Latvian
  48. Lithuanian
  49. Luxembourgish
  50. Macedonian
  51. Malayalam
  52. Maltese
  53. Norwegian
  54. Occitan
  55. Polish
  56. Portuguese
  57. Romanian
  58. Romansh
  59. Russian
  60. Scottish Gaelic
  61. Serbian (Cyrillic)
  62. Serbian (Latin)
  63. Slovak
  64. Slovene
  65. Spanish
  66. Swedish
  67. Tamil
  68. Telugu
  69. Thai
  70. Turkish
  71. Ukrainian
  72. Vietnamese
  73. Welsh
  74. Western Frisian

tbodt commented 1 month ago

A list of maps with localization is at https://wiki.openstreetmap.org/wiki/Map_internationalization. Of those my language "tok" is only supported by Wikimedia maps, which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.

ImreSamu commented 1 month ago

Although there are thousands of unique name: tags in OpenStreetMap (OSM), many of them do not match the correct language code definitions. https://taginfo.openstreetmap.org/search?q=name%3A#keys

Wikimedia recognizes around 710 language codes ( https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all ), so ideally, this number of languages should be included to ensure all known languages are represented. By clicking on the "WDQS" query, you can download the language codes in CSV format, which can be used after some cleaning.
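A minimal sketch of how such a cleaned code list could be used to separate real language tags from other `name:` keys. `validNameKeys` is a hypothetical helper, and the code list passed in is assumed to come from the cleaned CSV mentioned above.

```javascript
// Hypothetical helper: keep only `name:<code>` keys whose <code>
// appears in a list of known language codes (case-insensitive).
function validNameKeys(keys, knownCodes) {
  const known = new Set(knownCodes.map((code) => code.toLowerCase()))
  return keys.filter((key) => {
    const match = /^name:(.+)$/.exec(key)
    return match !== null && known.has(match[1].toLowerCase())
  })
}

// Example: `name:left` is an OSM key, but "left" is not a language code.
console.log(validNameKeys(['name:tok', 'name:left', 'name:en'], ['en', 'tok']))
```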

name:tok - "Toki Pona"

@tbodt :

my language "tok" is only supported by Wikimedia maps, which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.

name:tok ( https://en.wikipedia.org/wiki/Toki_Pona ) has only 323 objects in OpenStreetMap: https://taginfo.openstreetmap.org/keys/name%3Atok#overview

The hard part is that creating a perfect map with 'Sitelen Pona' labels won't be easy due to the ~120 hieroglyphic characters.

Transliteration isn't straightforward, as many non-Latin scripts (Arabic, Chinese, Japanese, Hebrew, etc.) first need to be converted into Latin characters before the Toki Pona transliteration can work.

tbodt commented 1 month ago

Happily, transliteration to sitelen pona is not really necessary. Latin script is more commonly used for toki pona anyway, so I would be happy with that on the map. If toki pona with sitelen pona was an option it would be its own language code with its own labels defined, just like how there are multiple language codes for Japanese. Automatically transliterating everything is not really worth doing.

hyperknot commented 1 month ago

Thank you for the research @ImreSamu. I thought about this and would like to go forward with the following decision:

@tbodt if you submit a PR to planetiler to add individual languages and not all, then I'm happy to include "tok" in OpenFreeMap.

For your proposal of including all languages, why don't you convert your render to PMTiles and host that on Cloudflare? I mean, you can make an EveryLanguageMap or similar, I think it'd be a very interesting project!

tbodt commented 1 month ago

Ultimately I'm asking to add every language here instead of creating my own because it's much easier for you to do than for me, regarding the cost of serving and generation: a few % extra for you, an entire new project to maintain for me. I don't really understand why not. Yes, I'm here to make a Toki Pona map, but in the process it started to look like it would be just as easy to fix this for every language instead of just mine, and I like the idea of not leaving anyone out.

That said, the idea of adding individual languages suggests a good idea for designing the planetiler flags, I'll see what I can implement there.

hyperknot commented 1 month ago

I understand your point, but there are two big reasons why I think it's not a good idea for this project:

  1. Once we add something, we have to support it. For example, have a look at https://github.com/hyperknot/openfreemap/issues/24. What if we include languages which no one used in OpenMapTiles before, and someone opens a ticket saying that language x is displayed incorrectly? We cannot just say "sorry, we don't care"; we'd need to invest time into trying to solve that issue for that particular language.
  2. I believe the size growth is not 2% universally, but close to 0% in most of the world and up to 10% in some dense areas. Making the map load 10% slower in popular areas is not a good idea.

Your map could be a perfect candidate for PMTiles + Cloudflare R2 hosting, you can literally host it for free. And about generation, you've just made a full planet run, if you don't want to update it frequently then you are basically done! I really mean you can set up your EveryLanguageMap in like a few hours and it'd be a super nice open source project.

tbodt commented 1 month ago

Thanks for the explanation. Indeed the cost is doing map updates - to keep the map up to date essentially requires scheduling and monitoring reruns forever.

I do plan to look at which tiles grow the most, will let you know what I find.

1ec5 commented 1 week ago

By the way, the OSMUS Tile Service (which powers OSM Americana, among other things) passes a very long list of languages into Planetiler (unfortunately not including tok, but feel free to open an issue about that). It would be great to not have to pass in language codes explicitly.