osm-americana / openstreetmap-americana

A quintessentially American map style
https://americanamap.org
Creative Commons Zero v1.0 Universal

Place names are unavailable in many relevant languages #586

Open 1ec5 opened 1 year ago

1ec5 commented 1 year ago

This style uses vector tiles that contain place names in many languages, but the selection of languages leaves some room for improvement. For a style that’s focused on the Americas, particularly the United States, language support is nevertheless dominated by European languages:

Languages represented in United States place POI generated by OpenMapTiles/Planetiler:

* Albanian (sq)
* Amharic (am)
* Arabic (ar)
* Armenian (hy)
* Azerbaijani (az)
* Basque (eu)
* Belarusian (be)
* Bosnian (bs)
* Breton (br)
* Bulgarian (bg)
* Catalan (ca)
* Chinese (zh)
* Corsican (co)
* Croatian (hr)
* Czech (cs)
* Danish (da)
* Dutch (nl)
* English (en)
* Esperanto (eo)
* Estonian (et)
* Finnish (fi)
* French (fr)
* Georgian (ka)
* German (de)
* Greek (el)
* Hebrew (he)
* Hindi (hi)
* Hungarian (hu)
* Icelandic (is)
* Indonesian (id)
* Irish (ga)
* Italian (it)
* Japanese (ja)
* Kannada (kn)
* Kazakh (kk)
* Korean (ko)
* Kurdish (ku)
* Latin (la)
* Latvian (lv)
* Lithuanian (lt)
* Luxembourgish (lb)
* Macedonian (mk)
* Malayalam (ml)
* Maltese (mt)
* Norwegian (no)
* Occitan (oc)
* Polish (pl)
* Portuguese (pt)
* Romanian (ro)
* Romansh (rm)
* Russian (ru)
* Scottish Gaelic (gd)
* Serbian (sr)
* Slovak (sk)
* Slovenian (sl)
* Spanish (es)
* Swedish (sv)
* Tamil (ta)
* Telugu (te)
* Thai (th)
* Turkish (tr)
* Ukrainian (uk)
* Welsh (cy)
* Western Frisian (fy)

This list excludes 11 of the 30 most spoken languages in the United States – even the fourth and fifth most spoken, Tagalog and Vietnamese. Of the many languages that have official status within the U.S. at the state/territorial level, only English and Spanish are supported. Support for explicitly tagged minority languages could be important for this style in the future. Natural features often have notable names in indigenous languages. There are many communities across the country that have points of interest and even streets primarily in an immigrant language. This project would be able to “Challenge the status quo” of raster maps more powerfully if it could expose a wider variety of languages.

The names themselves come from a combination of Wikidata and OpenStreetMap, but the decision about which languages to expose is made by OpenMapTiles. Historically, adding new language-qualified name fields has required a potentially painful tradeoff in tile size. However, nowadays it should be quite feasible to add new name fields only when tagged in OSM, relying on the client to fall back to another field when a given language is unavailable. Every GL JS–compatible renderer in the last several years has had support for the coalesce expression operator. #578 demonstrates the effective use of this expression operator in any style.
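To illustrate the client-side fallback described above, here is a minimal sketch of how a style could build a `coalesce` expression from a user's preferred locales. The helper name `localizedNameExpression` is hypothetical, and the sketch assumes the tiles expose names in fields keyed as `name:<language code>` plus a plain `name` field; it is not taken from #578 or from Americana's code.

```typescript
// A GL JS style expression is a nested array such as ["get", "name:vi"].
type Expression = unknown[];

// Hypothetical helper (not part of any library): builds a "coalesce"
// expression that tries each preferred locale in order, then the plain name.
// Assumes tile fields are named "name:<code>"; adjust to the actual tileset.
function localizedNameExpression(locales: string[]): Expression {
  const fields: Expression[] = [];
  const seen = new Set<string>();
  for (const locale of locales) {
    // Walk from the most to the least specific subtag:
    // "zh-Hant-TW" -> "zh-Hant" -> "zh".
    const parts = locale.split("-");
    while (parts.length > 0) {
      const key = `name:${parts.join("-")}`;
      if (!seen.has(key)) {
        seen.add(key);
        fields.push(["get", key]);
      }
      parts.pop();
    }
  }
  // Final fallback: the untagged local name.
  fields.push(["get", "name"]);
  return ["coalesce", ...fields];
}

// Example: a reader who prefers Vietnamese, then English.
const textField = localizedNameExpression(["vi", "en"]);
// textField is:
// ["coalesce", ["get", "name:vi"], ["get", "name:en"], ["get", "name"]]
```

Because `coalesce` returns the first non-null value, a feature without a Vietnamese name simply falls through to the next field, so the tiles only need to carry the name fields that are actually tagged.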

ZeLonewolf commented 1 year ago

Let's generate a list of the 11 missing languages so we can file a corresponding issue with OpenMapTiles.

1ec5 commented 1 year ago

Here are the gaps among the 35 most spoken languages in the U.S. Please correct me if I’ve flubbed anything:

  1. Chinese (zh) – supported, but not very usable without a distinction between zh-Hans and zh-Hant and/or between zh-CN, zh-HK, and zh-TW and/or between cmn and yue
  2. Tagalog (tl), Filipino (fil)
  3. Vietnamese (vi)
  4. Haitian Creole (ht)
  5. Yiddish (yi)
  6. Persian (fa), Tajik (tg)
  7. Gujarati (gu)
  8. Bengali (bn)
  9. Lao (lo)
  10. Urdu (ur)
  11. Punjabi (pa, pnb)
  12. Hmong (hmn)
  13. Swahili (sw)
  14. Khmer (km)
  15. Navajo (nv)

Note that GL JS also lacks support for some of these languages’ writing systems, though this is also true of some of the languages OpenMapTiles already supports, such as Hindi.

Here are the gaps among the languages with official status in U.S. states and territories:

I think the broader point about this long list is that, these days, it’s kind of antiquated for a tileset to limit itself to a fixed set of name fields. Ideally, a tileset would just dump whatever name fields are in OSM for a given feature, substituting the corresponding Wikidata label where available. Backwards compatibility would be the only reason for limiting the languages, but I don’t know of any client that depends on tiles to contain only a fixed set of fields.
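As a rough sketch of what "dump whatever name fields are in OSM" could look like on the tile-generation side, the snippet below copies every language-qualified `name:*` tag from a feature and prefers a Wikidata label for the same language when one is supplied. The function `collectNameFields`, the key pattern, and the shape of the `wikidataLabels` argument are all assumptions made for illustration; this is not how OpenMapTiles or Planetiler is actually implemented.

```typescript
type Tags = Record<string, string>;

// Assumed pattern: matches language-qualified keys such as "name:vi" or
// "name:zh-Hant", but not other subkeys such as "name:etymology".
const LOCALIZED_NAME_KEY = /^name:([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)$/;

// Hypothetical helper: returns the name fields to write into a tile feature.
// wikidataLabels maps a language code to a label (an assumed input shape).
function collectNameFields(osmTags: Tags, wikidataLabels: Tags = {}): Tags {
  const fields: Tags = {};
  if (osmTags.name !== undefined) {
    fields.name = osmTags.name;
  }
  for (const [key, value] of Object.entries(osmTags)) {
    const match = LOCALIZED_NAME_KEY.exec(key);
    if (!match) continue;
    const lang = match[1];
    // Prefer the Wikidata label where available, otherwise keep the OSM value.
    // The lowercase lookup covers codes that differ only in case.
    fields[key] =
      wikidataLabels[lang] ?? wikidataLabels[lang.toLowerCase()] ?? value;
  }
  return fields;
}

// Made-up sample data, for illustration only.
const sample = collectNameFields(
  { name: "Example", "name:vi": "osm-vi", "name:etymology": "n/a", highway: "residential" },
  { vi: "wikidata-vi" }
);
// sample is { name: "Example", "name:vi": "wikidata-vi" }
```

Since only tagged languages produce a field, the per-feature cost grows with the names a feature actually has rather than with a fixed list of languages.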

1ec5 commented 1 year ago

The Planetiler-based tiles Americana uses by default now include many more languages, though some of the indigenous languages in https://github.com/ZeLonewolf/openstreetmap-americana/issues/586#issuecomment-1328378412 are still missing.