mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/

Add expression operators for locale matching (system languages) #6197

Open 1ec5 opened 6 years ago

1ec5 commented 6 years ago

There should be a simple way for the style author to specify that a text-field should be set to the name_* feature property that best fits the system’s preferred languages. Secondarily, it would be great if the most appropriate locale could be used on its own in expressions.

Motivation

Localizing a style’s labels currently entails iterating over all the layers, manually replacing references to name_* feature properties within each text-field value. If these values are expressions, replacing the references can be an involved, recursive step. The iOS and macOS map SDKs have a built-in option, MGLStyle.localizesLabels, that applies these changes automatically based on the system language and region preferences. There’s a plugin for GL JS and a forthcoming plugin for the Android map SDK (mapbox/mapbox-plugins-android#74) that do likewise.

While this approach is effective, it operates at such a high level that the localizing code doesn’t have a good way to reason about the style author’s intentions. Should {name} ({name_en}) be replaced by {name_es} ({name_en}) or just {name}? The style’s author has no opportunity to react to changes that could radically alter the style’s appearance, for instance by increasing the font size when the system language is Chinese. Moreover, the localization feature implicitly opts the map into runtime styling–specific behaviors like disabling automatic style refreshes.
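To illustrate the recursive replacement step and its blind spot, here is a minimal TypeScript sketch. It assumes a layer's text-field is either a legacy token string such as `{name_en}` or an expression array using `["get", "name_en"]`. The names `localizeTextField` and `localizeExpression` are hypothetical; they do not come from the GL JS plugin or the SDKs.

```typescript
// A style value: a legacy token string, an expression array, or a literal.
type StyleValue = string | number | boolean | null | StyleValue[];

const isNameField = (field: string): boolean => field === "name" || field.startsWith("name_");

// Hypothetical: walks an expression, rewriting ["get", "name" | "name_*"] to ["get", targetField].
function localizeExpression(expr: StyleValue, targetField: string): StyleValue {
  // Leave scalars and ["literal", ...] data untouched.
  if (!Array.isArray(expr) || expr[0] === "literal") return expr;
  if (expr.length === 2 && expr[0] === "get" && typeof expr[1] === "string" && isNameField(expr[1])) {
    return ["get", targetField];
  }
  // Otherwise recurse into every argument of the expression.
  return expr.map((arg) => localizeExpression(arg, targetField));
}

// Hypothetical entry point for one layer's text-field value.
function localizeTextField(value: StyleValue, targetField: string): StyleValue {
  if (typeof value === "string") {
    // Legacy token syntax: "{name}" or "{name_en}" -> "{name_es}".
    return value.replace(/\{name(?:_[^}]*)?\}/g, `{${targetField}}`);
  }
  return localizeExpression(value, targetField);
}
```

Applied to `{name} ({name_en})` with a target of `name_es`, this yields `{name_es} ({name_es})`. The code cannot tell that the parenthetical was meant to stay in English, which is exactly the loss of authorial intent described above.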

Design

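The style specification would be extended with two expression operators:

- user-locales: returns an array of locale identifiers for the system's preferred languages, in order of preference.
- match-locales: takes an array of requested locale identifiers and an array of available locale identifiers, and returns the available identifier that best matches the requested ones.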

For the purposes of these operators, a locale identifier could include a language code, script code, or region code, or some combination thereof. I would be in favor of specifying BCP 47 as the locale identifier standard to follow.
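To show what "language code, script code, or region code" means for a BCP 47 identifier, here is a simplified TypeScript sketch that splits a tag into those three subtags. It covers only a subset of BCP 47: it ignores extended language subtags, variants, extensions, private-use subtags, and grandfathered tags, and `parseLocale` is a hypothetical name. In JavaScript, the standard `Intl.Locale` object exposes the same `language`, `script`, and `region` decomposition in a standards-compliant way.

```typescript
interface LocaleParts {
  language: string; // ISO 639 code, 2 or 3 letters (e.g. "zh", "yue")
  script?: string;  // ISO 15924 code, 4 letters (e.g. "Hans")
  region?: string;  // ISO 3166-1 alpha-2 or UN M.49 numeric (e.g. "TW", "419")
}

// Hypothetical helper: parses the leading language/script/region subtags.
// Also accepts "_" as a separator, since some platforms emit "en_US".
function parseLocale(tag: string): LocaleParts | null {
  const subtags = tag.split(/[-_]/);
  const language = subtags[0].toLowerCase();
  if (!/^[a-z]{2,3}$/.test(language)) return null;

  const parts: LocaleParts = { language };
  let i = 1;
  if (i < subtags.length && /^[A-Za-z]{4}$/.test(subtags[i])) {
    // Normalize script case: "hans" -> "Hans".
    parts.script = subtags[i][0].toUpperCase() + subtags[i].slice(1).toLowerCase();
    i++;
  }
  if (i < subtags.length && /^([A-Za-z]{2}|\d{3})$/.test(subtags[i])) {
    parts.region = subtags[i].toUpperCase();
  }
  return parts;
}
```

For example, `zh-Hans-TW` yields language `zh`, script `Hans`, and region `TW`, while `es-419` yields language `es` and the numeric region `419`.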

In typical usage, a style author would opt into localization by setting text-field to a value such as:

```json
[
  "let",
  "streets-languages", ["literal", ["ar", "de", "en", "es", "fr", "pt", "ru", "zh", "zh-Hans"]],
  [
    "coalesce",
    ["get", ["concat", "name_", ["match-locales", ["user-locales"], ["var", "streets-languages"]]]],
    ["get", "name"]
  ]
]
```

Meanwhile, ["at", 0, ["user-locales"]] could be used on its own as part of a number formatting operator (#4119) and a case- and diacritic-folding string comparison operator (#4136).
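To give a sense of what the first user locale would feed into, the sketch below uses the standard ECMAScript `Intl.NumberFormat` and `Intl.Collator` APIs as stand-ins for the proposed number formatting (#4119) and folded string comparison (#4136) operators. The helper names are hypothetical, and the `userLocales` parameter plays the role of `["user-locales"]` (in a browser it could come from `navigator.languages`).

```typescript
// Hypothetical stand-in for ["at", 0, ["user-locales"]]; falls back to "en".
function firstLocale(userLocales: readonly string[]): string {
  return userLocales.length > 0 ? userLocales[0] : "en";
}

// Locale-aware number formatting: grouping and decimal separators vary by locale.
function formatNumber(value: number, userLocales: readonly string[]): string {
  return new Intl.NumberFormat(firstLocale(userLocales)).format(value);
}

// Case- and diacritic-insensitive equality: sensitivity "base" treats
// "a", "A", and "á" as equal, but "a" and "b" as different.
function foldedEquals(a: string, b: string, userLocales: readonly string[]): boolean {
  const collator = new Intl.Collator(firstLocale(userLocales), { sensitivity: "base" });
  return collator.compare(a, b) === 0;
}
```

With `["de-DE"]`, `formatNumber(1234.5, ...)` produces `1.234,5`, while `["en-US"]` produces `1,234.5`. The `Intl` constructors also accept the whole locale list and negotiate among it themselves, which is one argument for passing `["user-locales"]` through intact rather than only its first element.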

Design alternatives

It’s unfortunate that streets-languages would have to be hard-coded and duplicated on every symbol layer. However, I don’t see a good way around that unless the vector tile source formally declares its language-specific name fields (perhaps via mapbox/tilejson-spec#14) or we encapsulate that array in a third expression operator, mapbox-streets-languages.

It might be tempting to rely on match as an alternative to match-locales; however, locale identifier matching rules are rather complicated. For example, for the set of languages supported by the Streets source, en-US should resolve to en, zh-TW should resolve to zh, and zh-Hans-TW should resolve to zh-Hans.
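The fallback behavior in these examples resembles the "lookup" scheme of RFC 4647 (section 3.4): progressively truncate the requested tag from the right until it matches an available tag, trying each requested tag in preference order. Below is a minimal TypeScript sketch that assumes case-insensitive comparison and omits the RFC's handling of extension and private-use subtags. `matchLocales` is a hypothetical function that mirrors the proposed `match-locales` operator; it is not an existing GL JS API.

```typescript
// Hypothetical mirror of the proposed "match-locales" operator.
// Returns the available tag that best matches the user's ordered preferences,
// or null if nothing matches (so "coalesce" can fall back to "name").
function matchLocales(requested: readonly string[], available: readonly string[]): string | null {
  // Index available tags case-insensitively, keeping their original spelling.
  const byLower = new Map<string, string>();
  for (const tag of available) byLower.set(tag.toLowerCase(), tag);

  for (const tag of requested) {
    // Truncate from the right: "zh-Hans-TW" -> "zh-Hans" -> "zh".
    const subtags = tag.replace(/_/g, "-").toLowerCase().split("-");
    while (subtags.length > 0) {
      const match = byLower.get(subtags.join("-"));
      if (match !== undefined) return match;
      subtags.pop();
    }
  }
  return null;
}
```

Pure truncation is also why `zh-TW` lands on `zh` rather than on a Traditional Chinese field. Inferring the script from the region would require additional data, such as CLDR's likely-subtags tables.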

Implementation

/ref https://github.com/mapbox/mapbox-gl-native/issues/10713#issuecomment-366382036 /cc @mapbox/gl-core @fabian-guerra @tobrun @langsmith @nickidlugash @bsudekum

langsmith commented 6 years ago

fyi @cammace ☝️

ChrisLoer commented 6 years ago

I'm toying with the idea of implementing this. I think the design makes sense, although the symbol-layer-verbosity problem is annoying.

As I mentioned in https://github.com/mapbox/mapbox-gl-js/pull/6270#issuecomment-375817855, I wonder if BCP 47 gives us more information than we want/need. If we restrict locale specifications to ISO 639-1 codes, we probably don't even need locale-utils (saving code size, but more importantly semi-hidden complexity), and we have a simpler input to platform-specific APIs that may not speak BCP 47. On the other hand, we'd give up being able to choose number formatters based on country...

1ec5 commented 6 years ago

There’s already a need for more than ISO 639-1: many languages only have ISO 639-2 codes, not ISO 639-1 codes, and a few major languages like Chinese often need to be qualified by an ISO 15924 script code or ISO 3166 country code, such as for label localization. For example, the Mapbox Streets source distinguishes between zh and zh-Hans, leaving open the possibility of distinguishing zh-Hant in the future.

ChrisLoer commented 6 years ago

@1ec5 🤔 How about two arguments, language + (optional) region:

This would not support script customization (e.g. Hans vs. Hant; wow, I just realized the s in Hans stands for "simplified"), or the variant options in BCP 47. Again, the motivation is maximum cross-platform compatibility:

1ec5 commented 6 years ago

Separating the language and region into two arguments gives us less flexibility to support more locale information (such as script codes) in the future. I think it would be more forward-compatible if each locale-aware operator accepts a single locale code argument; each operator would decide for itself how specific a code it would honor. For example, locale matching needs to respect script differences, but perhaps string comparison does not.

1ec5 commented 6 years ago

> It’s unfortunate that streets-languages would have to be hard-coded and duplicated on every symbol layer. However, I don’t see a good way around that unless the vector tile source formally declares its language-specific name fields (perhaps via mapbox/tilejson-spec#14) or we encapsulate that array in a third expression operator, mapbox-streets-languages.

As of mapbox/tilejson-spec#42, TileJSON 3.0 will formally declare a vector_layers property that enumerates the layers and their fields. While the specification doesn’t provide a way to explicitly state the language of each field, I think it would be fine to assume name_* fields are of the form name_{ISO 639}, which would be no less robust than hard-coding language fields in the style or SDK.
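Under that assumption, the language list could be derived from TileJSON rather than hard-coded. Below is a minimal TypeScript sketch. It assumes the TileJSON 3.0 shape, in which each `vector_layers` entry has an `id` and a `fields` object keyed by field name, and it treats a `name_*` suffix as a locale tag, which is the assumption stated above. `languagesFromTileJSON` is hypothetical. Because a tileset may also carry `name_*` fields that are not languages (Mapbox Streets v8 has `name_script`, for example), the sketch also applies a rough shape check to each suffix.

```typescript
// Subset of TileJSON 3.0 used here: each vector layer lists its fields
// as an object mapping field name to a description string.
interface VectorLayer {
  id: string;
  fields: Record<string, string>;
}
interface TileJSON {
  vector_layers?: VectorLayer[];
}

// Rough check that a suffix looks like a language tag; rejects "script".
const LANGUAGE_SUFFIX = /^[a-z]{2,3}(?:-[A-Za-z0-9]{2,8})*$/;

// Hypothetical helper: collects name_* language suffixes for one layer, or all layers.
function languagesFromTileJSON(tilejson: TileJSON, layerId?: string): string[] {
  const languages = new Set<string>();
  for (const layer of tilejson.vector_layers ?? []) {
    if (layerId !== undefined && layer.id !== layerId) continue;
    for (const field of Object.keys(layer.fields)) {
      if (!field.startsWith("name_")) continue;
      const suffix = field.slice("name_".length); // "name_zh-Hans" -> "zh-Hans"
      if (LANGUAGE_SUFFIX.test(suffix)) languages.add(suffix);
    }
  }
  return Array.from(languages).sort();
}
```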

andrewharvey commented 6 years ago

While I'm overall very positive about this change, it should still support user overrides of the locale. For example, my browser might be set to English, but I want to build in a button on my site that will swap the map to German, regardless of my browser setting.

1ec5 commented 6 years ago

> My browser might be set to English, but I want to build in a button on my site that will swap the map to German, regardless of my browser setting.

That could be implemented via an API such as setLabelLanguage().
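Below is a minimal sketch of how such an override could sit on top of the proposal: an explicitly chosen language, when set, takes precedence over the system locales. `setLabelLanguage` is only the API name floated above, and the `LabelLanguage` class is hypothetical. The text-field expression it builds uses only existing operators (`coalesce`, `get`) rather than the proposed `match-locales`.

```typescript
type Expression = (string | Expression)[];

// Hypothetical label-language controller; not an existing GL JS class.
class LabelLanguage {
  private override: string | null = null;

  constructor(private readonly systemLocales: readonly string[]) {}

  // An explicit choice (e.g. from a "Deutsch" button) wins over system settings.
  setLabelLanguage(language: string | null): void {
    this.override = language;
  }

  // Locales to match against, in priority order.
  effectiveLocales(): string[] {
    return this.override !== null ? [this.override, ...this.systemLocales] : [...this.systemLocales];
  }

  // text-field expression for an already-matched language, falling back to "name".
  static textFieldFor(language: string): Expression {
    return ["coalesce", ["get", `name_${language}`], ["get", "name"]];
  }
}
```

In GL JS, the resulting expression could then be applied to each symbol layer with the existing `map.setLayoutProperty(layerId, "text-field", expression)`.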

1ec5 commented 4 years ago

> The style’s author has no opportunity to react to changes that could radically alter the style’s appearance, for instance by increasing the font size when the system language is Chinese.

> It’s unfortunate that streets-languages would have to be hard-coded and duplicated on every symbol layer. However, I don’t see a good way around that unless the vector tile source formally declares its language-specific name fields (perhaps via mapbox/tilejson-spec#14) or we encapsulate that array in a third expression operator, mapbox-streets-languages.

Per mapbox/mapbox-gl-native#15659 and https://github.com/mapbox/mapbox-gl-native/issues/14470#issuecomment-489216407, knowing the language contained in each layer of the Streets source would allow GL to choose the appropriate font for a given character without forcing the developer to specify font overrides. The locale matching proposed here would help to associate that information with the fonts specified in the stylesheet.