mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/

text-transform is not locale-aware #3999

Open lucaswoj opened 7 years ago

lucaswoj commented 7 years ago

Migrated from https://github.com/mapbox/DEPRECATED-mapbox-gl/issues/21 by @1ec5

The text-transform property is documented in the style specification as being “similar to the CSS text-transform property”. One key difference is that most modern browser engines transform a text node based on the node’s declared or inherited locale, via the lang HTML attribute or xml:lang XML attribute, taking into account any language-specific case rules. By contrast, Mapbox GL implementations perform a locale-neutral transformation (for example the “C locale” on POSIX platforms).

A locale-neutral transformation works well for many languages, such as English and Spanish, and as expected it has no effect on ideographic writing systems such as the CJK scripts. However, many languages written in the Latin alphabet have special cases that the C locale doesn’t respect. For example, the Turkish city Kırşehir, whose name includes both dotted and dotless I’s, should be labeled “KIRŞEHİR” but instead is labeled “KIRŞEHIR” (omitting a tittle):

[Screenshot: map label for Kırşehir]

German street names should be labeled, e.g., “GROSSER STERN” instead of “GROßER STERN”:

[Screenshot: map label for Großer Stern]

It isn’t sufficient to transform text to the user’s current locale. The examples above come from applying "text-field": "{name}" in the Bright style. The name field in the Mapbox Streets source is written in each feature’s native language, but it provides no way to distinguish between different languages. One could imagine a future version of the source providing a best guess of the name’s language, expressed as a BCP 47 / ISO 639 tag, based on the containing country and some character range–based heuristics. The style specification, then, could be extended with a text-language property that would be set to {language} for any layer that sets text-field to {name}.
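A minimal TypeScript sketch of the idea above, assuming a hypothetical `language` feature property that holds a BCP 47 tag and a hypothetical `text-language` layout value of `{language}`. The `Feature` type and the `resolveTokens` and `renderLabel` helpers are illustrative only and are not part of Mapbox GL JS; the point is that `text-language` would be resolved per feature, just like `text-field`, before the case transform runs.

```typescript
// Hypothetical feature shape: string properties decoded from a vector tile.
type Feature = { properties: Record<string, string | undefined> };

// Substitute each {key} token with the feature's property value, or with an
// empty string when the property is missing, similar to text-field tokens.
function resolveTokens(template: string, feature: Feature): string {
  return template.replace(/\{([^}]+)\}/g, (_token: string, key: string) =>
    feature.properties[key] ?? "",
  );
}

// Hypothetical "uppercase" pipeline: both text-field and text-language are
// token templates resolved against the same feature.
function renderLabel(textField: string, textLanguage: string, feature: Feature): string {
  const text = resolveTokens(textField, feature);
  const language = resolveTokens(textLanguage, feature);
  // No language known for this feature: use the locale-neutral conversion.
  return language ? text.toLocaleUpperCase(language) : text.toUpperCase();
}

console.log(renderLabel("{name}", "{language}", {
  properties: { name: "Kırşehir", language: "tr" },
})); // KIRŞEHİR
```

A real implementation would also need to guard against malformed tags in the data, since `toLocaleUpperCase` throws a `RangeError` when given one.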

Adding a text-language property isn’t semantically ideal, since it’s really the data that has an intrinsic language, not the style. But it seems like overkill to extend the vector tile specification with a new type that pairs a string with a language identifier.

The native platforms supported by Mapbox GL have standard APIs for uppercasing or lowercasing a string based on a locale. For example, the Mapbox iOS/macOS SDK implementation of "text-transform": "uppercase" calls -[NSString uppercaseString], but it should call -[NSString uppercaseStringWithLocale:] instead.

On the other hand, Mapbox GL JS calls String.prototype.toUpperCase(), and there is currently no standard API for locale-aware conversions beyond the user’s current locale. From the discussion at https://github.com/mapbox/mapbox-gl-js/issues/149#issuecomment-45789708, it sounds like it’d be impractical to include a JavaScript library for pan-language support. However, maybe there’s room to support a handful of high-priority languages like German and Turkish.
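For comparison, engines that implement the ECMA-402 (Intl) version of `String.prototype.toLocaleUpperCase` accept an explicit locale argument, which is the kind of per-string conversion discussed above. Which locales an engine honors depends on its ICU data, so treat this as a sketch of the expected behavior on an engine with full ICU data (such as current Node.js builds), not a guarantee for every browser.

```typescript
// The Kırşehir example: the name contains a dotless ı and a dotted i.
const cityName = "Kırşehir";

// Locale-neutral mapping: both ı and i uppercase to plain I.
const neutral = cityName.toUpperCase();

// Turkish rules keep the dot: i uppercases to İ (U+0130).
const turkish = cityName.toLocaleUpperCase("tr");

console.log(neutral); // KIRŞEHIR
console.log(turkish); // KIRŞEHİR
```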

The specification should make it clear that locale awareness is made on a best-effort basis, just like in CSS. For example, the Mapbox iOS and macOS SDKs won’t necessarily uppercase the English “E MacDonald St” as “E MacDONALD ST”, the Mapbox Android SDK may fall back to the C locale for Klingon, and Mapbox GL JS wouldn’t be required to do anything differently than it already does.
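One way to express “best effort” in code: a hypothetical `applyTextTransform` helper that takes the style specification’s `text-transform` values (`none`, `uppercase`, `lowercase`) plus an optional language tag, uses the locale when one is given and well-formed, and otherwise falls back to the locale-neutral conversion that Mapbox GL JS already performs. The helper and its signature are assumptions for illustration, not an existing API.

```typescript
// Values accepted by the style spec's text-transform layout property.
type TextTransform = "none" | "uppercase" | "lowercase";

// Hypothetical best-effort helper: honor the language tag when it is usable,
// otherwise fall back to the locale-neutral conversion.
function applyTextTransform(text: string, transform: TextTransform, language?: string): string {
  if (transform === "none") return text;
  const upper = transform === "uppercase";
  if (language) {
    try {
      return upper ? text.toLocaleUpperCase(language) : text.toLocaleLowerCase(language);
    } catch (e) {
      // A malformed tag makes the Intl-aware methods throw a RangeError;
      // anything else is unexpected, so rethrow it.
      if (!(e instanceof RangeError)) throw e;
    }
  }
  return upper ? text.toUpperCase() : text.toLowerCase();
}

console.log(applyTextTransform("Kırşehir", "uppercase", "tr"));        // KIRŞEHİR
console.log(applyTextTransform("Kırşehir", "uppercase", "not a tag")); // KIRŞEHIR
```

Per ECMA-402, a well-formed tag for a language the engine has no special case rules for (Klingon’s `tlh`, say) does not throw; the engine falls back to the root locale’s mappings, which matches the best-effort behavior described above.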

Beyond text transformations, Mapbox GL could in the future use the text-language property to choose the correct national language variant for each Unihan character in CJK text, just as native text rendering engines and Web browser rendering engines do.

cc @mapbox/gl @mapbox/cartography-cats @kkaefer @jfirebaugh

1ec5 commented 7 years ago

From https://github.com/mapbox/DEPRECATED-mapbox-gl/issues/21#issuecomment-261338156:

There are two proposals so far in this ticket:

  1. A text-language layout property that accepts tokens just as text-field would; text-language would affect at least text transforms but potentially also font fallbacks in the future. If the designer sets text-field to {name_tr}, then they can set text-language to tr. But if they want to set text-field to {name}, they’d need Mapbox Streets to provide a name_language property on each individual feature.
  2. Alternatively, a mapping – somewhere, maybe in the style JSON, maybe in TileJSON – from vector tile properties to language codes that Mapbox GL would consult any time it tries to transform text that originates in one of these vector tile properties. So if text-field is {name_de} — {name_en} and text-transform is uppercase, then Mapbox GL would know to uppercase the name_de value with the German locale before inserting it into the overall string. Indicating the language of the name field per-feature would be out of scope.
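A rough TypeScript sketch of the second proposal, assuming a hypothetical `propertyLanguages` table (wherever it ends up living, style JSON or TileJSON) that maps vector tile property names to BCP 47 tags. Each `{property}` token is uppercased with its mapped locale before being spliced into the label; literal text and unmapped properties get the locale-neutral conversion. The table contents and `resolveUppercase` are illustrative, not an existing API.

```typescript
// Hypothetical mapping from vector tile property names to BCP 47 tags,
// e.g. declared once in the style JSON or in TileJSON.
const propertyLanguages: Record<string, string> = {
  name_de: "de",
  name_en: "en",
  name_tr: "tr",
};

// Uppercase a text-field template: each {property} value is transformed with
// the locale mapped to that property; literal text and unmapped properties
// use the locale-neutral conversion.
function resolveUppercase(
  template: string,
  properties: Record<string, string | undefined>,
  languages: Record<string, string>,
): string {
  // With a capturing group, split() puts literals at even indices and
  // property names at odd indices.
  return template
    .split(/\{([^}]+)\}/)
    .map((part, i) => {
      if (i % 2 === 0) return part.toUpperCase();
      const value = properties[part] ?? "";
      const language = languages[part];
      return language ? value.toLocaleUpperCase(language) : value.toUpperCase();
    })
    .join("");
}

console.log(
  resolveUppercase("{name_tr} / {name_en}", { name_tr: "Kırşehir", name_en: "Kirsehir" }, propertyLanguages),
); // KIRŞEHİR / KIRSEHIR
```

Because the locale is looked up per property rather than per feature, the `name` field still gets only the locale-neutral transform, consistent with the proposal leaving per-feature language out of scope.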

It’s entirely possible that both proposals are Rube Goldbergian and there are simpler ways to accomplish locale-aware text transforms. Any ideas?

ajashton commented 7 years ago

This doesn't solve the dotted I problem, but wouldn't upper('ß') → 'SS' always be a safe/reasonable thing to do in the context of rendering text on a map? Could that be the default locale-less behavior?

1ec5 commented 7 years ago

That is the behavior when the locale goes unspecified on some platforms, including iOS and macOS, but apparently not including the Web.

kkaefer commented 7 years ago

@ajashton in the particular case of ß => SS, you are mostly right (there are fonts which support an uppercase ß (ẞ), but it's not in wide use).