mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/

Add support for rendering characters from unicode supplementary planes #4001

Open lucaswoj opened 7 years ago

lucaswoj commented 7 years ago

Migrated from https://github.com/mapbox/DEPRECATED-mapbox-gl/issues/29

@lucaswoj: We currently only support rendering characters from the Basic Multilingual Plane. We may need to support supplementary planes as we expand into markets that use non-Latin alphabets.
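To make the BMP/supplementary distinction concrete: a code point's plane is its value shifted right by 16 bits, and in a JavaScript string any character above U+FFFF is stored as two UTF-16 code units (a surrogate pair). The helper names below are illustrative only and are not part of the GL JS API.

```typescript
// Illustrative helpers (hypothetical, not GL JS API): classify characters by Unicode plane.

/** Plane number 0 to 16: the code point shifted right by 16 bits. */
function planeOf(ch: string): number {
  const cp = ch.codePointAt(0);
  if (cp === undefined) throw new RangeError("empty string");
  return cp >> 16;
}

/** True for characters outside the Basic Multilingual Plane (U+10000 and above). */
function isSupplementary(ch: string): boolean {
  return planeOf(ch) > 0;
}

// "a" is in plane 0 (BMP), U+1F600 in plane 1 (SMP), U+20000 in plane 2 (SIP).
const samples = ["a", "\u{1F600}", "\u{20000}"];
for (const s of samples) {
  // A supplementary character has .length 2 because it is stored as a surrogate pair.
  console.log(s, "plane", planeOf(s), "UTF-16 units", s.length);
}
```

Any code that walks a label one UTF-16 unit at a time (for example with `charCodeAt`) sees two lone surrogates instead of one character, which is one way supplementary-plane support can fall through the cracks.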

https://en.wikipedia.org/wiki/CJK_Unified_Ideographs#CJK_Unified_Ideographs_Extension_E notes that 98 of those characters come from “Chinese Academy of Surveying and Mapping ideographs”, so they probably will be cropping up in OSM eventually. -- @1ec5


@1ec5: I'm satisfied that the CJK Unified Ideographs Extension E characters will take a while to make their way into OpenStreetMap, given that the block was only introduced to Unicode last year. However, note that the same work that goes into CJK E would also enable (colorless) emoji. Hopefully that’ll give this issue a bit more traction. 😉


@1ec5 My mistake: CJK E isn’t the only CJK block that’s in the Supplementary Ideographic Plane; CJK Unified Ideographs Extension B–D are also up there. Besides historical and Vietnamese characters, CJK B includes 1,702 characters from the Hong Kong Supplementary Character Set, which apparently includes a lot of Cantonese characters used in official Hong Kong place names.


@1ec5: As of Unicode 9.0, the following astral-plane blocks allow ideographic breaking:

  • Meroitic Hieroglyphs
  • Egyptian Hieroglyphs
  • Anatolian Hieroglyphs
  • Ideographic Symbols and Punctuation
  • Tangut
  • Tangut Components
  • Kana Supplement
  • Tai Xuan Jing Symbols
  • Counting Rod Numerals
  • Mahjong Tiles
  • Domino Tiles
  • Playing Cards
  • Enclosed Alphanumeric Supplement
  • Enclosed Ideographic Supplement
  • Miscellaneous Symbols and Pictographs
  • Emoticons
  • Ornamental Dingbats
  • Transport and Map Symbols
  • Alchemical Symbols
  • Geometric Shapes Extended
  • Supplemental Symbols and Pictographs
  • CJK Unified Ideographs Extension B
  • CJK Unified Ideographs Extension C
  • CJK Unified Ideographs Extension D
  • CJK Unified Ideographs Extension E
  • CJK Compatibility Ideographs Supplement
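The list above can be expressed as a table of inclusive code point ranges. The sketch below is hypothetical (it is not GL JS's actual implementation) and covers only the astral-plane blocks listed for Unicode 9.0; it assumes BMP scripts are handled elsewhere.

```typescript
// Hypothetical lookup: astral-plane blocks that allow ideographic breaking
// as of Unicode 9.0, as [block name, first code point, last code point].
const IDEOGRAPHIC_BREAKING_BLOCKS: ReadonlyArray<readonly [string, number, number]> = [
  ["Meroitic Hieroglyphs", 0x10980, 0x1099f],
  ["Egyptian Hieroglyphs", 0x13000, 0x1342f],
  ["Anatolian Hieroglyphs", 0x14400, 0x1467f],
  ["Ideographic Symbols and Punctuation", 0x16fe0, 0x16fff],
  ["Tangut", 0x17000, 0x187ff],
  ["Tangut Components", 0x18800, 0x18aff],
  ["Kana Supplement", 0x1b000, 0x1b0ff],
  ["Tai Xuan Jing Symbols", 0x1d300, 0x1d35f],
  ["Counting Rod Numerals", 0x1d360, 0x1d37f],
  ["Mahjong Tiles", 0x1f000, 0x1f02f],
  ["Domino Tiles", 0x1f030, 0x1f09f],
  ["Playing Cards", 0x1f0a0, 0x1f0ff],
  ["Enclosed Alphanumeric Supplement", 0x1f100, 0x1f1ff],
  ["Enclosed Ideographic Supplement", 0x1f200, 0x1f2ff],
  ["Miscellaneous Symbols and Pictographs", 0x1f300, 0x1f5ff],
  ["Emoticons", 0x1f600, 0x1f64f],
  ["Ornamental Dingbats", 0x1f650, 0x1f67f],
  ["Transport and Map Symbols", 0x1f680, 0x1f6ff],
  ["Alchemical Symbols", 0x1f700, 0x1f77f],
  ["Geometric Shapes Extended", 0x1f780, 0x1f7ff],
  ["Supplemental Symbols and Pictographs", 0x1f900, 0x1f9ff],
  ["CJK Unified Ideographs Extension B", 0x20000, 0x2a6df],
  ["CJK Unified Ideographs Extension C", 0x2a700, 0x2b73f],
  ["CJK Unified Ideographs Extension D", 0x2b740, 0x2b81f],
  ["CJK Unified Ideographs Extension E", 0x2b820, 0x2ceaf],
  ["CJK Compatibility Ideographs Supplement", 0x2f800, 0x2fa1f],
];

/** True if the code point lies in one of the astral blocks above. */
function allowsIdeographicBreaking(codePoint: number): boolean {
  return IDEOGRAPHIC_BREAKING_BLOCKS.some(([, lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Iterating with for...of visits whole code points, so surrogate pairs are never split.
for (const ch of "\u{20000}\u{1F5FA}") {
  console.log(ch, allowsIdeographicBreaking(ch.codePointAt(0)!));
}
```

A linear scan is enough for a table this short; a production implementation might instead binary-search sorted ranges or use a precomputed bitmap.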

As of Unicode 9.0 and revision 16 of UTR #50, the following astral-plane blocks have upright vertical orientation:

  • Meroitic Hieroglyphs
  • Siddham
  • Egyptian Hieroglyphs
  • Anatolian Hieroglyphs
  • Ideographic Symbols and Punctuation
  • Tangut
  • Tangut Components
  • Kana Supplement
  • Byzantine Musical Symbols
  • Musical Symbols
  • Tai Xuan Jing Symbols
  • Counting Rod Numerals
  • Sutton SignWriting
  • Mahjong Tiles
  • Domino Tiles
  • Playing Cards
  • Enclosed Alphanumeric Supplement
  • Enclosed Ideographic Supplement
  • Miscellaneous Symbols and Pictographs
  • Emoticons
  • Ornamental Dingbats
  • Transport and Map Symbols
  • Alchemical Symbols
  • Geometric Shapes Extended
  • Supplemental Symbols and Pictographs
  • CJK Unified Ideographs Extension B
  • CJK Unified Ideographs Extension C
  • CJK Unified Ideographs Extension D
  • CJK Unified Ideographs Extension E
  • CJK Compatibility Ideographs Supplement

The following astral-plane blocks have neutral vertical orientation:

  • Supplementary Private Use Area-A
  • Supplementary Private Use Area-B
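
The two orientation lists above can likewise be sketched as range tables. This is a hypothetical helper, not GL JS code; contiguous blocks are merged into single ranges (noted in comments), and code points not covered by either list return `undefined` because the lists say nothing about them.

```typescript
// Hypothetical sketch: vertical orientation of astral-plane blocks per the lists above
// (Unicode 9.0, UTR #50 revision 16). Adjacent blocks are merged into one range.
type Orientation = "upright" | "neutral";

const UPRIGHT_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x10980, 0x1099f], // Meroitic Hieroglyphs
  [0x11580, 0x115ff], // Siddham
  [0x13000, 0x1342f], // Egyptian Hieroglyphs
  [0x14400, 0x1467f], // Anatolian Hieroglyphs
  [0x16fe0, 0x16fff], // Ideographic Symbols and Punctuation
  [0x17000, 0x18aff], // Tangut, Tangut Components
  [0x1b000, 0x1b0ff], // Kana Supplement
  [0x1d000, 0x1d1ff], // Byzantine Musical Symbols, Musical Symbols
  [0x1d300, 0x1d37f], // Tai Xuan Jing Symbols, Counting Rod Numerals
  [0x1d800, 0x1daaf], // Sutton SignWriting
  [0x1f000, 0x1f7ff], // Mahjong Tiles through Geometric Shapes Extended
  [0x1f900, 0x1f9ff], // Supplemental Symbols and Pictographs
  [0x20000, 0x2a6df], // CJK Unified Ideographs Extension B
  [0x2a700, 0x2ceaf], // CJK Unified Ideographs Extensions C, D, E
  [0x2f800, 0x2fa1f], // CJK Compatibility Ideographs Supplement
];

const NEUTRAL_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0xf0000, 0x10ffff], // Supplementary Private Use Area-A and -B
];

const inRanges = (cp: number, ranges: ReadonlyArray<readonly [number, number]>): boolean =>
  ranges.some(([lo, hi]) => cp >= lo && cp <= hi);

/** Orientation from the lists above, or undefined if neither list covers the code point. */
function verticalOrientation(codePoint: number): Orientation | undefined {
  if (inRanges(codePoint, UPRIGHT_RANGES)) return "upright";
  if (inRanges(codePoint, NEUTRAL_RANGES)) return "neutral";
  return undefined;
}

console.log(verticalOrientation(0x1d800), verticalOrientation(0x10fffd));
```

In a vertical label, an "upright" result would mean the glyph is drawn unrotated; how "neutral" Private Use characters should be drawn is left to the renderer.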

1ec5 commented 7 years ago

As of Unicode 10.0, the following astral-plane blocks also allow ideographic breaking:

Revision 17 of UTR #50 still reflects Unicode 9, but presumably the following astral-plane blocks also allow upright vertical orientation:

/cc @ChrisLoer

1ec5 commented 7 years ago

OpenStreetMap does have CJK Unified Ideographs B–F characters in a number of features’ name or name:zh tags, which wind up in the Mapbox Streets source’s {name} or {name_zh} fields, respectively:

There are also plenty of Cantonese place names in name:zh-yue tags, but the Mapbox Streets source omits them because it lacks dedicated support for Cantonese.

GL JS skips over any supplementary-plane character rather than leaving a space or replacement character. For example, “卡司𥰆拉樂園” is rendered as “卡司拉樂園”, even with the demo in #4895. In principle, this could lead to some unfortunate labels.
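The difference between silently dropping a character and marking the gap can be sketched as follows. This is a hypothetical illustration, not GL JS's layout code: `hasGlyph` stands in for whatever glyph lookup a renderer uses, and it is assumed here, purely for the example, that only BMP code points have glyphs.

```typescript
// Hypothetical sketch: two fallbacks for characters with no available glyph.
// Assumption for illustration only: glyphs exist for BMP code points and nothing else.
const hasGlyph = (cp: number): boolean => cp <= 0xffff;

type Fallback = "skip" | "replacement";

function layoutText(text: string, fallback: Fallback): string {
  let out = "";
  // for...of iterates by code point, so a surrogate pair arrives as one character.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (hasGlyph(cp)) {
      out += ch;
    } else if (fallback === "replacement") {
      out += "\uFFFD"; // U+FFFD REPLACEMENT CHARACTER marks the missing glyph
    }
    // With "skip", the character disappears and its neighbors close up.
  }
  return out;
}

const label = "卡司𥰆拉樂園";
console.log(layoutText(label, "skip"));
console.log(layoutText(label, "replacement"));
```

The "skip" path reproduces the truncated label described above, while the "replacement" path at least shows the reader that something is missing.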

/cc @ajashton @jcsg

1ec5 commented 5 years ago

The analysis in https://github.com/mapbox/mapbox-gl-js/issues/4001#issuecomment-312393003 mostly covered Chinese labels. Since then, Mapbox Streets has added support for Japanese names. Currently, OpenStreetMap has 11 features in Japan with unsupported characters in Japanese names: 3 buildings, 2 restaurants, 1 pond, 1 memorial, 1 shrine, and 1 supermarket.
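A count like this can be reproduced by scanning names for any code point at or above U+10000. The sketch below is hypothetical: the `Feature` shape and the sample records are invented for illustration and are not real OpenStreetMap data or the query actually used.

```typescript
// Hypothetical sketch: count features whose names contain supplementary-plane
// characters, grouped by feature kind. Feature shape and sample data are invented.
interface Feature {
  kind: string;
  name: string;
}

// The `u` flag makes the class match whole code points rather than UTF-16 units.
const SUPPLEMENTARY = /[\u{10000}-\u{10FFFF}]/u;

function countUnsupportedByKind(features: readonly Feature[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const f of features) {
    if (SUPPLEMENTARY.test(f.name)) {
      counts.set(f.kind, (counts.get(f.kind) ?? 0) + 1);
    }
  }
  return counts;
}

const sample: Feature[] = [
  { kind: "building", name: "\u{20B9F}屋" }, // U+20B9F is in CJK Extension B
  { kind: "shrine", name: "神社" },
  { kind: "building", name: "ビル" },
];
console.log(countUnsupportedByKind(sample));
```

Because the regex has no `g` flag, repeated `test` calls carry no `lastIndex` state between features.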

1ec5 commented 5 years ago

As of Unicode 12.1, the following astral-plane blocks also allow ideographic breaking:

The following astral-plane blocks have upright vertical orientation: