maplibre / maplibre-native

MapLibre Native - Interactive vector tile maps for iOS, Android and other platforms.
https://maplibre.org
BSD 2-Clause "Simplified" License

Support more writing systems using HarfBuzz #706

Closed: wipfli closed this issue 1 year ago

wipfli commented 1 year ago

MapLibre GL Native currently does not support writing systems such as Indic scripts or Khmer.

I think we should support more writing systems so that people from all parts of the world can use MapLibre...

Let's use this as a tracking issue to collect ideas and material on how we can extend writing system support. Maybe HarfBuzz will be the best tool.

wipfli commented 1 year ago

If only I knew which part of the code base was responsible for text rendering...

alanchenboy commented 1 year ago

GrabMap Khmer Render solution.pdf: these are the slides showing how GrabMap implemented HarfBuzz + FreeType.

brawer commented 1 year ago

Here’s my personal advice on this. (Disclaimer: While I’ve spent many years working on maps, internationalization, fonts, and text rendering, I’m a complete newbie to MapLibre. Please excuse anything that doesn’t make sense in the MapLibre context.)

Generally, GrabMap’s approach for Khmer goes in the right direction, but it’s incomplete, and you don’t need a separate hack for each language. Instead, once you’ve implemented text rendering correctly, a single code path can display all text in any language. This will also improve the display of Latin/English text in terms of kerning, ligatures, typographic features, and a big chunk of what’s needed for Emoji. Unfortunately, getting this right is fairly non-trivial, and also not exactly well documented. However, there are several open-source implementations that might serve as inspiration. Personally, I’d mainly recommend looking at the source code of Chromium and Firefox; they both have state-of-the-art text rendering stacks. Another source of inspiration might be the Pango library used by GNOME/GTK and Qt, although with some reservations.

Here’s how modern text rendering works from a high level:

  1. Start by running the Unicode Bidirectional Algorithm. This results in a sequence of “bidi runs”. A good implementation is GNU FriBiDi, licensed under LGPL-2.1.
  2. Split the bidi runs into “script runs”, which are contiguous character sequences in the same writing system (and writing direction, as per step 1). This step is called “script itemization”, and there are some subtleties. Some are documented in Unicode Standard Annex #24, but this is not the whole story. There’s an implementation in the Win32 API, and one in Qt, although one should double-check how good that is. It would be best to look at the Chromium and Firefox sources.
  3. Send each script run (with correct bidi and language tags) through HarfBuzz, using the default font for the chosen style. This step is called “shaping”, and results in “glyph vectors”, a sequence of (glyph ID, x, y).
  4. If the font hasn’t been able to display some pieces of text, the glyph vector will contain glyphs with glyph ID zero. In this case, find the failing substring, and recursively call HarfBuzz to shape the failing substring with a fallback font. Finding fallback fonts is non-trivial, performance-critical, and platform-specific. Again, have a look at the Chromium and Firefox sources.
  5. Break the labels into lines. Line-breaking (and hyphenation, which might be quite useful for maps) is non-trivial, language-specific, and performance-critical. Unicode defines an algorithm in Unicode Standard Annex #14, but again, this isn’t the full story, so check out what web browsers do. It might be worth supporting soft hyphens, so a server-side tile renderer could use large, language-specific dictionaries (like those used by TeX) to find potential hyphenation points, and pass them through vector tiles to MapLibre.
  6. Render the glyphs. Given that MapLibre already has GPU rendering with Signed Distance Fields, GrabMap’s use of FreeType seemed a little surprising to me; there’s no obvious reason why SDF wouldn’t work for Khmer, Devanagari, or any other script in Unicode. However, if MapLibre wants to eventually support color fonts or color Emoji, there’ll be some complications here.
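
Step 2 (script itemization) is easier to picture with code. The following is a minimal Python sketch, not MapLibre’s or any browser’s implementation: Python’s `unicodedata` module doesn’t expose the Script property, so `SCRIPT_RANGES` is a tiny hypothetical excerpt (a real itemizer would use the full Scripts.txt data or ICU), and `script_of`/`itemize` are illustrative names. Characters of the Common script (spaces, digits, punctuation) simply join the neighboring run; the paired-punctuation rules and Script_Extensions described around UAX #24 are not handled.

```python
# Hypothetical, heavily simplified excerpt of the Unicode Script property.
# Each entry is (first code point, last code point, ISO 15924 script code).
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"), (0x0061, 0x007A, "Latn"),
    (0x00C0, 0x00D6, "Latn"), (0x00D8, 0x00F6, "Latn"), (0x00F8, 0x024F, "Latn"),
    (0x0900, 0x097F, "Deva"),  # Devanagari
    (0x1780, 0x17FF, "Khmr"),  # Khmer
    (0x3040, 0x309F, "Hira"),  # Hiragana
    (0x30A0, 0x30FF, "Kana"),  # Katakana
    (0x4E00, 0x9FFF, "Hani"),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hang"),  # Hangul syllables
]
COMMON = "Zyyy"  # spaces, digits, punctuation: shared by all scripts


def script_of(ch):
    """Look up the (simplified) script of a single character."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return COMMON


def itemize(text):
    """Split text into (script, substring) runs.

    Common characters join the run they follow; a leading run of Common
    characters adopts the script of the first real letter after it.
    """
    runs = []  # each entry: [script, start, end]
    for i, ch in enumerate(text):
        script = script_of(ch)
        if not runs:
            runs.append([script, i, i + 1])
        elif script in (COMMON, runs[-1][0]):
            runs[-1][2] = i + 1          # extend the current run
        elif runs[-1][0] == COMMON:
            runs[-1][0] = script         # leading Common run adopts this script
            runs[-1][2] = i + 1
        else:
            runs.append([script, i, i + 1])
    return [(script, text[start:end]) for script, start, end in runs]
```

Each resulting run would then be shaped separately (step 3). In HarfBuzz’s C API, a run’s script, direction, and language are set on the buffer with `hb_buffer_set_script`, `hb_buffer_set_direction`, and `hb_buffer_set_language` before calling `hb_shape`.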

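Step 5 suggests passing soft hyphens through vector tiles. Here is a minimal greedy line breaker, under stated assumptions, showing how a client could honor them: U+00AD is invisible unless a line ends there, in which case a visible hyphen is drawn. `break_label` is a hypothetical name, widths are approximated by character count via the `measure` parameter (a real renderer would sum shaped glyph advances), and break opportunities are limited to whitespace and soft hyphens. That is not UAX #14; in particular, scripts such as Khmer or Thai don’t separate words with spaces and need dictionary-based segmentation like ICU’s.

```python
SOFT_HYPHEN = "\u00ad"


def break_label(text, max_width, measure=len):
    """Greedily wrap a label at spaces and at soft hyphens (U+00AD).

    A soft hyphen is invisible unless the line is broken there, in which
    case a visible "-" ends the line. `measure` returns the width of a
    string; counting characters is a placeholder for real glyph advances.
    """
    lines = []
    line = ""
    joiner = ""              # text between `line` and the next piece if no break
    ends_at_hyphen = False   # whether `line` currently ends at a soft hyphen
    for word in text.split():
        parts = word.split(SOFT_HYPHEN)
        for i, part in enumerate(parts):
            hyphen_after = i < len(parts) - 1
            # Reserve room for a "-" in case the next break happens here.
            reserve = "-" if hyphen_after else ""
            if line and measure(line + joiner + part + reserve) > max_width:
                lines.append(line + ("-" if ends_at_hyphen else ""))
                line = part
            else:
                line += joiner + part
            joiner = "" if hyphen_after else " "
            ends_at_hyphen = hyphen_after
    if line:
        lines.append(line)
    return lines
```

For example, `break_label("Donau\u00addampf\u00adschiff\u00adfahrt Gesellschaft", 12)` yields `["Donaudampf-", "schifffahrt", "Gesellschaft"]`, while a label that fits on one line simply loses its invisible soft hyphens.
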
Language tags: For good text rendering, you’ll actually need to know the language of each label being rendered, and pass it down the rendering stack (into HarfBuzz) as an IETF BCP-47 language tag. This is the same language code that’s also used for language tagging in HTML, XML, and other data formats; modern browsers use it to tweak text rendering. Knowing the language mainly matters for East Asia, where certain glyphs should look slightly different depending on the language (and region/country, which is part of IETF language tags). For example, the same Unicode codepoint U+8FD4 should look different for various languages/regions. Knowing the language can also make a visible difference when rendering certain minority languages in South-East Asia such as Shan or Mon, and even for European languages like Polish, but these cases are admittedly rather high-end typography. Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language. (In case extending MVT is too complicated: Unicode once defined an escape mechanism that encodes language tags as special codepoints within a character stream. While Unicode has deprecated and strongly discouraged this escape mechanism, it might still be appropriate for MapLibre if the MVT format can’t be extended.) For the medium term, I’d recommend implementing a client-side heuristic in MapLibre, at least for East Asia. That heuristic might also be useful when a tile doesn’t come with language tags, or when rendering other formats such as GeoJSON. In the short term, my recommendation would be to do nothing: users will still be able to read the text, but they may complain about the “wrong font” being used. Knowing (or guessing) the language of each rendered label will also be needed if MapLibre ever wants to compute hyphenation points on the client side.
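
For the medium-term client-side heuristic, here is a minimal sketch for East Asian labels, assuming the label text is all the renderer has: kana is used almost exclusively for Japanese and Hangul for Korean, while Han ideographs alone are ambiguous, so the hypothetical `han_default` parameter lets the caller supply a fallback tag from some other context. The function name and the fallback mechanism are illustrative assumptions, not an existing MapLibre API. The returned BCP-47 tag could then be handed to HarfBuzz with `hb_language_from_string` and `hb_buffer_set_language`.

```python
def guess_cjk_language(label, han_default=None):
    """Guess a BCP-47 language tag for a CJK map label (hypothetical helper).

    Kana is strong evidence for Japanese and Hangul for Korean. Han
    ideographs alone could be Chinese, Japanese, or Korean, so the caller
    supplies `han_default`. Returns None when there is no CJK evidence.
    """
    has_kana = has_hangul = has_han = False
    for ch in label:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana blocks
            has_kana = True
        elif 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF:  # Hangul
            has_hangul = True
        elif 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:  # CJK ideographs
            has_han = True
    if has_kana:
        return "ja"
    if has_hangul:
        return "ko"
    if has_han:
        return han_default
    return None
```

Checking for kana before Hangul and Han reflects that Japanese labels routinely mix kana with Han ideographs, so any kana at all is the most specific signal.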

Color and variable fonts: Supporting color (and variable) fonts is a little complicated, but certainly doable, and I think both color and variations could have very nice applications for cartography. But it’s clearly less important than making text readable in the first place.

Web fonts: According to their slides, GrabMap seems to load Noto Sans Khmer over the web. However, on most modern devices, this wouldn’t actually be necessary; both Apple and Google bundle most of Noto with their operating systems. Although Apple hides the presence of Noto from its user interfaces, apps can still access the glyphs. Likewise, Microsoft Windows bundles a lot of international fonts. That said, it certainly would make sense for MapLibre to support web fonts, both for custom styling and as a fallback when device fonts don’t cover the Unicode range needed for display. My recommendation would be to re-implement MapLibre’s text stack so it supports web fonts in the same way as Chromium and Firefox. However, this would likely be a sizeable chunk of work.

Font formats: Maybe I’m missing something here, but to me personally, the Mapbox font API seems a little weird. Again, my recommendation would be to make MapLibre behave like a modern web browser, support the same (standard) web font formats, and perform the conversion from Bézier curves to GPU-renderable Signed Distance Fields on the client device.
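
To make “conversion from Bézier curves to Signed Distance Fields on the client” concrete, here is a minimal brute-force sketch under stated assumptions: the glyph outline has already been extracted from the font (e.g. with FreeType, not shown), quadratic Bézier segments (as used in TrueType outlines) are flattened into line segments, and inside/outside is decided with the even-odd rule, a simplification since TrueType fills with the non-zero winding rule. All function names are hypothetical. It is O(pixels × edges) and far too slow for production; it only illustrates what a signed distance field contains.

```python
import math


def flatten_quadratic(p0, p1, p2, steps=8):
    """Approximate a quadratic Bézier curve by a polyline of `steps` segments."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        mt = 1.0 - t
        x = mt * mt * p0[0] + 2 * mt * t * p1[0] + t * t * p2[0]
        y = mt * mt * p0[1] + 2 * mt * t * p1[1] + t * t * p2[1]
        points.append((x, y))
    return points


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def crossings(p, contour):
    """Number of contour edges crossed by a ray from p towards +x."""
    count = 0
    for i in range(len(contour)):
        (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % len(contour)]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                count += 1
    return count


def signed_distance_field(contours, width, height):
    """Sample a width x height SDF at pixel centers.

    contours: closed polylines (already flattened). Values are positive
    inside the glyph and negative outside; with the even-odd rule, holes
    such as the inside of an "o" count as outside.
    """
    field = []
    for row in range(height):
        values = []
        for col in range(width):
            p = (col + 0.5, row + 0.5)
            dist = min(
                distance_to_segment(p, c[i], c[(i + 1) % len(c)])
                for c in contours
                for i in range(len(c))
            )
            inside = sum(crossings(p, c) for c in contours) % 2 == 1
            values.append(dist if inside else -dist)
        field.append(values)
    return field
```

A GPU shader would then threshold the interpolated field around zero, which is what lets SDF glyphs scale smoothly without re-rasterizing.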

Styles: To define the style of map labels, my suggestion would be to implement the CSS3 Text and CSS3 Fonts modules, just like a modern web browser. This would be quite a bit of work, though.

brawer commented 1 year ago

Worth reading: Text layout is a loose hierarchy of segmentation. There have been several attempts to bundle text layout into a single library, such as Raqm, Minikin, ICU Paragraph Layout, or Cobbletext. Personally, I’d recommend looking at them for inspiration, although I’m not sure they really fit MapLibre. Another source to consider might be lib/ui/text in Flutter: this is a fork of Chrome’s text handling and less entangled than Chrome, but it’s deeply integrated with Flutter, so it’s not directly usable for MapLibre. On the other hand, if you just want to fix rendering quickly with little work, and if you don’t care about line breaking, hyphenation, and rendering text on GPUs, Raqm might be a good solution. Minikin implements line breaking, but being part of Android, it would need to be ported to other platforms.

maxammann commented 1 year ago

Thanks for the great summary, I'll check it out later in more detail!

One thing I was wondering about regarding glyph rendering: does MapLibre really need resolution-independent glyph rendering?

In MapLibre the glyphs "exist" within the 3D world. That means glyphs need to be resized depending on the zoom.

Wouldn't it be enough for the major use cases to render glyphs at a static resolution as an overlay?

brawer commented 1 year ago

> Does MapLibre really need resolution-independent glyph rendering? In MapLibre the glyphs "exist" within the 3D world. That means glyphs need to be resized depending on the zoom. Wouldn't it be enough for the major use cases to render glyphs at a static resolution as an overlay?

Personally I’d find it a nice feature if text were able to grow and shrink with the rest of the map. Zooming would feel smoother that way, especially on deep zoom levels. But admittedly that’s pretty far off; the current user experience doesn’t seem to need this. Also, you can always rasterize glyphs on the CPU (by calling FreeType) to any desired resolution; it just won’t feel as smooth as when doing it on GPU.

maxammann commented 1 year ago

> Personally I’d find it a nice feature if text were able to grow and shrink with the rest of the map. Zooming would feel smoother that way, especially on deep zoom levels. But admittedly that’s pretty far off; the current user experience doesn’t seem to need this. Also, you can always rasterize glyphs on the CPU (by calling FreeType) to any desired resolution; it just won’t feel as smooth as when doing it on GPU.

That is also my feeling. Yeah, it would be cool to have resolution-independent glyph rendering, but at the same time I'm wondering if we really need it. I honestly don't know right now why Mapbox went that way originally. There must be some reason why it's necessary.

ramSeraph commented 1 year ago

I just want to add one more library for consideration with respect to client-side text breaking: https://github.com/unicode-org/icu4x

This one is written especially for use cases like MapLibre.

Edit: I thought this was an issue in maplibre-gl-js. This might not be required for maplibre-gl-native.

wipfli commented 1 year ago

> Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language.

@brawer feel free to add this point to the discussion at https://github.com/nyurik/future-mvt/discussions/1

wipfli commented 1 year ago

We are not the first ones to think about supporting more writing systems. Here are some Mapbox issues:

wipfli commented 1 year ago

Almost 10 years ago, in the third issue created in the Mapbox GL JS repo, people talked about HarfBuzz https://github.com/mapbox/mapbox-gl-js/issues/3.

maxammann commented 1 year ago

> Almost 10 years ago, in the third issue created in the Mapbox GL JS repo, people talked about HarfBuzz https://github.com/mapbox/mapbox-gl-js/issues/3.

Uuuh, doesn't that issue suggest that they already used the FreeType/ICU libraries?

1ec5 commented 1 year ago

Also lots of discussion in mapbox/DEPRECATED-mapbox-gl#4.

ramSeraph commented 1 year ago

My set of reading material and potential repos. I hope it helps and is not an overload of data :)

EDIT: Cleaned up and recategorized

maxammann commented 1 year ago

@ramSeraph Thanks for that collection. Do you mind if I include that in https://maplibre.org/maplibre-rs/book/development-documents/font-rendering.html?

ramSeraph commented 1 year ago

@maxammann you can definitely include them.

I wasn't sure if I should put effort into maplibre-rs or here. I wasn't sure how far from production maplibre-rs was, and I don't know Rust (that can be remedied, though :) ).

I have to say, it was heartwarming to see the top priority of this issue at maplibre-rs - https://github.com/maplibre/maplibre-rs/issues/36#issue-1213681716

Is there a place where I can add more of the Rust text utility research? (I can see that you have already looked at a few of the things I have.)

ramSeraph commented 1 year ago

> Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language.

> @brawer feel free to add this point to the discussion at nyurik/future-mvt#1

I wonder if this should be part of the MapLibre style spec or the MVT specification.

Also, please consider dropping the glyph API from the MapLibre style spec if possible. The alternative mentioned in this issue tracker seems like a good idea.

https://github.com/maptiler/tileserver-gl/issues/641#issuecomment-1313475428