Open wipfli opened 9 months ago
The next option is to use all codepoints starting from U+0000. However, I think this is a bad idea because MapLibre has some special behavior for whitespace and also for Latin letters. For example, to-uppercase can be used with Latin letters in MapLibre.
Finally, I think the best option is to use the codepoints that MapLibre considers to belong to unsupported scripts, i.e., the ones for which is-supported-script of the style spec returns false. MapLibre GL JS defines these codepoints with a heuristic here:
    if ((char >= 0x0900 && char <= 0x0DFF) ||
        // Main blocks for Indic scripts and Sinhala
        (char >= 0x0F00 && char <= 0x109F) ||
        // Main blocks for Tibetan and Myanmar
        isChar['Khmer'](char)) {
        // These blocks cover common scripts that require
        // complex text shaping, based on unicode script metadata:
        // http://www.unicode.org/repos/cldr/trunk/common/properties/scriptMetadata.txt
        // where "Web Rank <= 32" "Shaping Required = YES"
        return false;
    }
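For illustration, the ranges in that branch can be pulled out into a standalone predicate. This is only a sketch and not MapLibre's code: the names `SHAPING_REQUIRED_RANGES` and `isInShapingRequiredRange` are made up here, and the Khmer block is hard-coded as U+1780 to U+17FF (the standard Unicode block) instead of going through `isChar['Khmer']`.

```typescript
// Hypothetical helper, not part of MapLibre GL JS.
// Ranges are copied from the heuristic quoted above; the Khmer bounds
// (U+1780-U+17FF) are an assumption standing in for isChar['Khmer'].
const SHAPING_REQUIRED_RANGES: ReadonlyArray<readonly [number, number]> = [
    [0x0900, 0x0DFF], // main blocks for Indic scripts and Sinhala
    [0x0F00, 0x109F], // main blocks for Tibetan and Myanmar
    [0x1780, 0x17FF], // Khmer
];

// True if the codepoint falls in one of the ranges for which the
// quoted branch returns false (i.e. "script not supported").
function isInShapingRequiredRange(codePoint: number): boolean {
    return SHAPING_REQUIRED_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}
```

With this, `isInShapingRequiredRange(0x0915)` (DEVANAGARI LETTER KA) is true, while Latin `0x0041` and the CJK codepoint `0x4E00` are false.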
Here we have
num_codepoints = (0x0DFF - 0x0900 + 1) + (0x109F - 0x0F00 + 1) + (0x17FF - 0x1780 + 1) = 1824
codepoints in total. This is enough to cover all of Devanagari in one MapLibre font.
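The count can be double-checked, and one possible glyph-to-codepoint assignment sketched, as follows. This is an assumption-laden illustration rather than a proposal for the actual encoding: `codepointForGlyph` is a hypothetical helper, the Khmer block is taken to be U+1780 to U+17FF, and glyphs are simply numbered 0, 1, 2, ... in whatever order they come out of the shaping step.

```typescript
// Hypothetical sketch; ranges taken from the heuristic quoted above,
// with Khmer assumed to be the block U+1780-U+17FF.
const ranges: ReadonlyArray<readonly [number, number]> = [
    [0x0900, 0x0DFF],
    [0x0F00, 0x109F],
    [0x1780, 0x17FF],
];

// Inclusive size of each range, summed: 1280 + 416 + 128.
const numCodepoints = ranges.reduce((sum, [lo, hi]) => sum + (hi - lo + 1), 0);

// Assign the i-th positioned glyph (0-based) to the i-th available codepoint,
// walking the ranges in order.
function codepointForGlyph(glyphIndex: number): number {
    if (!Number.isInteger(glyphIndex) || glyphIndex < 0) {
        throw new RangeError(`invalid glyph index ${glyphIndex}`);
    }
    let remaining = glyphIndex;
    for (const [lo, hi] of ranges) {
        const size = hi - lo + 1;
        if (remaining < size) return lo + remaining;
        remaining -= size;
    }
    throw new RangeError(`glyph index ${glyphIndex} exceeds ${numCodepoints} available codepoints`);
}
```

With 843 positioned glyphs, indices 0 to 842 all land in the first range (U+0900 to U+0C4A), so the Tibetan, Myanmar, and Khmer blocks would stay untouched under this particular numbering.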
Is there any small change we can make to MapLibre to allow the use of the CJK BMP range here?
It would be ideal if most unsupported scripts could use a single font stack, instead of needing a separate font stack for every script.
I thought about CJK too, but I don't see how that could work. From the point of view of MapLibre, how should the renderer know if a codepoint in a CJK range should be rasterized locally (default behavior) or if it should fetch the glyph from the server?
The other thing I was thinking is that we might be able to use codepoints outside of the basic multilingual plane for this, i.e., >= 2**16...
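One practical detail with codepoints >= 2**16 is that JavaScript strings store them as two UTF-16 code units (a surrogate pair). The sketch below uses only standard string APIs to show this. U+F0000, the first codepoint of Supplementary Private Use Area-A, is picked purely as an example. Whether MapLibre's glyph pipeline handles such codepoints correctly is not something this snippet establishes.

```typescript
// Standard JavaScript string APIs only; nothing here is MapLibre-specific.
// U+F0000 is just an example of a codepoint outside the BMP.
const astral = 0xF0000;
const astralText = String.fromCodePoint(astral);

// One codepoint, but two UTF-16 code units (a surrogate pair).
const utf16Length = astralText.length;
const roundTripped = astralText.codePointAt(0);
const firstCodeUnit = astralText.charCodeAt(0); // high surrogate

// Iterating by codepoint (not by code unit) keeps the pair together.
const codepoints = Array.from(astralText, (ch) => ch.codePointAt(0));
```

Any code path that walks a label with `charCodeAt` or indexes by `length` would see the two surrogates separately, so that would be the thing to check before relying on this range.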
It looks like MapLibre GL JS accepts text in the PUA.
We need to map the 843 unique positioned glyphs for Devanagari found in #4 to some Unicode ranges. The question is which ranges are available for this?
In principle, we could use the Private Use Area (PUA) for this purpose; see https://en.wikipedia.org/wiki/Private_Use_Areas. It contains 6400 codepoints in the Basic Multilingual Plane, i.e., among the codepoints that are smaller than 2**16. However, MapLibre GL JS already uses the PUA for storing images embedded in text (see shaping.ts). MapLibre fills up the PUA codepoints from low to high, so we could fill them up from high to low, but that feels a bit dangerous.
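To make the "fill from the top" idea concrete, here is a rough sketch. It is not MapLibre code: `puaCodepointFromTop` is a hypothetical helper, and the only facts used are the BMP PUA bounds (U+E000 to U+F8FF) and the 843 glyphs mentioned above.

```typescript
// Hypothetical sketch of allocating PUA codepoints from high to low.
const PUA_START = 0xE000; // first BMP Private Use Area codepoint
const PUA_END = 0xF8FF;   // last BMP Private Use Area codepoint
const PUA_SIZE = PUA_END - PUA_START + 1; // 6400

// Map the i-th positioned glyph (0-based) to a codepoint counted down from PUA_END.
function puaCodepointFromTop(glyphIndex: number): number {
    if (!Number.isInteger(glyphIndex) || glyphIndex < 0 || glyphIndex >= PUA_SIZE) {
        throw new RangeError(`glyph index ${glyphIndex} does not fit in the BMP PUA`);
    }
    return PUA_END - glyphIndex;
}

// With 843 glyphs: the lowest codepoint used, and how many codepoints
// stay free below it for whatever is allocated from the bottom up.
const GLYPH_COUNT = 843;
const lowestUsed = puaCodepointFromTop(GLYPH_COUNT - 1);
const remainingBelow = lowestUsed - PUA_START;
```

Under these assumptions the glyphs would occupy U+F5B5 to U+F8FF, leaving 5557 codepoints below for images. The two allocations only collide if a single label references more images than that, but nothing enforces the boundary, which is why it feels fragile.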
Next, we could just use the codepoints that Unicode reserves for Devanagari anyway. There are 3 ranges:
So that is a total of 170 codepoints available in the Devanagari Unicode ranges, which is not enough to cover all positioned glyphs.
Update: tagged_string.cpp in MapLibre Native seems to do the same as MapLibre GL JS, i.e., images in strings are referenced by codepoints in the PUA.