Closed (devosb closed this issue 3 years ago)
If I swap the order of the vowel sign and KEMPHRENG (option 3 in the comment above), no dotted circles appear in the rendering in the same environments as above.
Screenshots: harfbuzz-noto.png, directwrite-noto.png
I forgot to add the table referenced in the original post:
USV | Name (LIMBU …) | UGC | UISC | UIPC | Sigla | Problem |
---|---|---|---|---|---|---|
1920 | VOWEL SIGN A | Mn | Vowel_Dependent | Top | VAbv | |
1921 | VOWEL SIGN I | Mn | Vowel_Dependent | Top | VAbv | |
1922 | VOWEL SIGN U | Mn | Vowel_Dependent | Bottom | VBlw | HB DW |
1923 | VOWEL SIGN EE | Mc | Vowel_Dependent | Right | VPst | HB DW |
1924 | VOWEL SIGN AI | Mc | Vowel_Dependent | Right | VPst | HB DW |
1925 | VOWEL SIGN OO | Mc | Vowel_Dependent | Top_And_Right | VAbv | DW |
1926 | VOWEL SIGN AU | Mc | Vowel_Dependent | Top_And_Right | VAbv | DW |
1927 | VOWEL SIGN E | Mn | Vowel_Dependent | Top | VAbv | |
1928 | VOWEL SIGN O | Mn | Vowel_Dependent | Top | VAbv | |
1939 | SIGN MUKPHRENG | Mn | Consonant_Final | Bottom | FBlw | |
193A | SIGN KEMPHRENG | Mn | Vowel_Dependent[Tone_Mark] | Top | VAbv[VMAbv] | |
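193B | SIGN SA-I | Mn | Syllable_Modifier | Bottom | FM (DW), FMBlw (HB) | |

UGC = General_Category, UISC = Indic_Syllabic_Category, UIPC = Indic_Positional_Category, Sigla = USE category. Bracketed values for U+193A are the proposed reclassification. Problem lists the renderers that show a dotted circle when U+193A follows the vowel sign (HB = HarfBuzz, DW = DirectWrite).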
Re option 3: This option could certainly be implemented in a smart keyboard. Do you see a major problem with it? Is there already a lot of digital text that has kemphreng after VBlw or VPst? Or is there a standard that requires that order?
CC @xadxura @behdad
I don't see a major problem with using a smart keyboard, though there is a minor one. It makes the keyboard more complicated, especially when a user is editing text with two marks (vowel and kemphreng) and wishes to delete one of them: which mark gets deleted?
There is a lot of digital text with the kemphreng after VBlw or VPst, but most (if not all) of that text was derived from Devanagari using a conversion table.
I don't know of a standard that requires that order. Linguistically, typing the vowel first and kemphreng second is the most straightforward, so I would guess it is easier for any user to understand how to type with a non-smart keyboard.
The choice here is whether we recategorise U+193A from VAbv to VMAbv. Linguistically, functionally, and in typing order, it is a VMAbv; that it is a VAbv is really a bug. The question is whether changing it is too costly and whether there is a willingness to change.
We know of no other Unicode data in Limbu beyond the text we have auto-converted from Devanagari script, so regenerating that text in the reordered form is not a problem. The technical change within the USE is trivial and removes the need for smarts in the keyboard. But the keyboard smarts aren't huge either, if your keyboard has the general capability.
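For the existing auto-converted text, moving to the option 3 order could be a single find-and-replace. The Python sketch below assumes KEMPHRENG directly follows one of the nine vowel signs U+1920..U+1928 (the same swap used in the second test); `reorder_kemphreng` is a hypothetical helper, not part of any existing converter.

```python
import re

KEMPHRENG = "\u193A"
# A Limbu vowel sign (U+1920..U+1928) immediately followed by KEMPHRENG.
VOWEL_THEN_KEMPHRENG = re.compile("([\u1920-\u1928])\u193A")


def reorder_kemphreng(text: str) -> str:
    """Rewrite <vowel sign, KEMPHRENG> as <KEMPHRENG, vowel sign> (option 3 order)."""
    return VOWEL_THEN_KEMPHRENG.sub(lambda m: KEMPHRENG + m.group(1), text)
```

Applying it twice is harmless: once reordered, the text no longer contains a vowel sign followed by KEMPHRENG.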
In terms of UX, with U+193A left as VAbv, a smart keyboard can mostly fake the reordering (the user always types it after a vowel, as they are used to). But there are some rough edges. For example, dropping a cursor and hitting backspace will almost certainly delete the VPst (or, more confusingly, the VBlw) before deleting the U+193A, when the user would expect the opposite.
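The backspace rough edge can be illustrated with a toy model of such a keyboard. This Python sketch is hypothetical: `type_char` and `backspace` are illustrative names, the reordering rule (move KEMPHRENG in front of any preceding vowel sign U+1920..U+1928) is an assumption, and real keyboard engines handle editing differently.

```python
KEMPHRENG = "\u193A"


def is_limbu_vowel_sign(ch: str) -> bool:
    """True for the nine Limbu dependent vowel signs U+1920..U+1928."""
    return "\u1920" <= ch <= "\u1928"


def type_char(buffer: str, ch: str) -> str:
    """Append ch, but store KEMPHRENG before a vowel sign typed just before it.

    This mimics a smart keyboard that lets the user type the vowel first
    while the stored text uses the option 3 order.
    """
    if ch == KEMPHRENG and buffer and is_limbu_vowel_sign(buffer[-1]):
        return buffer[:-1] + KEMPHRENG + buffer[-1]
    return buffer + ch


def backspace(buffer: str) -> str:
    """A plain backspace at the end of the text deletes the last stored code point."""
    return buffer[:-1]


if __name__ == "__main__":
    text = ""
    for key in ("\u190B", "\u1922", KEMPHRENG):  # consonant, VOWEL SIGN U, KEMPHRENG
        text = type_char(text, key)
    print([f"U+{ord(c):04X}" for c in text])             # stored order
    print([f"U+{ord(c):04X}" for c in backspace(text)])  # the vowel is deleted, not KEMPHRENG
```

The user typed KEMPHRENG last, but because it is stored before the vowel, a naive backspace removes the vowel sign instead.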
If there is a willingness in the community to make the change, then we could write a Unicode proposal to change the category. The question is how long it would take to roll out the change, if it is agreed.
In my original post, my description of auto-length.txt as (U+190B|U+190F)(VOWEL SIGN x)(U+193A) is slightly incorrect. There are 9 vowels but ten clusters in each line: the extra cluster at the beginning of each line has no vowel sign, just a consonant and U+193A.
In a later post I swapped the order of the vowel sign and U+193A. Since the first cluster in the original test has no vowel sign to swap, it would have been unchanged, so I omitted it from the second test.
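The layout described above (ten clusters per line in the original order, nine in the swapped order) can be regenerated with a short script. This is a hypothetical reconstruction, not the attached auto-length.txt itself; one line per consonant is an assumption, and `make_line` and `make_file` are illustrative names.

```python
CONSONANTS = (0x190B, 0x190F)        # the two consonants used in auto-length.txt
VOWEL_SIGNS = range(0x1920, 0x1929)  # the nine Limbu vowel signs U+1920..U+1928
KEMPHRENG = "\u193A"


def make_line(consonant: int, swapped: bool = False) -> str:
    """Return one line of space-separated test clusters for a consonant.

    Original order: consonant + KEMPHRENG, then consonant + vowel sign + KEMPHRENG
    for each vowel sign (ten clusters). Swapped order: consonant + KEMPHRENG +
    vowel sign for each vowel sign (nine clusters; the vowel-less cluster
    would be identical, so it is left out).
    """
    c = chr(consonant)
    clusters = [] if swapped else [c + KEMPHRENG]
    for v in VOWEL_SIGNS:
        vowel = chr(v)
        clusters.append(c + KEMPHRENG + vowel if swapped else c + vowel + KEMPHRENG)
    return " ".join(clusters)


def make_file(swapped: bool = False) -> str:
    """Return the whole test text, one line per consonant (assumed layout)."""
    return "\n".join(make_line(c, swapped) for c in CONSONANTS) + "\n"
```

For example, `Path("auto-length.txt").write_text(make_file(), encoding="utf-8")` (with `from pathlib import Path`) writes the original-order variant.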
Sounds like an obvious candidate for change to me. We can get this into HarfBuzz very quickly: definitely right after UTC approval, and possibly earlier. If there's no huge Android user base for it, I don't see a problem, since Chrome and Firefox would update rather quickly.
Fixed in both HB (tested with 2.8.0) and DW (tested with the following version of Windows):

- Edition: Windows 10 Pro Insider Preview
- Version: Dev
- Installed on: 2021-03-15
- OS build: 21332.1010
- Experience: Windows Feature Experience Pack 120.2212.551.0
I think DW made the fix first, in https://github.com/microsoft/font-tools/commit/e8e20f5980c40761dd06c1479fe459cd403d6cfa, and HB then imported the fix in https://github.com/harfbuzz/harfbuzz/commit/06f49fc8ae40f083758e1ca8e9bd9879549d8c39
Thanks to @xadxura and @dscorbett for doing this.
Font
NotoSansLimbu-Regular.ttf
Where the font came from, and when
Site: https://github.com/googlefonts/noto-fonts/blob/master/phaseIII_only/unhinted/ttf/NotoSansLimbu/NotoSansLimbu-Regular.ttf
Date: 2019-12-02
Font Version
2.000
OS name and version
Ubuntu Bionic amd64; Windows 10 Pro 1909
Application name and version
Ubuntu: LibreOffice Writer 6.0.7; Windows: Notepad
Issue
Using U+193A LIMBU SIGN KEMPHRENG as a length mark after a vowel sign causes dotted circles with some vowel signs.
Format auto-length.txt with the font in the above applications. The file auto-length.txt contains space-separated clusters of (U+190B|U+190F)(VOWEL SIGN x)(U+193A). I suspect the issue would happen with any consonant.

harfbuzz-noto.png: from LibreOffice Writer on Ubuntu.

directwrite-noto.png: from Notepad on Windows. Note that DirectWrite produces more dotted circles than HarfBuzz.

harfbuzz-namdhinggo.png: This last example was created with the Namdhinggo font with the `limb` script lookups removed, leaving only `latn` script lookups. I suspect the results would be the same if the `latn` lookups were replaced with `DFLT` script lookups. This way, the USE does not get invoked in LibreOffice Writer on Ubuntu. On Windows, the USE would have been called, and dotted circles produced.

If I use `hb-view`, I can have lookups for both the `limb` and `latn` scripts, and by passing `--script` to `hb-view` I can reproduce the dotted circles with `limb` and the desired result (no dotted circles) with `latn`.

If U+193A is classified with UISC = Tone_Mark, its Sigla becomes VMAbv (VOWEL_MOD_ABOVE). Since vowel modifiers come after all dependent vowels in a USE cluster, the cluster validation then passes. Changing the HB source code (locally) to use VMAbv instead of VAbv fixes the issue.
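Why the category matters can be seen with a toy validator. The Python sketch below checks clusters against only a simplified subset of the USE cluster order (one base, then dependent vowels VPre, VAbv, VBlw, VPst, then vowel modifiers, then finals). It is not HarfBuzz's actual state machine; treating the consonants as category B and the helper names are assumptions for illustration, while the vowel categories come from the table of Limbu characters in this thread.

```python
import re

# Simplified USE categories for the code points in auto-length.txt.
# U+193A uses its original category, VAbv.
USE_CATEGORY = {
    0x190B: "B", 0x190F: "B",  # consonants (assumed plain bases)
    0x1920: "VAbv", 0x1921: "VAbv", 0x1922: "VBlw",
    0x1923: "VPst", 0x1924: "VPst", 0x1925: "VAbv",
    0x1926: "VAbv", 0x1927: "VAbv", 0x1928: "VAbv",
    0x193A: "VAbv",
}

# Proposed fix: KEMPHRENG as a Tone_Mark, i.e. USE category VMAbv.
FIXED_CATEGORY = {**USE_CATEGORY, 0x193A: "VMAbv"}

# Simplified cluster order: one base, then each class may repeat,
# but the classes may only appear in this order.
ORDER = ("VPre", "VAbv", "VBlw", "VPst",
         "VMPre", "VMAbv", "VMBlw", "VMPst",
         "FAbv", "FBlw", "FPst")
CLUSTER = re.compile("B " + "".join(f"(?:{cat} )*" for cat in ORDER))


def is_valid_cluster(cluster: str, categories: dict) -> bool:
    """True if the cluster's category sequence fits the simplified order."""
    tokens = "".join(categories[ord(ch)] + " " for ch in cluster)
    return CLUSTER.fullmatch(tokens) is not None


def failing_vowels(categories: dict) -> list:
    """Vowel signs for which <consonant, vowel sign, KEMPHRENG> is rejected."""
    return [
        f"U+{v:04X}"
        for v in range(0x1920, 0x1929)
        if not is_valid_cluster("\u190B" + chr(v) + "\u193A", categories)
    ]


if __name__ == "__main__":
    print(failing_vowels(USE_CATEGORY))    # ['U+1922', 'U+1923', 'U+1924']
    print(failing_vowels(FIXED_CATEGORY))  # []
```

With VAbv, the rejected vowels match the HB column of the table; with VMAbv, every cluster passes. The extra DirectWrite failures for U+1925 and U+1926 are outside what this simplified model covers.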
So, can the Noto font be modified to have a lookup that removes the dotted circle that the USE inserts? Other options are:
Personally, I suspect option 1 or 2 would be best, but that is just my opinion.
Character data
Attached above.
Screenshot
Attached above.