Open ctrlcctrlv opened 4 years ago
(BTW, Scripts Encoding Initiative agreed to help with both of these. Not financially, just advice-wise/pre-publication on unicode.org, so no advice is really needed per se, but of course I welcome all feedback.)
Pamudpod doesn't merge in HarfBuzz because Unicode says it's not supposed to (it is).
@behdad will hb wait for unicode to be updated?
> @behdad will hb wait for unicode to be updated?

Not necessarily. What are we talking about? What's needed?
@behdad
Unencoded characters, such as U+170D, are rejected by HarfBuzz for the purposes of combining marks and anchors, even if the font contains everything needed for such a combination to happen.
U+170D is in the pipeline and, without COVID-19, would already have been rolled out in Unicode 14.0, I believe, but it has been delayed. Three other unencoded characters in the Tagalog block are likewise affected.
> Unencoded characters, such as U+170D, are rejected by HarfBuzz for the purposes of combining marks and anchors, even if the font contains everything needed for such a combination to happen.
That's just not true.
First, make sure the script itemizer and font selection code bundle the char with the surrounding run. That's in HarfBuzz clients, so out of our scope.
Then, provide a GDEF glyph class for it as a mark, and it should do combining and anchoring just like any other mark.
For it to work without GDEF glyph classes, you need to update the Unicode data that HarfBuzz uses. So far we've stuck to released versions of Unicode. As mentioned above, you don't need this for most cases. The one case where it's relevant is for HarfBuzz to understand that it should insert a dotted circle if the char appears at the beginning of the paragraph.
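The two sources of mark classification described above can be sketched roughly as follows. This is only an illustration of the logic as stated in this comment, not HarfBuzz's actual code: the function `is_mark`, the glyph names, and the `gdef_classes` dictionary are hypothetical, and Python's `unicodedata` stands in for "the Unicode data the shaper ships with".

```python
import unicodedata
from typing import Dict, Optional

GDEF_MARK = 3  # GlyphClassDef value for mark glyphs in the OpenType GDEF table


def is_mark(codepoint: int, glyph_name: str,
            gdef_classes: Optional[Dict[str, int]]) -> bool:
    """Decide whether a glyph is treated as a mark (illustrative only).

    gdef_classes is a hypothetical glyph-name -> GDEF class mapping,
    or None when the font has no glyph class definitions.
    """
    if gdef_classes is not None:
        # The font's own classification is used; Unicode data is not consulted.
        return gdef_classes.get(glyph_name, 0) == GDEF_MARK
    # Fallback: general category from whatever Unicode version is bundled.
    # An unassigned code point is "Cn" here, so it is not seen as a mark.
    return unicodedata.category(chr(codepoint)).startswith("M")


# U+0378 is used as a stand-in for "a code point unassigned in the bundled data".
with_gdef = is_mark(0x0378, "uni0378", {"uni0378": GDEF_MARK})
without_gdef = is_mark(0x0378, "uni0378", None)
```

Under these assumptions, a font that classifies the glyph in GDEF gets mark behavior regardless of the Unicode version, while a font without glyph classes depends on the bundled Unicode data.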
Hi @behdad, here's an MWE:
It doesn't work in Chrome. GDEF glyph classes are defined.
Should appear as:
Appears as:
Thanks for looking into this.
@behdad should we file an issue anywhere else so someone else could take a look?
HarfBuzz does not do script itemization; this is handled by the clients (Chrome in this case), as Behdad said. But I’m pretty sure all HarfBuzz clients that do script itemization (any serious client does) do this based on published Unicode versions. If two characters end up in two different script runs, there is no way for HarfBuzz to apply OpenType features across these runs. You can verify this by testing with hb-view.
I think the question is more whether the undefined script should be put in a different run or not. You're all right that it's a Chrome issue, and I don't deny that latn and hebr ought to go into different runs, but I think characters of an undefined script (e.g. PUA, unencoded) should rightly go into the previous run, as Firefox does.
Unfortunately, there is currently no well-specified algorithm for doing script itemization, and each application comes with its own slightly incompatible method.
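To make the two policies under discussion concrete, here is a toy itemizer. It is a minimal sketch under loud assumptions: `toy_script` is a hand-written, hypothetical lookup that roughly mirrors the Tagalog block before U+170D and U+1715 were encoded, `itemize` and its `merge_unknown` flag are invented for illustration, and neither Chrome's nor Firefox's real algorithm is reproduced here. Real itemizers also handle Common and Inherited characters, which this sketch ignores.

```python
from typing import List, Tuple

UNKNOWN = "Zzzz"  # ISO 15924 code for an unknown script


def toy_script(ch: str) -> str:
    """Hypothetical script lookup; a real client would use published UCD data."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:
        return "Latn"
    if 0x05D0 <= cp <= 0x05EA:
        return "Hebr"
    if 0x1700 <= cp <= 0x170C or 0x170E <= cp <= 0x1714:
        return "Tglg"
    return UNKNOWN  # unassigned, PUA, or simply absent from this toy table


def itemize(text: str, merge_unknown: bool) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs.

    With merge_unknown=True, characters of unknown script join the
    previous run instead of starting a run of their own.
    """
    runs: List[Tuple[str, str]] = []
    for ch in text:
        script = toy_script(ch)
        joins_previous = bool(runs) and (
            script == runs[-1][0] or (merge_unknown and script == UNKNOWN)
        )
        if joins_previous:
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


text = "\u1703\u1715"  # TAGALOG LETTER KA followed by the then-unencoded U+1715
split_runs = itemize(text, merge_unknown=False)
merged_runs = itemize(text, merge_unknown=True)
```

With `merge_unknown=False` the base letter and the mark land in separate runs, so a shaper given one run at a time never sees them together. With `merge_unknown=True` they stay in one run and the font's lookups get a chance to apply.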
@davelab6 Is this type of issue something the TSWG should be tackling?
This should be reported to Chrome.
This makes it impossible to write Taiwanese kana...which is going to be more possible soon.
This is similar to googlefonts/noto-fonts#1185... I already submitted one proposal to remedy that, but there are more issues. Pamudpod doesn't merge in HarfBuzz because Unicode says it's not supposed to (it is). The alternate hollow kudlit have different phonetic values as well. So, I'll be submitting two proposals at once: