notofonts / noto-cjk

Noto CJK fonts
http://www.google.com/get/noto/help/cjk

Missing 𛅦 #172

Open ctrlcctrlv opened 4 years ago

ctrlcctrlv commented 4 years ago

This makes it impossible to write Taiwanese kana, which is about to become more practical.

This is similar to googlefonts/noto-fonts#1185. I already submitted one proposal to remedy that, but there are more issues. Pamudpod doesn't merge in HarfBuzz because Unicode says it's not supposed to (it is). The alternate hollow kudlit also have different phonetic values. So I'll be submitting two proposals at once:

ctrlcctrlv commented 4 years ago

(BTW, the Script Encoding Initiative agreed to help with both of these. Not financially, just with advice and pre-publication review on unicode.org, so no further advice is needed per se, but of course I welcome all feedback.)

davelab6 commented 4 years ago

Pamudpod doesn't merge in HarfBuzz because Unicode says it's not supposed to (it is).

@behdad will hb wait for unicode to be updated?

behdad commented 4 years ago

@behdad will hb wait for unicode to be updated?

Not necessarily. What are we talking about? What's needed?

ctrlcctrlv commented 4 years ago

@behdad

Unencoded characters, such as U+170D, are rejected by HarfBuzz for the purposes of combining marks and anchors, even if the font contains everything needed for such a combination to happen.

U+170D is in the pipeline and, without COVID-19, would have already been rolled out in Unicode 14.0, I believe, but it has been delayed. Three other unencoded characters in the Tagalog block are likewise affected.

behdad commented 4 years ago

Unencoded characters, such as U+170D, are rejected by HarfBuzz for the purposes of combining marks and anchors, even if the font contains everything needed for such a combination to happen.

That's just not true.

First, make sure the script itemizer and font-selection code bundle the character with the surrounding run. That happens in HarfBuzz clients, so it's out of our scope.

Then, give its glyph the mark class in the GDEF glyph class definitions, and it should combine and anchor just like any other mark.
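To illustrate why a GDEF mark class is enough, here is a minimal Python sketch of how a shaper might decide a glyph's class: if the font has GDEF glyph classes, those win; only fonts without them fall back to Unicode general categories. It roughly mirrors what HarfBuzz does for fonts lacking GDEF classes, but it is a simplified model, not HarfBuzz code. `glyph_class` and the character-keyed `gdef_classes` dict are hypothetical, and U+E000 (private use) stands in for an unencoded mark.

```python
import unicodedata
from typing import Mapping, Optional

# GlyphClassDef values from the OpenType GDEF table (0 = unclassified).
UNCLASSIFIED, BASE, LIGATURE, MARK, COMPONENT = 0, 1, 2, 3, 4


def glyph_class(ch: str, gdef_classes: Optional[Mapping[str, int]] = None) -> int:
    """Simplified model of how a shaper classifies a glyph.

    `gdef_classes` is a hypothetical stand-in for the font's GDEF
    GlyphClassDef table, keyed by character instead of glyph ID for brevity.
    """
    if gdef_classes is not None:
        # The font declares glyph classes: trust them, whatever Unicode says.
        return gdef_classes.get(ch, UNCLASSIFIED)
    # No GDEF classes: synthesize from Unicode data. Only nonspacing marks
    # (Mn) become marks, so a private-use (Co) or unknown (Cn) character
    # is treated as a base and won't attach to anything.
    return MARK if unicodedata.category(ch) == "Mn" else BASE


pua = "\ue000"  # hypothetical stand-in for a not-yet-encoded mark
print(glyph_class(pua))               # no GDEF classes: treated as a base
print(glyph_class(pua, {pua: MARK}))  # GDEF says mark: treated as a mark
```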

For it to work without GDEF glyph classes, you'd need to update the Unicode data that HarfBuzz uses, and so far we've stuck to released versions of Unicode. As mentioned above, you don't need this in most cases. The one case where it matters is for HarfBuzz to know that it should insert a dotted circle if the character appears at the beginning of a paragraph.
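The dotted-circle case can be sketched the same way. In this simplified, hypothetical `insert_dotted_circle` helper (not a HarfBuzz API), which characters count as marks comes from the Unicode data bundled with the running Python, standing in for the Unicode version a shaper was built against. A character that version doesn't know yet has category Cn, so it gets no dotted circle.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
MARK_CATEGORIES = ("Mn", "Mc", "Me")  # nonspacing, spacing, enclosing marks


def insert_dotted_circle(text: str, at_paragraph_start: bool = True) -> str:
    """Prepend U+25CC when text at the start of a paragraph opens with a mark.

    Simplified sketch: only the first character's general category is
    checked, using whatever Unicode version this Python ships with.
    """
    if at_paragraph_start and text and unicodedata.category(text[0]) in MARK_CATEGORIES:
        return DOTTED_CIRCLE + text
    return text


print(insert_dotted_circle("\u0301a"))  # prepends U+25CC before the combining acute
```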

ctrlcctrlv commented 4 years ago

Hi @behdad, here's an MWE:

MWE.zip

It doesn't work in Chrome. The GDEF classes are defined.

Should appear as: [screenshot]

Appears as: [screenshot]

Thanks for looking into this.

davelab6 commented 4 years ago

@behdad should we file an issue anywhere else so someone else could take a look?

khaledhosny commented 4 years ago

HarfBuzz does not do script itemization; as Behdad said, this is handled by the clients (Chrome in this case). But I’m pretty sure all HarfBuzz clients that do script itemization (and any serious client does) base it on published Unicode versions. If two characters end up in two different script runs, there is no way for HarfBuzz to apply OpenType features across these runs. You can verify this by testing with hb-view.
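A toy model makes the run-boundary point concrete: the client hands the shaper one run at a time, so a mark can only attach to a base earlier in its own run. This is a hypothetical sketch (`attach_marks` and `is_pua_mark` are illustrative names, not HarfBuzz APIs), with U+1703 TAGALOG LETTER KA as the base and private-use U+E000 standing in for the unencoded mark.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def attach_marks(
    runs: Sequence[Tuple[str, str]], is_mark: Callable[[str], bool]
) -> List[Tuple[str, Optional[str]]]:
    """Toy per-run shaping: return (mark, base) pairs.

    Each (script, text) run is processed on its own, so the candidate base
    resets at every run boundary; base is None when the run starts with a mark.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for _script, text in runs:
        base: Optional[str] = None  # nothing carries over from the previous run
        for ch in text:
            if is_mark(ch):
                pairs.append((ch, base))
            else:
                base = ch
    return pairs


KA = "\u1703"        # TAGALOG LETTER KA
PUA_MARK = "\ue000"  # hypothetical stand-in for an unencoded mark


def is_pua_mark(ch: str) -> bool:
    return ch == PUA_MARK


print(attach_marks([("Tglg", KA + PUA_MARK)], is_pua_mark))           # mark attaches to KA
print(attach_marks([("Tglg", KA), ("Zzzz", PUA_MARK)], is_pua_mark))  # mark has no base
```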

ctrlcctrlv commented 4 years ago

I think the question is more whether characters of an undefined script should be put in a different run or not. You're all right that it's a Chrome issue, and I don't deny that latn and hebr ought to go into different runs, but I think characters of an undefined script (e.g. PUA, unencoded) should rightly go into the previous run, as Firefox does.
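The two policies being compared can be sketched as a toy itemizer with a configurable set of "neutral" scripts that never start a run of their own. `LENIENT` folds Unknown (Zzzz) into the neighbouring run, the behaviour described here for Firefox; `STRICT` gives Unknown its own run, the behaviour reported for Chrome. This is a hypothetical sketch, not either browser's algorithm; real itemizers consult the Unicode Script property and handle far more (bracket pairing, for instance), and the tiny `TOY_SCRIPTS` table is invented for illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

COMMON, INHERITED, UNKNOWN = "Zyyy", "Zinh", "Zzzz"
LENIENT = frozenset({COMMON, INHERITED, UNKNOWN})  # Unknown joins the current run
STRICT = frozenset({COMMON, INHERITED})            # Unknown starts its own run


def itemize(text: str, script_of: Callable[[str], str],
            neutral: FrozenSet[str] = LENIENT) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs.

    Characters whose script is in `neutral` extend the current run; a leading
    group of them adopts the script of the first non-neutral character.
    """
    runs: List[List[str]] = []  # mutable [script, text] pairs
    for ch in text:
        script = script_of(ch)
        if runs and (script == runs[-1][0] or script in neutral):
            runs[-1][1] += ch  # same script, or neutral: extend the run
        elif runs and runs[-1][0] in neutral:
            runs[-1][0] = script  # leading neutrals take on this script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(s, t) for s, t in runs]


# Hypothetical toy lookup; anything unlisted is treated as unencoded.
TOY_SCRIPTS = {"\u1703": "Tglg", "a": "Latn", " ": COMMON}


def toy_script_of(ch: str) -> str:
    return TOY_SCRIPTS.get(ch, UNKNOWN)


text = "\u1703\u170d"  # TAGALOG LETTER KA + U+170D (unencoded at the time of this thread)
print(itemize(text, toy_script_of))          # one Tglg run
print(itemize(text, toy_script_of, STRICT))  # a Tglg run, then a Zzzz run
```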

khaledhosny commented 4 years ago

Unfortunately, there is currently no well-specified algorithm for doing script itemization, and each application comes with its own slightly incompatible method.

ctrlcctrlv commented 4 years ago

@davelab6 Is this type of issue something the TSWG should be tackling?

behdad commented 1 year ago

This should be reported to Chrome.