notofonts / noto-cjk

Noto CJK fonts
http://www.google.com/get/noto/help/cjk

GB18030-2022 amendment 1 (early 2023) update - font development schedule #252

Open chrissimpkins opened 1 year ago

chrissimpkins commented 1 year ago

The GB18030-2022 standard revision was amended further in early 2023. Compliance with the amendment requires font development in the Noto Sans and Noto Serif CJK families. The work is being performed by the Adobe team and they tentatively plan to deliver updated fonts by the end of August 2023. I am opening this issue to track and update those with an interest in this issue on the status of the font development.

cc @simoncozens @punchcutter @davelab6


Sept 2023 update: https://github.com/notofonts/noto-cjk/issues/252#issuecomment-1721238359

punchcutter commented 1 year ago

GB18030-2022 is supposed to go into effect on August 1, 2023 so I plan to have updated fonts before that. There has been talk of the date changing due to the amendments, but so far nothing is official.

sunnycrown83 commented 1 year ago

Will this update include support for Level 3?

punchcutter commented 1 year ago

@sunnycrown83 Only Level 2.

sunnycrown83 commented 1 year ago

Thanks for the quick reply. Based on this article, doesn't the Noto font already comply with Level 2? https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132

punchcutter commented 1 year ago

Sans is ready for the original Level 2, but CESI added an amendment this year which adds more characters so with the amendment Sans needs more. Serif wasn't ready for the original Level 2 or amended Level 2, but it will be soon.

medicalwei commented 1 year ago

I would like to confirm: to complete this, we need the following 26 glyphs from the basic block and the Ext. A block for Amendment 1, in order to comply with the new Level 1, and there are no additions to Level 2 support in the amendment?

16 characters newly added to CJK Unified Ideographs
GB 18030-2022    Unicode    Display
--------------------------------------------------
82359637    U+9FF0    鿰
82359638    U+9FF1    鿱
82359639    U+9FF2    鿲
82359730    U+9FF3    鿳
82359731    U+9FF4    鿴
82359732    U+9FF5    鿵
82359733    U+9FF6    鿶
82359734    U+9FF7    鿷
82359735    U+9FF8    鿸
82359736    U+9FF9    鿹
82359737    U+9FFA    鿺
82359738    U+9FFB    鿻
82359739    U+9FFC    鿼
82359830    U+9FFD    鿽
82359831    U+9FFE    鿾
82359832    U+9FFF    鿿

10 characters in CJK Unified Ideographs Extension A
GB 18030-2022    Unicode    Display
--------------------------------------------------
82358739    U+4DB6    䶶
82358830    U+4DB7    䶷
82358831    U+4DB8    䶸
82358832    U+4DB9    䶹
82358833    U+4DBA    䶺
82358834    U+4DBB    䶻
82358835    U+4DBC    䶼
82358836    U+4DBD    䶽
82358837    U+4DBE    䶾
82358838    U+4DBF    䶿

punchcutter commented 1 year ago

@medicalwei 26 glyphs for Sans, but Serif needs 5 more: U+9FEB, U+9FEC, U+9FED, U+9FEE, U+9FEF
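
The GB 18030 codes in the two tables above can be sanity-checked with a little arithmetic. A four-byte GB 18030 code behaves like a mixed-radix counter: the first and third bytes run over 0x81..0xFE (126 values) and the second and fourth over 0x30..0x39 (10 values). The sketch below is only an illustration under that assumption; the function names are made up for this example, and the codes are copied from the tables rather than taken from the standard itself.

```python
# Sketch: check that the GB 18030 codes listed above are consecutive
# under four-byte GB 18030 arithmetic. Helper names are illustrative.

def gb_to_linear(code_hex: str) -> int:
    """Convert a four-byte GB 18030 code (hex string) to a linear index."""
    b1, b2, b3, b4 = bytes.fromhex(code_hex)
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)

def linear_to_gb(index: int) -> str:
    """Convert a linear index back to a four-byte GB 18030 code (hex string)."""
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4]).hex().upper()

# Codes copied from the two tables, in code point order.
URO_CODES = """82359637 82359638 82359639 82359730 82359731 82359732
82359733 82359734 82359735 82359736 82359737 82359738 82359739
82359830 82359831 82359832""".split()      # U+9FF0..U+9FFF
EXT_A_CODES = """82358739 82358830 82358831 82358832 82358833
82358834 82358835 82358836 82358837 82358838""".split()  # U+4DB6..U+4DBF

def is_consecutive(codes: list[str]) -> bool:
    """True if each code is exactly one step after the previous one."""
    start = gb_to_linear(codes[0])
    return [linear_to_gb(start + i) for i in range(len(codes))] == codes

sans_needed = len(URO_CODES) + len(EXT_A_CODES)       # glyphs needed for Sans
serif_needed = sans_needed + len(range(0x9FEB, 0x9FF0))  # plus U+9FEB..U+9FEF

print(is_consecutive(URO_CODES), is_consecutive(EXT_A_CODES))
print(sans_needed, serif_needed)
```

Both tables are consistent with this arithmetic, including the points where the last byte rolls over from 0x39 to 0x30 and the third byte increments. The counts work out to 26 code points for Sans and 31 for Serif, matching the numbers given in the reply.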

Marcus98T commented 1 year ago

I shall apologize in advance for annoying and frustrating you over the past three months at the Source Han Sans issue page with loads of (mostly minor) glyph-related "issues" and requests. But I want to say this here, one more time.

Besides updating the GB 18030 character set, I am hoping that some of the glyph issues (that occasional/rare characters are showing JP or TW/HK forms as the only glyph shape for CN) can be resolved.

You know already that I am sincerely hoping to at least increase the CN glyph coverage as much as possible by consolidating more glyphs, and restoring some v1 CN/JP glyphs.

But even if I had faith that Adobe did their best, I think they still won't get 100% CN coverage (given that most are GB Extension sources), so I already requested that we put in the readme a known issue that not all GB 18030 characters will adhere to the GB standard, as a likely outcome.

Thank you for all your hard work put into this series of fonts, and we are hoping to see a new release soon.

mrhpearson commented 1 year ago

Hi @punchcutter - I became aware of this recently as the new rule impacts the Lenovo Linux preloads for China. Do let me know if there is anything we can help with (realistically probably only on the testing side, but open to suggestions and I have team members in China who can jump in and understand what is being looked at if needed). I assume at this point we're just waiting for the updated font from Adobe? If you have any updates or ETA please could you share. Thanks!

Marcus98T commented 1 year ago

All I can say is that, as of 16 August 2023, Adobe is still so overwhelmed with existing issues that a release is not certain right now. I think the issues are so large in scope that they could not be resolved in time for the GB 18030 enforcement date of 1 August 2023. But they are still trying to resolve as many technical bugs and glyph issues as possible (and I don't know whether some of the latter can be resolved). Rest assured, a release is still around the corner, but unfortunately we just have to keep waiting.

I think it's going to be a major release.

punchcutter commented 1 year ago

@Marcus98T It will be the most minor release the project has ever had.

mrhpearson commented 1 year ago

Thanks for the updates - I think it's safe to say hitting 1 August will be tough unless they have a time machine :) Do let me know when you have an update on delivery, even if it's just ballpark, and if there's anything sensible we can do to help.

Marcus98T commented 1 year ago

> @Marcus98T It will be the most minor release the project has ever had.

Most minor release? I kind of assumed that, because there's virtually no room to add the 26 glyphs required for GB18030-2022 support in Sans v2.004 (there were only four blank glyph slots when I checked in Glyphs), existing glyphs would have to be consolidated and removed to accommodate the new ones, and to me that would still count as a major release because the CID assignments would change.

Anyway, if I take @punchcutter's word for it, it may be the first time that there will be a minor release with CID changes. Then I think it will be a v2.500 release.

To put the versions in perspective: a major release, v2.000, had Arphic designing about 1,750 new HK glyphs required for HK support, along with improving the 辶 (TW/HK only) and 廴 radicals, which was definitely a huge amount of work. Put simply, such a "minor" release will only remove some unnecessary duplicate glyphs, adjust or fix some glyphs, and add the glyphs needed for GB 18030 (and Taiwan's CNS11643 Amendment 1) compliance. That means there may not be a major redesign as I had hoped, although I still wish for one, since Adobe will need more room for future glyph expansion, such as the Macao support mentioned last year (come to think of it, I guess it's still too early for that).

And finally, I personally think that the font design still needs some improvements to work better across different languages and ensure the best consistency, retaining most of the design style of JP-designed glyphs (I already dropped so many hints about this) while keeping compliance with the handwriting-based standards of Chinese. I guess this has to be saved for the next real major version of Sans/Serif when they finally support Macao, which isn't going to be this year.

Anyway, I can't get my hopes up now. Thank you for the reply to help me set some expectations.

Marcus98T commented 1 year ago

I apologise to anyone who got email bombed via the GitHub notification system over the past several hours, with regards to Source Han Sans/Serif.

As you already know, Noto Serif CJK v2.002 and Source Han Serif v2.002 were released. I am verifying some fixes in the changelog, comparing them to the issues on Source Han Sans and Serif.

As of 18 August, Noto Sans v2.005 (not v2.500) is not released yet, likely because it's quite complex to add new glyphs when there are only four blank glyph slots left. Source Han Sans v2.005 is also not released yet.

I think for Sans, if Adobe cannot do a CID change, then probably some orphaned glyphs (especially those with the 番 and 咸/感 components) could be "safely" replaced with the new glyphs required for GB18030-2022 compliance, but then that will mess up the Unicode order.

Here are the possible orphaned glyphs, highlighted in the Glyphs app (with their corresponding CID numbers). I checked to make sure they are truly orphaned and not used by any locales. Probably about 32 glyphs, give or take.

[Image: Screenshot 2023-08-18 at 00 06 09]

This is using the default Japanese locale of Source Han Sans v2.004. Probably barely enough to cover what's needed for GB18030-2022 compliance, if Adobe means what they said by being the "most minor release" ever.

As for the other existing issues (mostly glyph-related), I can only hope they are not ignored. At best, they could be put on the back burner for now, because those issues require a huge CID change in the source.

EDIT: Updated some information. I guess I'm not helping, and I've created more headaches than this is worth.

Marcus98T commented 1 year ago

One simple question. Is Sans v2.005 still going to be released?

chrissimpkins commented 11 months ago

Clarification as we (Google) are receiving requests for the amendment 1 updates from users who are tracking this thread.

The latest release from Adobe addresses the initial GB18030-2022 revision, and only required changes in the Noto Serif CJK family. There are likely to be additional Noto Serif and Noto Sans CJK family changes required to address the amendment 1 changes that I documented in the original post on this issue thread, but based on additional information that we received from Ken Lunde (Apple/Unicode) in recent weeks, amendment 1 has not been finalized and there is no compliance date established at this stage. The change requirements and compliance date expectations will be published in the final amendment draft. As I understand it based on my conversations with Zachary off repo, Adobe is waiting for the final draft of the GB18030-2022 amendment to address any changes that will be required.

Zachary and I will continue to update in this thread as we learn more.

punchcutter commented 11 months ago

The recent Serif update addresses the amendment as it currently stands. All that really means in this case is that a small number of characters added in the amendment are already in the font. Only five glyphs were required to comply with the standard as it was initially released, but we added the rest to cover the amendment. They were going to be added anyway so it really had nothing to do with GB 18030 requiring them. Sans already covers the glyphs required in the initial release of the standard so we don't need to update right away, but I've already updated with the additional 26 glyphs. The only reason I haven't made a release yet is because there are lots of other open issues that can be addressed first and there's no finalized amendment so there's no big rush.

chrissimpkins commented 11 months ago

> but we added the rest to cover the amendment.

Oh, I think that I misunderstood the latest release then. So, the latest Serif release does cover the latest draft of the 2023 amendment? I was under the impression based on previous conversations with you that your plan was only to add ~five new codepoints necessary to support the 2022 revision (distinct from "amendment 1"). Has there been any progress on finalization of the planned changes in the amendment in the last month to your knowledge? Ken told me that things are not final and I was under the impression that we were waiting until those decisions reached a closer to final/final stage before implementation rolls out.

chrissimpkins commented 11 months ago

@punchcutter mind pushing a list of the new codepoints that you added in the Noto Serif v2.002 release? I think that this will be a more precise way to discuss the changes that happened in v2.002 rather than referring to "GB18030-2022 revision" and "amendment 1" terminology. I'm creating confusion in my attempts to decipher what is going on with all of the standard changes over the last several months. And we need to update our subsetter definitions to support the latest release in the Google Fonts CSS API.

NightFurySL2001 commented 11 months ago

Since the Source Han/Noto projects only support up to the Level 2 implementation (and the additions in Level 2 are already covered by 通用规范汉字表, the Table of General Standard Chinese Characters), the affected areas are the CJK URO and Ext. A only.


GB18030-2022 as it currently stands:

Sans has already completed both ranges in full. Serif is missing quite a few characters in v2.001, so v2.002 completed these.


Amendment 1 additions as it currently stands:

Sans did not have these. Serif added these in v2.002 as a forward-looking step, filling out both the URO and Ext. A. This amendment is not officially released yet, but the completion of the URO and Ext. A is expected and is highly unlikely to change. @punchcutter is taking the opportunity to fix the many open issues in the Source Han Sans repo, to be released along with the amendment (when the amendment is made official, that is).


List of glyph additions in the Source Han Serif v2.002 release notes:

The following glyphs were added to support GB 18030 2022 Implementation Level 2: uni4DB6-CN, uni4DB7-CN, uni4DB8-CN, uni4DBA-CN, uni4DBB-CN, uni4DBC-CN, uni4DBD-CN, uni4DBE-CN, uni4DBF-CN, uni5CB8-JP, uni9FEB-CN, uni9FEB-TW, uni9FEC-CN, uni9FED-CN, uni9FEE-JP, uni9FEF-JP, uni9FF0-CN, uni9FF1-CN, uni9FF2-CN, uni9FF3-CN, uni9FF4-CN, uni9FF5-CN, uni9FF6-CN, uni9FF7-CN, uni9FF8-CN, uni9FF9-CN, uni9FFA-CN, uni9FFB-CN, uni9FFC-CN, uni9FFD-CN, uni9FFE-CN, and uni9FFF-CN.
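
To make the list above easier to compare against the amendment ranges, here is a rough sketch that parses the glyph names into code points. It assumes the `uniXXXX-RR` naming shown in the quoted release note (four hex digits, then a two-letter region suffix); the variable and function names are only illustrative.

```python
import re

# Glyph names as quoted from the release note above.
RELEASE_NOTE_GLYPHS = """
uni4DB6-CN, uni4DB7-CN, uni4DB8-CN, uni4DBA-CN, uni4DBB-CN, uni4DBC-CN,
uni4DBD-CN, uni4DBE-CN, uni4DBF-CN, uni5CB8-JP, uni9FEB-CN, uni9FEB-TW,
uni9FEC-CN, uni9FED-CN, uni9FEE-JP, uni9FEF-JP, uni9FF0-CN, uni9FF1-CN,
uni9FF2-CN, uni9FF3-CN, uni9FF4-CN, uni9FF5-CN, uni9FF6-CN, uni9FF7-CN,
uni9FF8-CN, uni9FF9-CN, uni9FFA-CN, uni9FFB-CN, uni9FFC-CN, uni9FFD-CN,
uni9FFE-CN, and uni9FFF-CN.
"""

# Assumed naming pattern: "uni" + 4 hex digits + "-" + 2-letter region tag.
GLYPH_RE = re.compile(r"uni([0-9A-F]{4})-([A-Z]{2})")

def parse_glyphs(text: str) -> list[tuple[int, str]]:
    """Return (code point, region) pairs for every glyph name in the text."""
    return [(int(hex_digits, 16), region)
            for hex_digits, region in GLYPH_RE.findall(text)]

glyphs = parse_glyphs(RELEASE_NOTE_GLYPHS)
code_points = {cp for cp, _ in glyphs}

# Amendment 1 ranges discussed in this thread.
amendment = set(range(0x4DB6, 0x4DC0)) | set(range(0x9FF0, 0xA000))

not_in_list = sorted(amendment - code_points)   # amendment code points with no new glyph listed
outside = sorted(code_points - amendment)       # listed glyphs outside the amendment ranges

print(len(glyphs), "glyphs,", len(code_points), "distinct code points")
print("not in list:", [f"U+{cp:04X}" for cp in not_in_list])
print("outside amendment ranges:", [f"U+{cp:04X}" for cp in outside])
```

For the list as quoted in this thread, the sketch finds 32 glyphs covering 31 distinct code points. U+4DB9 is the only amendment code point without a newly added glyph in the list, and U+5CB8 plus U+9FEB..U+9FEF fall outside the two amendment ranges. The thread does not say why U+4DB9 is absent from the list.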

Side note: It is actually kind of lucky that the latest Unicode version at the time Sans v2.002 was released is the same Unicode version that GB18030-2022 was originally based on, which means Sans already complies with the current GB18030-2022 standard.

punchcutter commented 11 months ago

To be clear we aren't waiting for the amendment to be official. An updated Sans will most likely be released before we get any news on the amendment.

punchcutter commented 9 months ago

At this point any updates are on hold because the GB 18030 2022 draft amendments keep changing things. Once there's an official release it will be easier to say what will be done and when.

NightFurySL2001 commented 9 months ago

The differences are minor though, with the extension characters in Basic and Ext-A block (U+9FF0..U+9FFF and U+4DB6..U+4DBF) being moved from Implementation Level 1 to Level 3 only (among other additions to level 3). Supporting these characters will still be beneficial in that it provides full support for both blocks, and a future GB18030 update is almost guaranteed to include them.
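
For anyone who wants to check a given font build against the two ranges mentioned above, a minimal sketch follows. It works on any mapping or set of supported code points; one possible way to get such a mapping is fontTools' `TTFont(path).getBestCmap()`, which is assumed here rather than shown, and the demo data below is synthetic, not taken from any real font.

```python
# Sketch: report which code points in U+9FF0..U+9FFF and U+4DB6..U+4DBF
# are absent from a font's character map. Names are illustrative.

AMENDMENT_RANGES = [(0x9FF0, 0x9FFF), (0x4DB6, 0x4DBF)]

def missing_code_points(supported, ranges=AMENDMENT_RANGES):
    """List code points in the given inclusive ranges that are not in `supported`.

    `supported` can be a set of ints or a dict keyed by code point,
    for example the result of fontTools' TTFont(path).getBestCmap()
    (assumption: fontTools is installed and `path` points to a font file).
    """
    return [cp
            for first, last in ranges
            for cp in range(first, last + 1)
            if cp not in supported]

def format_cp(cp: int) -> str:
    """Format a code point in the usual U+XXXX notation."""
    return f"U+{cp:04X}"

# Synthetic stand-in for a font cmap: U+4E00..U+9FEF plus a single
# Ext. A code point. This is demo data only.
fake_cmap = {cp: f"glyph{n}" for n, cp in enumerate(range(0x4E00, 0x9FF0))}
fake_cmap[0x4DB6] = "glyphX"

gaps = missing_code_points(fake_cmap)
print(len(gaps), "missing:", ", ".join(format_cp(cp) for cp in gaps))
```

An empty result would mean both blocks are fully covered for the ranges in question, which is the "full support for both blocks" situation described above.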