Unicode 14 update

n8willis commented 3 years ago

This updates the character tables in the Arabic, Kannada, Mongolian, and Telugu docs to reflect the additions in Unicode 14, including the new codepoints and the corresponding Indic Positional Category, Indic Syllabic Category, Arabic Shaping, and general UCD info.

I believe these are the only scripts affected by the updated release. Please speak up if I have overlooked something.

Note that for Arabic there is an entirely new block (Arabic Extended-B) and some additional Joining Groups.

I don't believe there were major changes to the info on existing codepoints (the delta charts seem to reflect mostly representative-glyph updates ...), but that is worth a separate pass anyway. The new codepoints are (at least) self-contained and not likely to break existing implementations.
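For that separate pass, one way to tell brand-new codepoints apart from changes to existing ones is to diff the property tables from the two Unicode versions. The following is only a minimal sketch: the function name `diff_property_tables` is hypothetical, and the tables are made-up Private Use Area entries, not real UCD data.

```python
def diff_property_tables(old, new):
    """Compare two {codepoint: property value} tables.

    Returns (added, removed, changed), where `changed` maps a codepoint
    to its (old value, new value) pair.
    """
    added = {cp: new[cp] for cp in new.keys() - old.keys()}
    removed = {cp: old[cp] for cp in old.keys() - new.keys()}
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    return added, removed, changed


# Made-up sample tables (Private Use Area codepoints, invented values).
old_table = {0xE000: "Value_A", 0xE001: "Value_B"}
new_table = {0xE000: "Value_A", 0xE001: "Value_C", 0xE002: "Value_D"}

added, removed, changed = diff_property_tables(old_table, new_table)

for cp, value in sorted(added.items()):
    print(f"added   U+{cp:04X}: {value}")
for cp, (before, after) in sorted(changed.items()):
    print(f"changed U+{cp:04X}: {before} -> {after}")
```

Entries in `added` are the self-contained ones. Anything in `changed` or `removed` is what could affect existing implementations and deserves a closer look.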

Note also that this update should be considered "raw" info. Several minor changes may have behavioral effects that will be discovered and sorted out by implementers. I will watch for such information from HarfBuzz and Allsorts, among others!

Of particular note in this respect is the fact that Kannada and Telugu have each acquired a codepoint for a CONSONANT_DEAD letter, Nakaara Pollu. There is an existing issue on that letter, #116, which has so far received no comments. If it affects syllable identification or shaping, that will probably mean revisions to the actual shaping docs for those scripts.
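For anyone who wants to check how an implementation's tables classify these letters, here is a rough sketch of reading lines in the UCD's "code point(s) ; value # comment" layout into a lookup table. The helper name `parse_property_lines` is hypothetical, and the sample lines were written by hand in that layout. The codepoints and values shown are assumptions that should be verified against the real IndicSyllabicCategory.txt.

```python
def parse_property_lines(lines):
    """Parse 'XXXX ; Value' or 'XXXX..YYYY ; Value' lines into a dict.

    Text after '#' is treated as a comment; blank lines are skipped.
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cp_field, value = (part.strip() for part in data.split(";", 1))
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            table[cp] = value
    return table


# Hand-written sample in the file's layout (assumed values, verify
# against the actual Unicode 14 data file before relying on them).
SAMPLE = """\
# code point(s) ; Indic_Syllabic_Category # comment
0C15..0C28    ; Consonant      # TELUGU LETTER KA..TELUGU LETTER NA
0C5D          ; Consonant_Dead # TELUGU LETTER NAKAARA POLLU
0CDD          ; Consonant_Dead # KANNADA LETTER NAKAARA POLLU
"""

table = parse_property_lines(SAMPLE.splitlines())

for cp in (0x0C5D, 0x0CDD):
    print(f"U+{cp:04X}: {table.get(cp, 'not in table')}")
```

A shaper that builds its syllable classes from a table like this would pick up the new category automatically. Whether the syllable rules then do the right thing with it is the open question in #116.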

wezm commented 2 years ago

I've done a Unicode 14 update to Allsorts. This mostly involved updating the various data used from the UCD, as well as the following:

- Update the list of Arabic chars that are modifier combining marks.
- Update the shaping class according to your updates here.

Aside from this, I've not made any behavioural changes to the shaping engine.