How to define a coptic subset (or, should we change the Greek subset)?

davelab6 commented 3 years ago

@irenevl @moyogo @tiroj We are adding a coptic subset (referenced in https://github.com/google/fonts/pull/3324)

In early Unicode, the Coptic characters U+03E2 through U+03EF were added to the Greek block (https://en.m.wikipedia.org/wiki/Greek_and_Coptic), and so they ended up in our Greek subset. About 11 years ago this was carried through into the greek.nam GF API subset:

https://github.com/googlefonts/gftools/blob/main/Lib/gftools/encodings/greek_unique-glyphs.nam#L115-L128

@marekjez86 proposes dropping those characters from that subset, and adding them to the new coptic.nam subset. However, it seems these characters were added initially because they are 'shared' or 'unified' with Greek in some way.

If a "greek" font includes them, should they be delivered as standard? Would they be kerned or otherwise be included in features that will be broken by sharding them into a 2nd subset?

Is it common to include them?

rsheeter commented 3 years ago

FYI, for now I've defined coptic based on Noto Coptic, leaving the overlap with greek. Expert advice still much appreciated :)

Ends up looking like https://github.com/googlefonts/gftools/blob/latest_nam/Lib/gftools/encodings/coptic_unique-glyphs.nam

davelab6 commented 3 years ago

@khaledhosny I think you were tweeting about this recently

khaledhosny commented 3 years ago

These characters are derived from the Demotic script to represent sounds in the Coptic language that are not present in Greek. They are of no use outside Coptic, and their presence in the Greek block is a historical artifact from before the disunification in Unicode.

I don't know what coverage most Coptic fonts have, but the Coptic block has both the basic alphabet and dialect-specific characters, so unless fonts usually support all of them you might want separate Coptic and Coptic Extended subsets, along the lines of Greek and Greek Extended (or maybe that is overkill).

khaledhosny commented 3 years ago

(I only started learning Coptic a couple of days ago, so take whatever I say with a big grain of salt.)

moyogo commented 3 years ago

The range U+03E2-U+03EF in the "Greek and Coptic" block has always only been used in Coptic. It shouldn't be in the greek.nam Greek subset.

The range U+2C80-U+2CB1 in the "Coptic" block was added in Unicode 4.1 in 2005, when the two scripts were disunified. Before that, the ranges U+0391-U+03A1, U+03A3-U+03A9 and U+03B1-U+03C9 in the "Greek and Coptic" block were used for both scripts. Coptic also uses a few diacritics, so other characters from the "Greek and Coptic" and "Greek Extended" blocks were used as well. Users had to have different fonts for each script's style, or layout features had to provide stylistic variants.
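To make the ranges above concrete, here is a minimal Python sketch that scans the "Greek and Coptic" block (U+0370-U+03FF) with the standard `unicodedata` module and collects the code points whose Unicode name starts with COPTIC. The helper name `coptic_in_greek_block` is made up for illustration and is not part of gftools.

```python
import unicodedata

# Bounds of the "Greek and Coptic" block (U+0370..U+03FF).
GREEK_AND_COPTIC = range(0x0370, 0x0400)


def coptic_in_greek_block():
    """Return code points in the block whose Unicode name starts with COPTIC."""
    found = []
    for cp in GREEK_AND_COPTIC:
        # Unassigned code points have no name, so fall back to an empty string.
        name = unicodedata.name(chr(cp), "")
        if name.startswith("COPTIC"):
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in coptic_in_greek_block():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

With the Unicode data shipped in current Python releases, this should report exactly the U+03E2-U+03EF range discussed in this thread.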

thlinard commented 3 years ago

> FYI, for now I've defined coptic based on Noto Coptic, leaving the overlap with greek. Expert advice still much appreciated :)
>
> Ends up looking like https://github.com/googlefonts/gftools/blob/latest_nam/Lib/gftools/encodings/coptic_unique-glyphs.nam

Noto Coptic is not quite complete (it lacks one combining mark and some punctuation marks). A more complete page (but it does not include the Coptic Epact Numbers block, which you did well to add): https://www.wazu.jp/gallery/Test_Coptic.html

These are the characters of the most complete Coptic fonts:

So, I propose to add:

0x002C COMMA
0x002E FULL STOP
0x003A COLON
0x003B SEMICOLON
0x00B7 MIDDLE DOT
0x0311 COMBINING INVERTED BREVE
0x2053 SWUNG DASH
0x2056 THREE DOT PUNCTUATION
0x2058 FOUR DOT PUNCTUATION
0x2059 FIVE DOT PUNCTUATION
0x2E17 DOUBLE OBLIQUE HYPHEN
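As a rough illustration of how such a list could be merged into a subset, the sketch below parses entries in the `0xXXXX NAME` form used above into a set of code points. It assumes one entry per line and treats lines starting with `#` as comments; both are assumptions about the file layout, and `parse_nam_lines` is a hypothetical helper, not a gftools function.

```python
PROPOSED_ADDITIONS = """\
0x002C COMMA
0x002E FULL STOP
0x003A COLON
0x003B SEMICOLON
0x00B7 MIDDLE DOT
0x0311 COMBINING INVERTED BREVE
0x2053 SWUNG DASH
0x2056 THREE DOT PUNCTUATION
0x2058 FOUR DOT PUNCTUATION
0x2059 FIVE DOT PUNCTUATION
0x2E17 DOUBLE OBLIQUE HYPHEN
"""


def parse_nam_lines(text):
    """Collect the leading 0x-prefixed code point from each non-comment line."""
    codepoints = set()
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no code point.
        if not line or line.startswith("#"):
            continue
        token = line.split()[0]
        if token.lower().startswith("0x"):
            codepoints.add(int(token, 16))
    return codepoints


if __name__ == "__main__":
    additions = parse_nam_lines(PROPOSED_ADDITIONS)
    print(" ".join(f"U+{cp:04X}" for cp in sorted(additions)))
```

The resulting set can be combined with an existing subset using an ordinary set union, which also makes overlaps with another subset easy to check.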

moyogo commented 3 years ago

The following two were encoded for their use in Coptic (L2-10/348) and should be added as well:

U+2E33 RAISED DOT
U+2E34 RAISED COMMA

tiroj commented 3 years ago

> @marekjez86 proposes dropping those characters from that subset, and adding them to the new coptic.nam subset. However, it seems these characters were added initially because they are 'shared' or 'unified' with Greek in some way.

Coptic script originated as a localised flavour of Greek script extended with shapes derived from Egyptian Demotic script to represent non-Greek phonemes. So that was how Unicode originally encoded Coptic, with the idea that the Greek-derived letters would be Greek characters and the Demotic-derived letters treated as an extension of the Greek script. That unification proved problematic in a number of respects, but critically it made it next to impossible at the time to create a single font that properly supported both Greek and Coptic scripts given the divergent styles of writing and typography for the two languages. Eventually, Unicode decided to disunify Coptic from Greek, but did so by retaining the existing Demotic-derived Coptic characters in the Greek block and adding a new Coptic block for disunified Greek-derived characters (and some later additions for specific aspects of Coptic texts). So Coptic script support requires characters from both the Greek block and the Coptic block, plus punctuation as noted by Thomas and Denis.

A separate coptic.nam subset makes sense, and there is no need to include either the non-Coptic Greek characters in that subset or the Coptic characters from the Greek block in the greek.nam subset.

[Of related note: there is an active movement in North East Africa and the Nubian diaspora to resurrect use of the traditional Nubian script that is derived from and encoded as Coptic. The distinction between Coptic and Nubian is one of style rather than encoding, and I can imagine a Noto Nubian typeface with a distinctive design using the same Coptic subset.]

thlinard commented 3 years ago

https://github.com/googlefonts/gftools/pull/379

RagaeGhaly commented 2 years ago

It’s obvious that the current Unicode encoding for Coptic is useless for sorting by code point, as the last seven letters of the alphabet (ϣ-ϯ) have lower Unicode values than the remaining letters (ⲁ-ⲱ). I wonder why the large number of Coptologists and Egyptologists didn’t raise this issue. I hope this will be taken care of in the next version by producing one complete block that contains all the Coptic characters in a legitimate sequence, which would be useful in many aspects of document processing. My best regards.
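The ordering problem can be shown with a short Python sketch. Plain `sorted()` compares code points, so the letters in U+03E2-U+03EF come first. The toy key below, `coptic_sort_key` (a hypothetical helper), pushes those letters after everything else. It is only meant to illustrate the issue for Coptic-only strings; locale-aware collation, for example an implementation of the Unicode Collation Algorithm, is the usual tool for real text.

```python
def coptic_sort_key(word):
    """Toy key: place the U+03E2..U+03EF letters after all other characters."""
    key = []
    for ch in word:
        cp = ord(ch)
        # Group 1 sorts after group 0, so these letters end up last.
        group = 1 if 0x03E2 <= cp <= 0x03EF else 0
        key.append((group, cp))
    return key


LETTERS = [
    "\u03e3",  # COPTIC SMALL LETTER SHEI
    "\u2c81",  # COPTIC SMALL LETTER ALFA
    "\u2cb1",  # COPTIC SMALL LETTER OOU
    "\u03ef",  # COPTIC SMALL LETTER DEI
]

if __name__ == "__main__":
    print("code point order:", sorted(LETTERS))
    print("toy alphabet order:", sorted(LETTERS, key=coptic_sort_key))
```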

google / fonts

How to define a coptic subset (or, should we change the Greek subset)? #3325