w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

Lack of font support for Kashmiri characters makes text deviate from true semantics #249

Open r12a opened 2 years ago

r12a commented 2 years ago

This issue applies to Kashmiri written in the Perso-Arabic script.

Kashmiri is written in the nastaliq style of Arabic script. Although Kashmiri orthography resembles that used for Urdu in some respects, it uses a number of unique characters and character combinations to represent Kashmiri sounds.

The GAP

There are almost no fonts that properly support Kashmiri written in that orthography. (Noto Nastaliq Urdu was only updated in Feb 2022 to support Kashmiri.)

As a result, people resort to using inappropriate characters so that the content looks more like what they expect, and even then gaps remain. For example, to make the sukun look like an inverted v rather than a circle, users often use U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE, which is intended for use as a vowel diacritic in African languages. There are several such problems in Kashmiri; lists can be found here and here.
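As a rough illustration of the clean-up this implies, the Python sketch below finds occurrences of U+065B and, for text known to be Kashmiri, replaces them with U+0652 ARABIC SUKUN, on the assumption that a suitable font then supplies the inverted-v shape when the language is set correctly. The function names are hypothetical, and a blanket replacement is only safe if U+065B is never used legitimately in the text being processed.

```python
import unicodedata

# U+065B is meant for African-language orthographies; in Kashmiri text it is
# usually a visual stand-in for the sukun, which should be U+0652 ARABIC SUKUN.
INVERTED_SMALL_V_ABOVE = "\u065B"
SUKUN = "\u0652"


def find_sukun_substitutes(text: str) -> list[int]:
    """Return the string indices of every U+065B in the text."""
    return [i for i, ch in enumerate(text) if ch == INVERTED_SMALL_V_ABOVE]


def normalize_sukun(text: str) -> str:
    """Replace every U+065B with U+0652 (assumes the whole text is Kashmiri)."""
    return text.replace(INVERTED_SMALL_V_ABOVE, SUKUN)


if __name__ == "__main__":
    sample = "\u067E\u065B"  # ARABIC LETTER PEH followed by the misused diacritic
    for i in find_sukun_substitutes(sample):
        print(i, unicodedata.name(sample[i]))
    print(unicodedata.name(normalize_sukun(sample)[1]))
```

Reporting positions separately from replacing them lets an editor review each occurrence before changing anything.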

Keyboards and input methods also need to be configured to insert the correct characters, but that doesn't help while so few fonts can display those characters.

This issue is unlikely to be resolved by changes to specifications or browsers, but it is a significant constraint for Kashmiris wishing to use the Web.

There is an additional issue, however, related to pre-installed fonts on macOS (see below).

Priority

Clarifying and standardising the correct usage of characters to represent Kashmiri is a fundamental requirement for interoperable and understandable text, so this issue is given a priority of Basic.

Tests & results

Interactive test: A given font will correctly render characters needed for Kashmiri in the Perso-Arabic script.
When the text in the test is displayed, the glyph shapes should resemble those in the image just below. In particular: farsi yeh with small v above should join to the left; the 4 forms of Kashmiri yeh should appear; hamzas should use the round form; the sukun over PA should be an inverted v.

[Screenshot, 2022-03-22: expected glyph shapes for the test text]

As of March 2022, the latest version of Noto Nastaliq Urdu supports the needed glyphs if the language is set to 'ks', and displays correctly on Windows 10. However, on macOS 12.2.1 the pre-installed version of the font cannot be overridden and is used to display Kashmiri text in browsers, so there is no support on macOS at the time of writing.
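Because the Kashmiri shapes depend on the language of the text, content needs to be tagged as 'ks'. A minimal sketch, assuming HTML output generated from Python; the helper name is hypothetical, and selecting the font itself (for example via a CSS `font-family` rule) is a separate step not shown here. Tagging alone does not get around the macOS problem described above.

```python
from html import escape


def kashmiri_span(text: str) -> str:
    """Wrap Kashmiri text in a span tagged lang="ks" with right-to-left direction."""
    # lang="ks" is what allows a font such as Noto Nastaliq Urdu 3.002+ to pick
    # its Kashmiri-specific shapes; dir="rtl" sets the base text direction.
    # escape() keeps characters such as < and & from being read as markup.
    return f'<span lang="ks" dir="rtl">{escape(text)}</span>'


if __name__ == "__main__":
    print(kashmiri_span("\u067E\u0652"))  # PEH with sukun
```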

SIL's Awami Nastaliq font correctly renders all but one feature: the hamza is s-shaped, as used for Urdu, rather than rounded. However, this is a Graphite font, so it currently works only in Gecko-based browsers.

The Gulmarg Nastaleeq font supports some features on Windows, but appears not to have glyphs for KASHMIRI YEH or LETTER WAW WITH RING. It also doesn't work on macOS, presumably for the same reason as the Noto font.
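Missing characters like these can be spotted by comparing a font's character map against a checklist. The sketch below assumes the set of covered code points has already been extracted (for example with fontTools, where `set(TTFont(path).getBestCmap())` gives the mapped code points); the checklist is hypothetical and only includes characters named in this issue. Coverage is necessary but not sufficient: a font can map a code point and still draw the wrong shape, such as an s-shaped hamza or a circular sukun.

```python
import unicodedata

# Hypothetical checklist: only characters named in this issue, not a full
# inventory of the Kashmiri orthography.
REQUIRED_CODE_POINTS = (
    0x0620,  # ARABIC LETTER KASHMIRI YEH
    0x0652,  # ARABIC SUKUN
    0x067E,  # ARABIC LETTER PEH
    0x06C4,  # ARABIC LETTER WAW WITH RING
    0x06CE,  # ARABIC LETTER YEH WITH SMALL V
)


def missing_code_points(covered: set[int]) -> list[str]:
    """Return 'U+XXXX NAME' for each required code point absent from `covered`."""
    return [
        f"U+{cp:04X} {unicodedata.name(chr(cp))}"
        for cp in REQUIRED_CODE_POINTS
        if cp not in covered
    ]


if __name__ == "__main__":
    # Example: a font lacking KASHMIRI YEH and WAW WITH RING.
    for line in missing_code_points({0x0652, 0x067E, 0x06CE}):
        print(line)
```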

Action taken

WebKit

Outcomes

Version 3.002 and above of Noto Nastaliq Urdu now supports all characters needed for Kashmiri, and will also automatically provide the correct shape for things such as the sukun diacritic if the language of the text is set to Kashmiri.

The Unicode Technical Committee approved a submission stating that a word-final half-yeh should not be written using U+06CD ARABIC LETTER YEH WITH TAIL.
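Existing text could be audited against this decision. The sketch below flags U+06CD at the end of a word, treating a word as ending at whitespace, punctuation, or end of text once any Arabic combining marks (assumed here to be U+064B..U+065F and U+0670) have been skipped. It only reports positions for review, since this issue does not say which character should be used instead, and the word-boundary rule is an approximation that may not suit every Kashmiri text. The function name is hypothetical.

```python
import re

YEH_WITH_TAIL = "\u06CD"
# Arabic combining marks that can follow a letter without ending the word.
MARKS = "\u064B-\u065F\u0670"

# Match U+06CD only if, after any combining marks, the next character is
# neither a word character nor another mark (i.e. whitespace, punctuation,
# or end of text).
WORD_FINAL_YEH_WITH_TAIL = re.compile(
    f"{YEH_WITH_TAIL}(?=[{MARKS}]*(?![\\w{MARKS}]))"
)


def flag_word_final_yeh_with_tail(text: str) -> list[int]:
    """Return the indices of U+06CD characters that end a word."""
    return [m.start() for m in WORD_FINAL_YEH_WITH_TAIL.finditer(text)]


if __name__ == "__main__":
    # First U+06CD is followed by BEH (not final); second ends the text.
    print(flag_word_final_yeh_with_tail("\u06CD\u0628 \u0628\u06CD"))
```

Including the marks in the negative lookahead matters: otherwise backtracking over a diacritic could make a mid-word U+06CD look word-final.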

r12a commented 2 years ago

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: Kashmiri

xfq commented 2 years ago

The link to the relevant gap analysis document is broken. I guess it's because we haven't published it yet?

(Same for #250.)

r12a commented 7 months ago

Links fixed. Format updated. Added link to Unicode submission.