w3c / afrlreq

African language enablement for the Web

Lack of clarity on how to encode N'Ko te-kerende #36

Open r12a opened 1 year ago

r12a commented 1 year ago

This issue is applicable to N'Ko.

Certain constructs in N'Ko text mean 'each and every ....', and they appear with a dash on the baseline with a space on either side. For example:

[Image: Screenshot 2023-05-16 at 17 51 06]

This is also used in other locations where we might use a dash in Latin text.

The question is what the appropriate character is for the te-kerende and other similar-looking uses.

More:

The GAP

Research showed that users are using ߺ U+07FA NKO LAJANYALAN with a space on either side for this. However, that character's main stated role in the Unicode Standard is to act like the Arabic tatweel, extending the baseline while joining the characters on either side.

This approach works in all browsers.

See an example on page 3, column 2 (from the right), below the picture.

The Unicode Standard doesn't provide any advice on this topic. The original proposal included a request for a te-kerende character, but it was not adopted.

Action taken

The question was raised at a Unicode Script Ad Hoc meeting.

Outcomes

The Unicode Script Ad Hoc committee considered the matter and agreed that the te-kerende should be represented using <space><lajanyalan><space>.
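As a rough illustration of the agreed representation, the sequence can be written as U+0020, U+07FA, U+0020. The Python sketch below builds that sequence and counts occurrences of a lajanyalan with a space on either side. The names `TE_KERENDE` and `count_te_kerende` are hypothetical, and the Latin letters in the sample are placeholders standing in for N'Ko words.

```python
import re
import unicodedata

LAJANYALAN = "\u07FA"  # ߺ NKO LAJANYALAN

# <space><lajanyalan><space>, the representation described above.
TE_KERENDE = " " + LAJANYALAN + " "

# A lajanyalan with a space immediately before and after it.
# Lookarounds let two te-kerende share context without being missed.
_TE_KERENDE_RE = re.compile(r"(?<= )" + LAJANYALAN + r"(?= )")


def count_te_kerende(text: str) -> int:
    """Count lajanyalan characters that have a space on either side."""
    return len(_TE_KERENDE_RE.findall(text))


# Placeholder words "A", "B", "C" stand in for real N'Ko text.
sample = "A" + TE_KERENDE + "B" + TE_KERENDE + "C"
print(unicodedata.name(LAJANYALAN), count_te_kerende(sample))
```

A lajanyalan used in its tatweel-like role, joined directly to the letters around it, is not counted, because it has no surrounding spaces.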

Priority

This is already a de facto standard.

r12a commented 1 year ago

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: N'Ko

jfkthame commented 1 year ago

The linked document at http://cormand.huma-num.fr/maninkabiblio/periodiques/silabosoona5.pdf also shows an example of one of the reasons <space><lajanyalan><space> is an unsatisfactory representation: see the first (right-hand) column on page 6, lines 2-3, where there is an occurrence of "ߞߏ ߺ ߏ ߺ ߞߏ߫" with the line wrapped at the space before the second te-kerende.

It is clearly stated in https://www.unicode.org/L2/L2015/15338-n4706-nko-additions.pdf that "A line can break after but not before a TE-KERENDE", but because users are forced to add spaces around it (because lajanyalan has completely different joining/rendering behavior), this spurious line-break occurs.
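The line-break problem can be sketched with a deliberately simplified model in Python. It assumes that every space offers a break opportunity immediately after it, which is only an approximation of real line-breaking behavior and is not the Unicode line-breaking algorithm. The function names are hypothetical, and the Latin letters are placeholders for N'Ko words.

```python
LAJANYALAN = "\u07FA"  # ߺ NKO LAJANYALAN


def space_break_offsets(text: str) -> list[int]:
    """Offsets where a line could start, assuming a break after every space."""
    return [i + 1 for i, ch in enumerate(text) if ch == " " and i + 1 < len(text)]


def breaks_before_lajanyalan(text: str) -> list[int]:
    """Break offsets that would put a lajanyalan at the start of a line."""
    return [i for i in space_break_offsets(text) if text[i] == LAJANYALAN]


# Placeholder words "AB" and "CD" around <space><lajanyalan><space>.
sample = "AB " + LAJANYALAN + " CD"
print(space_break_offsets(sample))       # every space-based break opportunity
print(breaks_before_lajanyalan(sample))  # the ones that fall before the lajanyalan
```

Under this model the space in front of the lajanyalan always yields a break opportunity before it, which is exactly the position where the proposal says a te-kerende should not break.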

The fact that this usage "is a de facto standard" does not, I think, indicate that it is a good or appropriate way to encode te-kerende; only that users have had to make do with the character repertoire on their keyboard. It's like users representing the copyright symbol with "(c)" because they don't know how to type "©".

Te-kerende should (in my opinion) have been encoded as a character in its own right, as was proposed in N4706, and could then have easily been made available on N'Ko keyboard layouts.

It would still be possible to rectify this, although unfortunately there will be a legacy corpus of documents using the <space><lajanyalan><space> hack. But the sooner a real te-kerende is encoded, the sooner usage can begin to migrate to the better representation.