w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

Correct Spacing for LAM before Non-Arabic and BEH Before Number #189

Open shervinafshar opened 5 years ago

shervinafshar commented 5 years ago

The Unicode Standard (Section 9.2, p. 369 in version 11.0):

The use of a joiner adjacent to a suitable letter permits that letter to form a cursive connection without a visible neighbor. This provides a simple way to encode some special cases, such as exhibiting a connecting form in isolation.

Common practice observed in some reputable online publications is LAM + TATWEEL + SPACE + ..., but this may be (a) to preserve readability, or (b) due to technical limitations in entering a ZWJ.

We need to survey experts for the correct and/or common method.


Testing with Google Web Fonts: [screenshot of fonts.google.com, 2018-07-10]

Plain text for testing in other environments:

ل‍ W3C
لـW3C
لـ W3C
ب‍42
ب‍ 42
بـ42
بـ 42
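Because the ZWJ is invisible, copying the lines above between tools can silently drop or move it. The following is a minimal sketch, assuming the joiner in those lines is U+200D ZERO WIDTH JOINER, that rebuilds the same seven test strings from explicit code points and prints a code-point listing for each. The names `TEST_STRINGS` and `describe` are hypothetical, chosen only for this illustration.

```python
# Rebuild the plain-text test cases from explicit code points so the
# invisible ZERO WIDTH JOINER is unambiguous (assumes the joiner is U+200D).
import unicodedata

LAM = "\u0644"      # ARABIC LETTER LAM
BEH = "\u0628"      # ARABIC LETTER BEH
ZWJ = "\u200D"      # ZERO WIDTH JOINER
TATWEEL = "\u0640"  # ARABIC TATWEEL
SPACE = " "

# Same order as the plain-text list: ZWJ variants, then TATWEEL variants.
TEST_STRINGS = [
    LAM + ZWJ + SPACE + "W3C",
    LAM + TATWEEL + "W3C",
    LAM + TATWEEL + SPACE + "W3C",
    BEH + ZWJ + "42",
    BEH + ZWJ + SPACE + "42",
    BEH + TATWEEL + "42",
    BEH + TATWEEL + SPACE + "42",
]


def describe(s: str) -> str:
    """Return a readable listing such as 'U+0644 ARABIC LETTER LAM | ...'."""
    return " | ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in s
    )


if __name__ == "__main__":
    for s in TEST_STRINGS:
        print(repr(s), "->", describe(s))
```

Printing the listing makes it easy to confirm, in any environment, whether the ZWJ sits directly after the Arabic letter or has drifted to the other side of the space.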
ntounsi commented 5 years ago

Not only before non-Arabic text or numbers: the same kind of joining behavior appears before a text link, Arabic links included (for example, when translating a phrase like "for something" where "something" is a text link). LAM<a href="">ARABIC TEXT LINK</a>

I recall an old issue where some inline markup (here, <a>) could break cursive joining in some browsers.
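One possible workaround, following the Unicode text quoted at the top (a joiner adjacent to a letter lets it take a connecting form without a visible neighbor), is to put a ZWJ on both sides of the markup boundary. The sketch below is hypothetical: `link_with_joiners` is not an existing API, and it assumes the rendering engine honors a ZWJ at an element boundary, which some browsers may not.

```python
# Hypothetical helper: emit prefix + <a>link_text</a> with a ZWJ on each side
# of the tag boundary, so the prefix letter (e.g. LAM) can take its joining
# form and the first letter of the link can take its final/medial form.
from html import escape

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def link_with_joiners(prefix: str, link_text: str, href: str) -> str:
    """Return an HTML fragment with ZWJ at the end of the prefix and at the
    start of the link text.

    Assumption: only meaningful when the prefix ends with a dual-joining
    letter such as LAM and the link text starts with an Arabic letter.
    Whether the join survives the <a> boundary depends on the browser.
    """
    return (
        escape(prefix)
        + ZWJ
        + f'<a href="{escape(href, quote=True)}">'
        + ZWJ
        + escape(link_text)
        + "</a>"
    )
```

For example, `link_with_joiners("\u0644", "\u0643\u062a\u0627\u0628", "https://example.com/")` produces LAM + ZWJ, then the link, whose text begins with ZWJ + KAF. Both joiners are escaped-safe plain characters, so the fragment stays valid HTML.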

ntounsi commented 5 years ago

A test here.

shervinafshar commented 5 years ago

@ntounsi, while the hyperlink breaking is a valid case, the question here is more about which method should be used to fix the gap caused by LAM + non-Arabic text, BEH + number, or a link. Do you happen to have any information on that?

ntounsi commented 5 years ago

@shervinafshar, I don't have any relevant information about "correct spacing for LAM before Latin". I often write LAM followed by W3C (to mean "for W3C"), and I write LAM+W3C with no character in between. I looked, but I didn't find a rule covering this case. In some printed material (I don't remember the sources), I sometimes see "LAM+Latin text" written either with or without a space before the Latin text, with LAM in either its isolated or its joined form.

asmusf commented 5 years ago

Why is the ZWJ on the "far" side of the space character in the examples at the top?

shervinafshar commented 5 years ago

@asmusf, that was a mistake; I've corrected it. Could you please be more specific? I don't follow.

asmusf commented 5 years ago

We have two characters that are separated by SPACE. One of them is from a script subject to joining behavior. On which side of the space do we expect the ZWJ?

shervinafshar commented 5 years ago

The ZWJ appears immediately after the character from the script subject to joining behavior, to force an initial/medial form on what would otherwise be a standalone character.
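The placement rule above can be made concrete with a small sketch: one function builds a prefix with the ZWJ on the Arabic side of any space, and another flags joiners that touch no Arabic letter on either side (for example, a ZWJ placed on the far side of the space). Both functions are hypothetical, and `is_arabic_letter` is an approximation based on character names; a real check would use the Joining_Type property from the UCD file ArabicShaping.txt.

```python
# Sketch of the ZWJ placement rule: ZWJ goes immediately after the joining
# letter, before any SPACE. Helper names are hypothetical illustrations.
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def is_arabic_letter(ch: str) -> bool:
    # Approximation via the character name (e.g. "ARABIC LETTER LAM");
    # it does not distinguish dual-joining from right-joining letters.
    return unicodedata.name(ch, "").startswith("ARABIC LETTER")


def attach_prefix(letter: str, rest: str, space: bool = False) -> str:
    """Return letter + ZWJ (+ optional SPACE) + rest, keeping the ZWJ on
    the Arabic side of the space so the letter takes a connecting form."""
    if len(letter) != 1 or not is_arabic_letter(letter):
        raise ValueError("letter must be a single Arabic letter")
    return letter + ZWJ + (" " if space else "") + rest


def misplaced_joiners(text: str) -> list[int]:
    """Return indices of ZWJs adjacent to no Arabic letter on either side,
    such as a ZWJ placed after the space instead of after the letter."""

    def arabic_at(j: int) -> bool:
        return 0 <= j < len(text) and is_arabic_letter(text[j])

    return [
        i
        for i, ch in enumerate(text)
        if ch == ZWJ and not (arabic_at(i - 1) or arabic_at(i + 1))
    ]
```

With this check, `"\u0644\u200d W3C"` (ZWJ before the space) passes, while `"\u0644 \u200dW3C"` (ZWJ after the space) is flagged at index 2.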