w3c / sealreq

Southeast Asian layout task force

Is inter-character spacing used in Lao? #17

Open r12a opened 6 years ago

r12a commented 6 years ago

Many scripts create emphasis or apply other effects by spacing out the letters or syllables in a word (known as tracking). It's also possible to stretch inter-character space, either explicitly or during justification, as a way of relieving the stress on inter-word/phrase spaces.

CSS provides two properties that have the potential to produce inter-character spacing: (a) the letter-spacing property, and (b) text-justify:inter-character.

Please help us better understand how this CSS styling applies to and works (or doesn't) with Lao. Here are some questions to get us started:

  1. Do Lao authors ever apply tracking (i.e. inter-character spacing) to text?
  2. Does it ever make sense to apply text-justify:inter-character to Lao? See the description of this property in the CSS spec.
  3. If extra gaps are introduced between characters, what are the indivisible text units? Presumably combining characters, even spacing ones, would normally not be separated from their base characters (but see 4). Is a grapheme cluster (i.e. base + combining characters) the relevant unit, or should gaps be introduced only around whole syllables?
  4. Thai does allow separation of base character and combining character in the case of ำ [U+0E33 THAI CHARACTER SARA AM]. In fact, this involves converting the vowel sign to U+0E4D THAI CHARACTER NIKHAHIT + U+0E32 THAI CHARACTER SARA AA, then introducing the gap between the [base+nikhahit] and the sara AA. Does Lao do something similar?
  5. Are there any other special things we need to bear in mind with respect to adding gaps between characters in Lao? If used, does inter-character spacing work in the same way in both tracking and justification scenarios, or are there differences?
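To make question 3 concrete, here is a minimal Python sketch of one candidate unit, an approximate grapheme cluster. It assumes a deliberately simplified rule (a base character plus any following Mn/Mc marks, plus the AM signs, which UAX #29 treats as spacing marks) rather than full UAX #29 segmentation; the function name `clusters` is hypothetical.

```python
import unicodedata

# Thai SARA AM (U+0E33) and Lao VOWEL SIGN AM (U+0EB3) have General_Category Lo,
# but UAX #29 classes them as SpacingMark, so they stay with the preceding base.
AM_SIGNS = {"\u0E33", "\u0EB3"}


def clusters(text: str) -> list[str]:
    """Split text into approximate grapheme clusters (simplified sketch).

    A unit is a base character followed by any combining marks (Mn/Mc)
    and any AM sign. Real UAX #29 segmentation has many more rules.
    """
    units: list[str] = []
    for ch in text:
        attaches = unicodedata.category(ch) in ("Mn", "Mc") or ch in AM_SIGNS
        if attaches and units:
            units[-1] += ch  # extend the current cluster
        else:
            units.append(ch)  # start a new cluster
    return units


print(clusters("ສະບາຍດີ"))  # ['ສ', 'ະ', 'ບ', 'າ', 'ຍ', 'ດີ']
print(clusters("ກຳ"))       # ['ກຳ']
```

Note that the spacing vowels ະ and າ come out as separate units, while ກຳ is a single cluster, so the Thai-style AM split in question 4 would have to operate below the grapheme-cluster level.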

I'll try to summarise the results of this discussion in the Lao gap-analysis document.

jmdurdin commented 6 years ago
  1. I have experimented with inter-character spacing (ICS) justification myself, but have never seen inter-character spacing used in any actual print or web documents.
  2. ICS could probably be used much the same as for Thai.
  3. In theory, either grapheme-cluster or inter-syllable spacing is possible, but testing is needed to know what would really be acceptable to viewers.
  4. Exactly as for Thai, for SALA AM (U+0EB3), the decomposed Nikhahit (U+0ECD) must remain positioned with respect to the preceding (base) consonant, and space, if any, must be inserted before the decomposed SALA AA (U+0EB2).
  5. I am unaware of any reason why it should be handled differently.
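The decomposition described in answer 4 is recorded in the Unicode Character Database as a compatibility decomposition (U+0EB3 → U+0ECD U+0EB2, and U+0E33 → U+0E4D U+0E32 for Thai), so it can be looked up rather than hard-coded. A minimal Python sketch, assuming the AM sign, if present, is the last character of the cluster; `split_am` is a hypothetical helper name.

```python
import unicodedata

AM_SIGNS = ("\u0E33", "\u0EB3")  # Thai SARA AM, Lao VOWEL SIGN AM


def split_am(cluster: str) -> tuple[str, str]:
    """Split a cluster ending in an AM sign into (base + NIKHAHIT, SARA AA).

    Assumes the AM sign, if present, is the last character of the cluster.
    A tracking gap would go between the two returned parts. Clusters without
    AM are returned unchanged, paired with an empty string.
    """
    if cluster and cluster[-1] in AM_SIGNS:
        # The UCD stores e.g. '<compat> 0ECD 0EB2' for U+0EB3.
        _tag, nikhahit, sara_aa = unicodedata.decomposition(cluster[-1]).split()
        return cluster[:-1] + chr(int(nikhahit, 16)), chr(int(sara_aa, 16))
    return cluster, ""


print(split_am("ກຳ"))  # ('ກໍ', 'າ')
print(split_am("กำ"))  # ('กํ', 'า')
```

Reading the mapping from `unicodedata.decomposition()` keeps the Thai and Lao cases on the same code path; a full NFKD normalization of the whole text is avoided because it would also rewrite unrelated compatibility characters.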
laonux commented 6 years ago

Hello all and sorry for belated reply.

  1. I have been in touch with Lao linguists at the national university, and I don't recall anyone here in Laos ever using ICS or experimenting with it.
  2. Agree with Durdin.
  3. I have never thought of applying extra gaps to the Lao language. This requires further experimentation to refine the look.
  4. Exactly as Durdin pointed out.
  5. I discussed this with Lao linguists here, and they don't seem to find any special need for gaps as yet.

jclark commented 6 years ago

This looks like it is using inter-character spacing for ລາວ:

http://kpl.gov.la/Media/Upload/Default/PPX2016/ppx.jpg

(My experience in Thailand is that you need to ask typesetters not linguists about this sort of thing.)

jmdurdin commented 6 years ago

Yes, of course inter-character spacing has been used in Lao in signs, and headings in publications, e.g. using InDesign etc., but my response to this question was mainly in regard to the context of web pages and HTML where, as far as I am aware, inter-character spacing in Lao has not been widely tested and rarely if ever used.

r12a commented 6 years ago

Ok, so here's my stab at a summary that i will write up in the gap-analysis doc for Lao. Please let me know if you think i'm missing or misrepresenting things.


Tracking is not used widely for Lao text, but may appear occasionally in publication headings and in signs.

It is not yet clear whether gaps should appear at the syllable level, rather than at the grapheme-cluster, or indeed sub-grapheme-cluster, level. However, tracking in Thai appears to split below the level of the syllable, and Lao is therefore expected to do the same.

One particular issue that needs to be managed is that a nikahit followed by vowel sign AA is split during tracking in Thai, and it is felt that the same should apply to Lao. For example, ກຳ would look like ກໍ າ after tracking is applied. Note that this is an instance of tracking being applied at the sub-grapheme-cluster level, since the grapheme cluster would encompass the base character and all combining characters.
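A rough Python sketch of the sub-grapheme-cluster behaviour described above, in which a visible space stands in for the extra tracking (a real renderer would add advance width, not insert a character). The function and parameter names are hypothetical, and only Lao AM is handled.

```python
import unicodedata

LAO_AM, LAO_NIKHAHIT, LAO_AA = "\u0EB3", "\u0ECD", "\u0EB2"


def track(text: str, gap: str = " ", split_am: bool = True) -> str:
    """Join tracking units with `gap` (a visible stand-in for extra spacing).

    Combining marks (Mn/Mc) stay on their base. With split_am=True, Lao AM is
    rewritten as NIKHAHIT (kept on the base) + SARA AA (a new unit), i.e.
    spacing below the grapheme-cluster level; with split_am=False, AM stays
    attached to its base as part of the grapheme cluster.
    """
    units: list[str] = []
    for ch in text:
        if ch == LAO_AM and units:
            if split_am:
                units[-1] += LAO_NIKHAHIT
                units.append(LAO_AA)
            else:
                units[-1] += ch
        elif unicodedata.category(ch) in ("Mn", "Mc") and units:
            units[-1] += ch
        else:
            units.append(ch)
    return gap.join(units)


print(track("ກຳ"))                  # ກໍ າ
print(track("ກຳ", split_am=False))  # ກຳ
```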

               -----

Tracking is currently rarely, if ever, used on the Web. CSS provides the letter-spacing property for tracking, but doesn't include specific information about Lao spacing, other than saying that tracking occurs between adjacent typographic character units and pointing out the example with nikahit above.

There is an exploratory test at https://w3c.github.io/sealreq/gap-analysis/lao-tests/letter-spacing.html. Results:

Question                                            Firefox  Chrome  Safari  Safari mobile  Android mobile
Are there gaps only around syllable boundaries?     no       no      no      no             no
Do gaps split base from superscript vowel/tone?     no       no      yes     yes            no
Do gaps appear between prescript vowel and base?    yes      yes     yes     yes            yes
Do gaps appear between postscript vowel and base?   yes      yes     yes     yes            yes
Is AM separated from the base?                      yes      yes     yes     yes            no
Is the gap between the AA and nikahit in BAM?       no       no      no      no             no

So current practice appears to be that browsers insert gaps between all spacing, non-combining characters, including between the base and any spacing prescript/postscript vowel. They don't split apart superscript combining characters from the base. They don't leave the nikahit in AM over the base character and put a gap after it.
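The Firefox and Chrome columns of the table above can be approximated by a rule that attaches only nonspacing marks (category Mn) to the preceding character. This is a hypothetical Python model inferred from the table, not taken from any browser's source code; Safari (which split superscript marks from the base) and the Android browser (which kept AM with the base) behave differently.

```python
import unicodedata


def browser_like_tracking(text: str, gap: str = " ") -> str:
    """Hypothetical model of the Firefox/Chrome letter-spacing results.

    Only nonspacing marks (Mn) stay on the preceding character; every other
    character, including spacing prescript/postscript vowels and AM
    (category Lo), starts a new unit, so AM moves away from the base as a
    whole instead of leaving its nikahit behind.
    """
    units: list[str] = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and units:
            units[-1] += ch
        else:
            units.append(ch)
    return gap.join(units)


print(browser_like_tracking("ເກ"))  # ເ ກ
print(browser_like_tracking("ດີ"))  # ດີ
print(browser_like_tracking("ກຳ"))  # ກ ຳ
```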

The issue with AM should be addressed, but because tracking is rarely used I propose to classify this as Advanced.

jmdurdin commented 6 years ago

There is a difference between Thai and Lao in regard to tracking and decomposition of SALA AM. In Thai, NIKHAHIT is only used as a component of SALA AM, so superposing NIKHAHIT over a base separated from SALA AA would rarely if ever be confusing ( ํ does not occur separately). But in Lao, NIKHAHIT used on its own is a very widely used vowel, e.g. ບໍ່ = ‘no/not.’ So, for example, ກໍ າ could be more visually confusing to Lao readers than the corresponding expression in Thai would be.

It may be relevant that if inter-character spacing is added in MS Word (Office 2013), SALA AM is not separated for either Thai or Lao.

Lao text without SALA AM is fine to space out at the grapheme level, but for SALA AM, adding inter-character spacing at either grapheme or sub-grapheme level is visually disconcerting, and would diminish the value of using inter-character spacing. Scripts with “wrap-around” vowels and/or ligatures (e.g. Myanmar, Khmer) are likely to be even more problematic in this regard.

ohbendy commented 6 years ago

Lao: I consulted a handful of Lao newspapers, and it seems tracking is extremely uncommon; in the one place I did see it, there was no saraAm in the text, but everything was spaced out, so I assume it would also have been spread apart.

It seems to me there are four possible options for tracking (image for illustration only):

[Screenshot: illustration of the four options]
  1. Add space between every base, meaning saraAm gets spread apart.
  2. Keep all vowel signs (marks and spacing letters) together with the base consonant (so prevowels would also stick to the consonant). Keeping only saraAm stuck to the consonant is also an option, but to my mind the other two-part vowels like ເ◌ັ seem to warrant the same treatment.
  3. Keep syllables together, add space between.
  4. Keep words together.
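Option 2 is the easiest of the non-default options to sketch, since it needs no syllable or word segmentation. A minimal Python sketch, assuming that the only spacing vowel signs are the prevowels U+0EC0 to U+0EC4 (attached to the following consonant) and SIGN A, SIGN AA and SIGN AM (attached to the preceding one); other characters such as SEMIVOWEL SIGN NYO are not handled, and `option2_units` is a hypothetical name.

```python
import unicodedata

# Simplified assumption: these spacing vowel signs belong to a neighbouring consonant.
PREVOWELS = set("\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4")  # written before the consonant
POSTVOWELS = set("\u0EB0\u0EB2\u0EB3")              # SIGN A, SIGN AA, SIGN AM


def option2_units(text: str) -> list[str]:
    """Group each consonant with all of its vowel signs and marks (option 2)."""
    units: list[str] = []
    pending = ""  # prevowel(s) waiting for the consonant that follows
    for ch in text:
        if ch in PREVOWELS:
            pending += ch
        elif units and not pending and (
            unicodedata.category(ch) == "Mn" or ch in POSTVOWELS
        ):
            units[-1] += ch  # attach to the preceding consonant
        else:
            units.append(pending + ch)
            pending = ""
    if pending:  # dangling prevowel at the end of the text
        units.append(pending)
    return units


print(" ".join(option2_units("ສະບາຍດີ")))  # ສະ ບາ ຍ ດີ
print(" ".join(option2_units("ເລັກ")))     # ເລັ ກ
```

Options 3 and 4 would additionally need syllable or word segmentation (Lao does not separate words with spaces), which this sketch does not attempt.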

I asked a Lao colleague which version he prefers, and it's definitely the first. He found that 2, 3 and 4 look just like adding space between words, which is more about justification than tracking. He didn't mind the way the saraAm is separated. (His preference for the niggahita [or saraAw, as it's apparently called in Lao] is to position it above the right side of the consonant when it's part of saraAm, and centrally if it's alone, so the positioning helps avoid some of the confusion mentioned above. That's not possible in Thai, where it's always on the right side of a consonant.) Of course, it would be really helpful to have more feedback from local users.

One last possible alternative, found in Saysettha, is to keep the niggahita and saraAa together [screenshot]. This seems to have been done to avoid having to make substitutions when the base consonant has an ascender, and Lao people don't seem to have a problem with it, but I don't think it should be considered a good solution.

Thai: Note that Thai nikahit can appear alone (without sara Aa) in Pali and Sanskrit, and in the Northern Khmer and Bru (Kha) languages.

r12a commented 6 years ago

Who do we know locally who might have an authoritative answer on this? @laonux, Anousak, any comments?

NorbertLindenberg commented 6 years ago

One question that should always be considered when a feature isn’t used for a script or language: Is the feature not used because users think it’s not relevant to their script, or is it not used because current implementations of the feature don’t work for the script and so people avoid it even though they would like to use it?

If inter-character spacing is used in signs and publications, as @jmdurdin says, but not in web pages, the latter reason might apply. I don’t have any actual knowledge of the situation, however.

laonux commented 6 years ago

Hi all,

I am going to provide some suggestions as to whether we need to apply inter-character spacing for Lao or not. As pointed out by both @ohbendy and @jclark 👍:

  1. kpl.gov.la/Media/Upload/Default/PPX2016/ppx.jpg

This was basically a Lao government heading. It could be written as 'ສປປ ລາວ', but the author spaced the letters out (using the space bar) as 'ສ ປ ປ ລາວ'. This might make it look as if inter-character spacing is used.

Example:

Letter Spacing

will never be like this

l e t t e r s p a c i n g

Same for Lao:

ສະບາຍດີ

will never be like this

ສ ະ ບ າ ຍ ດ ີ
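One concrete reason why spacing out every code point in this way cannot work: combining marks such as ີ (U+0EB5) end up after a space, with no base to sit on. A small hypothetical checker in Python that flags such marks (the name `orphaned_marks` is illustrative only):

```python
import unicodedata

MARKS = ("Mn", "Mc")


def orphaned_marks(text: str) -> list[int]:
    """Return indices of combining marks that have no letter as their base."""
    bad: list[int] = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) not in MARKS:
            continue
        j = i - 1
        while j >= 0 and unicodedata.category(text[j]) in MARKS:
            j -= 1  # skip over stacked marks to reach the base
        if j < 0 or not unicodedata.category(text[j]).startswith("L"):
            bad.append(i)
    return bad


print(orphaned_marks("ສະບາຍດີ"))        # []
print(orphaned_marks("ສ ະ ບ າ ຍ ດ ີ"))  # [12]
```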

  2. The example below was also done manually by the author (again using the space bar to space out the letters):

[Image]

So, the question is whether inter-character spacing is used for Lao at this time or not. As I said, I have consulted people at different levels, including technical staff in the Lao news media, and the answers I got are as follows:

  1. Spacing is not used between letters (consonants), nor between vowels (below or above). Technically speaking, the font in use (Phetsarath OT) was developed using FontLab with some advanced font-making techniques, so it is well defined, but I don't see any inter-character spacing being applied in it.
  2. Lao computing has borrowed a lot from Thai, and this has been applied in many OSes (Windows and some, if not all, Linux distributions) for rendering and displaying Lao. It works quite well. However, we are trying to apply this to web standards, and yes, we would need to make some enhancements for that particular language, in this case Lao.
  3. I have also raised the idea of inter-character spacing for the web with the Ministry of Science and the Ministry of Posts and Telecom, where there is some development work on enhancing existing fonts. However, they lacked the specialist expertise and could not provide further guidance.
  4. So the suggestion I received is that it would be best to leave inter-character spacing out for Lao for now and perhaps apply 'word-breaking' instead.

Thanks all

r12a commented 5 years ago

For a page that lets you experiment with letter-spacing in Lao on various browsers, see https://w3c.github.io/i18n-tests/css-text/letter-spacing-property/exp-lo-letterspace-000.html

For related experimental tests, see https://w3c.github.io/i18n-tests/results/exploring-justify-space

r12a commented 5 years ago

Thank you all for your comments here. My conclusion is that this is not yet sufficiently well understood to enable formulation of requirements. (In fact, spacing may currently be done manually by space insertion.)

I have marked this discussion with the useful-discussion label, and pointed to it from the text layout index at http://w3c.github.io/typography/#spacing. I'll also point to it from the gap-analysis and Lao lreq docs.