w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Should drop initial styling highlight aytam alone? #59

Open r12a opened 4 years ago

r12a commented 4 years ago

If the user creates a drop initial for a paragraph beginning with a word that has aytam at the beginning, such as ஃபோரியர் or ஃபீசு (I don't know whether such a thing is likely), should they see a large aytam only, or the aytam plus the initial syllable (as happens for punctuation)?

[Image: Screenshot 2019-12-06 at 17 23 23]

murasu commented 4 years ago

There are no Tamil words that start with aytham. This letter is used with the consonant ப (PA) for the 'F' sound, although the proper way to borrow the word into Tamil would be to use ப alone. Nevertheless, if the phonetic representation is required, the aytham along with the pa+vowel should be treated as a single unit.

r12a commented 4 years ago

(I don't know whether Tamil grammar permits words at the beginning of a sentence that might begin with a sound modified by aytham, but...) So, if a word like Fourier or Xerox did appear at the beginning of a paragraph with a drop initial, and the author wanted to preserve the actual pronunciation, would the drop initial then need to look like this?

[Image: Screenshot 2019-12-11 at 15 15 33]

I.e., would the first-letter selection need to be extended to include 2 grapheme clusters in this instance?
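To make the two candidate behaviours concrete, here is a minimal Python sketch. It is not how any browser implements first-letter selection; the function names `clusters` and `first_letter` and the `extend_aytham` switch are hypothetical, and the cluster segmentation is a rough approximation (a base character plus any following combining marks) rather than full Unicode grapheme cluster segmentation.

```python
import unicodedata

AYTHAM = "\u0B83"  # ஃ TAMIL SIGN VISARGA (aytham)


def clusters(text):
    """Approximate grapheme clusters: a base character plus the
    combining marks (vowel signs, pulli) that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch      # mark: attach to the current cluster
        else:
            out.append(ch)     # anything else starts a new cluster
    return out


def first_letter(text, extend_aytham=False):
    """Return the text a drop initial would cover.

    extend_aytham=False: the first cluster only (aytham alone).
    extend_aytham=True:  a leading aytham plus the following cluster.
    """
    cl = clusters(text)
    if not cl:
        return ""
    if extend_aytham and cl[0] == AYTHAM and len(cl) > 1:
        return cl[0] + cl[1]
    return cl[0]


word = "ஃபோரியர்"
print(first_letter(word))                      # aytham alone
print(first_letter(word, extend_aytham=True))  # aytham + first syllable
```

With the switch off, the selection is a single cluster; with it on, the selection spans two clusters, which is the extension being asked about.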

lianghai commented 4 years ago

It’s irresponsible to claim stuff like “there are no such things” or “the proper way” when apparently the usage is attested. The usage of aytham as a modifier to signify loanword sounds may not be one’s preference, but please do not deny its attestation.

For such marginal cases where native users are unlikely to have an established convention, I’d recommend that a graphically more naive/simple approach should be taken (in this case, it seems to be “aytham alone”), and the approach (any approach) should allow overriding.

lianghai commented 4 years ago

Now I see Muthu’s sentence “There are no Tamil words that start with aytham.” was meant to emphasize native words. Which is true. But let’s just note the usage of ஃ as a prefixed modifier for consonant letters (mostly for ப) to represent loanword sounds is very common, and loanwords are everywhere in real world texts.

Also note, it’s unclear if ஃ’s usage in classical Tamil texts (which seems to be only between two syllables) is considered to be depending on its preceding cluster (written syllable) or its following cluster.

miloush commented 4 years ago

Interesting that in the first case, @r12a was happy to have the drop initial span two rows, while in the second case he chose a one-row drop only. I am happy for ஃ to be a drop initial on its own. You don't delete it when you backspace on ஃப, and it has its own sort order. Most importantly, the implementer has to do nothing extra to get this rendering, which is as good as the other one with little or no attestation.

> Also note, it’s unclear if ஃ’s usage in classical Tamil texts (which seems to be only between two syllables) is considered to be depending on its preceding cluster (written syllable) or its following cluster.

@lianghai Krishnamurti in his Dravidian Languages book (where he has it under semivowels by the way, with வ and ய), describes both of these functions: prolonging the previous vowel and changing the pronunciation of the following consonant. That would mean different line break opportunities, i.e. கா-ஃபீ but பஃ-து, and it would also explain words like சுஃஃறெனல். For the drop initial case though, the first function, interacting with the preceding cluster, is irrelevant.
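As a rough illustration of how the two functions would lead to different break points, the Python sketch below groups clusters into unbreakable units depending on which neighbour the aytham binds to. Everything here is an assumption made for illustration: the names `clusters` and `units`, the `aytham_binds` parameter, and the simplified cluster segmentation. It only models the aytham attachment and says nothing about other Tamil line-breaking rules.

```python
import unicodedata

AYTHAM = "\u0B83"  # ஃ TAMIL SIGN VISARGA (aytham)


def clusters(text):
    """Approximate grapheme clusters: base character plus following marks."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def units(text, aytham_binds):
    """Group clusters into units that should not be split.

    aytham_binds="preceding": aytham stays with the cluster before it.
    aytham_binds="following": aytham stays with the cluster after it.
    """
    out = []
    glue_next = False
    for c in clusters(text):
        if glue_next:
            out[-1] += c       # previous unit ended in a forward-binding aytham
            glue_next = False
        elif c == AYTHAM and aytham_binds == "preceding" and out:
            out[-1] += c       # aytham joins the unit before it
        else:
            out.append(c)
        if c == AYTHAM and aytham_binds == "following":
            glue_next = True   # next cluster must join this unit
    return out


print("-".join(units("காஃபீ", "following")))  # break falls before the aytham
print("-".join(units("பஃது", "preceding")))   # break falls after the aytham
```

Which policy applies to a given word is exactly the open question; the sketch takes it as an input rather than trying to decide it.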