Tamil: 2.8 Text boundaries & selection

This issue was discussed in a meeting.

ACTION: Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: if you want to translate a historical text into tamil how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and pulli remains visible
r12a: https://github.com/w3c/ilreq/issues/31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today, iscii (IS 13194) lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke individual scripts out into separate code pages, unlike iscii
<alolita> akshat: unicode instead allocated different code pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear
akshat: ilreq doc is unicode specific but doesn't clarify in terms of what scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC (consonant + halant + consonant) but in tamil it only applies to the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for tamil so that it can be used as a foundation for unicode
... having the clarification of differences is needed - that's a gap
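
The difference discussed above between the halant cluster model and Tamil can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ilreq or Unicode algorithm: the names `segment`, `halant_model` and `tamil_model` are hypothetical, any character of category Lo is treated as a consonant, and the "two conjuncts" are assumed to be k+ssa and s/sh+ra (shri).

```python
import unicodedata

# Virama-type signs: Devanagari halant and Tamil pulli.
VIRAMAS = {"\u094D", "\u0BCD"}

# Assumption: the two Tamil conjuncts are k+ssa and shri (sa or sha + ra).
TAMIL_CONJUNCT_PAIRS = {
    ("\u0B95", "\u0BB7"),  # ka + ssa
    ("\u0BB8", "\u0BB0"),  # sa + ra
    ("\u0BB6", "\u0BB0"),  # sha + ra
}


def halant_model(before, after):
    """Halant cluster model: consonant + halant + consonant always joins."""
    return True


def tamil_model(before, after):
    """Tamil sketch: pulli joins two consonants only for the listed conjuncts."""
    return (before, after) in TAMIL_CONJUNCT_PAIRS


def segment(text, joins_after_virama):
    """Split text into clusters, one list entry per cursor stop.

    Simplification: combining marks (Mn, Mc) attach to the previous cluster,
    and a letter (Lo) directly after a virama joins the cluster only if the
    supplied model says so.
    """
    clusters = []
    for ch in text:
        prev = clusters[-1] if clusters else ""
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins = (
            len(prev) >= 2
            and prev[-1] in VIRAMAS
            and category == "Lo"
            and joins_after_virama(prev[-2], ch)
        )
        if clusters and (is_mark or joins):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    pakkam = "\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"  # Tamil "pakkam"
    print(segment(pakkam, halant_model))
    print(segment(pakkam, tamil_model))
```

With the halant model the k + pulli + k sequence stays in one cluster, while the Tamil sketch places a boundary after the visible pulli, which is the behaviour described above where the cursor can be put between the two consonants.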
