Only two character sequences form conjuncts in Tamil, and neither is native to Tamil: ஶ்ரீ and க்ஷ. No other CHC (consonant, halant, consonant) combination forms a conjunct, so in every other CHC sequence it should be possible to place the cursor between the H and the following C. This issue was fixed in Android Oreo and iOS 12. The problem still exists in many places, and testing is needed to identify which browsers handle it correctly and which do not.
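The rule in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the segmentation algorithm of Unicode, ilreq, or any platform: the names `tamil_clusters`, `CONJUNCTS` and `is_mark` are hypothetical, the conjunct list holds exactly the two sequences named in the comment, and a combining mark (vowel sign or pulli) is assumed to always attach to the preceding base.

```python
import unicodedata

# Hypothetical list: exactly the two conjunct sequences named in the comment.
CONJUNCTS = (
    "\u0B95\u0BCD\u0BB7",        # KA + pulli + SSA
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + pulli + RA + vowel sign II
)


def is_mark(ch):
    """True for combining marks (vowel signs, pulli, anusvara)."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def tamil_clusters(text):
    """Split text into units between which a cursor could be placed.

    Assumption: a pulli followed by a consonant joins the two only
    when the sequence is one of CONJUNCTS; otherwise a boundary falls
    between the H and the following C.
    """
    clusters = []
    i = 0
    while i < len(text):
        for conj in CONJUNCTS:
            if text.startswith(conj, i):
                end = i + len(conj)  # keep the whole conjunct together
                break
        else:
            end = i + 1  # a single base character
        # Attach any following combining marks to the current unit.
        while end < len(text) and is_mark(text[end]):
            end += 1
        clusters.append(text[i:end])
        i = end
    return clusters
```

For example, `tamil_clusters("\u0B85\u0BAE\u0BCD\u0BAE\u0BBE")` (அம்மா) yields three units with a boundary after the pulli, while a word containing க்ஷ keeps that sequence in a single unit.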
ACTION: Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: if you want to translate a historical text into tamil, how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and the pulli remains visible
r12a: https://github.com/w3c/ilreq/issues/31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@ https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all indian scripts
<alolita> akshat: this is the IS 13194 definition
akshat: there are two actual definitions today, iscii (IS 13194) lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke away individual scripts into separate code pages, unlike iscii
<alolita> akshat: unicode instead allocated different code pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear
akshat: ilreq doc is unicode specific but doesn't clarify which scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC but in tamil it's only applicable for the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for tamil so that it can be used as a foundation for unicode
... having the clarification of differences is needed - that's a gap
2.8 Text boundaries & selection https://w3c.github.io/iip/gap-analysis/taml-gap.html#boundaries
Comment from Muthu: