Closed GoogleCodeExporter closed 9 years ago
Thanks a lot for the report. I agree that SHA+VIRAMA+RA+II should form a single
ligature. I filed that internally as noto-alpha/192.
For SA+VIRAMA+RA+II, it's not clear to me what Unicode has decided (if it
should only be displayed with a visible pulli, or only with a ligature, or both
are acceptable). The reference you provided (L2/05-129) doesn't say anything
about that sequence, and I could not arrive at a conclusion from reading
section 9.6 of Unicode Core Specification, version 6.2. It appears to me that
all three Tamil fonts in Windows 8.1 render SA+VIRAMA+RA+II as the same
ligature.
Would you please point us to a UTC decision or Unicode text where it says or
implies that SA+VIRAMA+RA+II should be rendered with a visible pulli?
Original comment by roozbeh@google.com
on 2 Apr 2014 at 11:30
http://www.unicode.org/faq/tamil.html#12 says that 'mapping should be
**updated** from <U+0BB8, U+0BCD, U+0BB0, U+0BC0> to <U+0BB6, U+0BCD, U+0BB0,
U+0BC0>' instead of saying something like 'a new mapping should be added from
<U+0BB6, U+0BCD, U+0BB0, U+0BC0>' or 'along with the old mapping, a new mapping
has to be added', which can very well imply that SA+VIRAMA+RA+II should be
rendered in non-conjunct form with a visible pulli.
Linguistically, SHRI is a single character, and having a dual encoding does more
harm (it affects search, etc.) than good (for compatibility's sake). SA+VIRAMA+RA+II was not
the equivalent of SHRI, which is why a new character SSA (U+0BB8) got
introduced and the definition was **updated**. I wasn't aware of what Windows
did, but if they too render the complex glyph, that's a bug again.
Original comment by srik....@gmail.com
on 6 Apr 2014 at 5:07
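The dual-encoding concern raised above can be illustrated with a minimal sketch. It uses only the code points quoted from the FAQ and Python's standard `unicodedata` module; the names `SA_SEQUENCE`, `SHA_SEQUENCE`, `describe`, and `same_after_normalization` are illustrative assumptions, not part of any font or Unicode tooling. It shows that the two sequences are distinct strings that no standard normalization form unifies, which is why a plain text search for one would not find the other.

```python
import unicodedata

# Code points as quoted from the Unicode Tamil FAQ in the comment above.
SA_SEQUENCE = "\u0BB8\u0BCD\u0BB0\u0BC0"   # SA + VIRAMA + RA + II
SHA_SEQUENCE = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA + VIRAMA + RA + II


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_after_normalization(a: str, b: str) -> bool:
    """True if any standard normalization form makes a and b equal."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    for label, seq in (("SA", SA_SEQUENCE), ("SHA", SHA_SEQUENCE)):
        print(label, describe(seq))
    print("equal after normalization:",
          same_after_normalization(SA_SEQUENCE, SHA_SEQUENCE))
```

Because the two sequences differ only in their first code point, any equivalence between them would have to come from the application (for example, a search index), not from Unicode normalization.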
I concur with Srikanth. If fonts continue to display non-standard sequences
like this, then as Srikanth says the interoperability purpose of the standard
is lost.
Consider Arabic/Urdu-based names like tasrīn. In Tamil script they should be
written as தஸ்ரீன் (தஸ்.ரீன் without the dot)
but with the current behaviour they are displayed identically to
தஶ்ரீன் whereas ஶ்ரீ is only ever found in Sanskrit-based
names. (On Firefox 28 on my Kubuntu Saucy system I am able to prevent the
ligature by using ZWNJ but that should not be required for normal usage.)
Original comment by samj...@gmail.com
on 6 Apr 2014 at 5:58
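The ZWNJ workaround mentioned above can be sketched as follows. This is a hedged illustration only: the helper name `keep_pulli_visible` is hypothetical, and whether a given font or shaping engine honours ZWNJ (U+200C) after the virama is an assumption that depends on the renderer. The sketch only shows where the character would be placed in the text.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
SA, VIRAMA, RA, II = "\u0BB8", "\u0BCD", "\u0BB0", "\u0BC0"


def keep_pulli_visible(text: str) -> str:
    """Insert ZWNJ after SA+VIRAMA wherever RA+II follows directly.

    Hypothetical helper: it changes the stored text, so it is a
    workaround rather than a fix for the font behaviour.
    """
    return text.replace(SA + VIRAMA + RA + II,
                        SA + VIRAMA + ZWNJ + RA + II)


if __name__ == "__main__":
    # TA, SA, VIRAMA, RA, II, NNNA, VIRAMA: the name from the comment above.
    name = "\u0BA4\u0BB8\u0BCD\u0BB0\u0BC0\u0BA9\u0BCD"
    fixed = keep_pulli_visible(name)
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

As the comment notes, this should not be required for normal usage; it also makes the stored string differ from the plain sequence, which again affects searching.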
Thanks a lot for the examples and the discussion.
It appears that SA+VIRAMA+RA+II is very commonly used for "sri/shri" on the web
(compare Google search results for both sequences), including on the title page
of the Tamil Wikipedia article about the ligature:
http://ta.wikipedia.org/s/14u0
I'm following the SA+VIRAMA+RA+II issue up with the Unicode Technical
Committee, and will bring it up at our next meeting in early May, with a
pointer to the discussion here.
Original comment by roozbeh@google.com
on 17 Apr 2014 at 1:47
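A comparison like the one described above can also be run on a local text sample rather than on search results. The following is a minimal sketch under that assumption; `count_sequences` and the sample string are hypothetical and say nothing about actual web usage.

```python
SA_SEQUENCE = "\u0BB8\u0BCD\u0BB0\u0BC0"   # SA + VIRAMA + RA + II
SHA_SEQUENCE = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA + VIRAMA + RA + II


def count_sequences(text: str) -> dict[str, int]:
    """Count non-overlapping occurrences of each encoding in text."""
    return {
        "SA": text.count(SA_SEQUENCE),
        "SHA": text.count(SHA_SEQUENCE),
    }


if __name__ == "__main__":
    # Made-up sample: two SA-based occurrences and one SHA-based one.
    sample = f"{SA_SEQUENCE} x {SHA_SEQUENCE} y {SA_SEQUENCE}"
    print(count_sequences(sample))
```

Such counts only measure what has been typed, so they reflect the fonts and input tools in use as much as any preference between the two encodings.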
[deleted comment]
Thank you very much for taking this up with the UTC. If you look at the same
Wikipedia page about the origin of the character, it says U+0BB6 is its root.
The comparison of Google search results will have an inherent bias toward the old
sequence, since most fonts and input tools did not adopt the new encoding. The
problem has become visible of late (for more than a couple of years now) since
Apple adopted the latest standard, causing fragmentation.
Erratum for comment #2: 'why a new character SSA (U+0BB8)' should be read as 'why
a new character SHA (U+0BB6)'.
Original comment by srik....@gmail.com
on 19 Apr 2014 at 7:13
The bug is fixed in r245. I also got an action item from the UTC to write a
proposal about the problem: http://www.unicode.org/L2/L2014/14100.htm#139-A37
Original comment by roozbeh@google.com
on 16 May 2014 at 1:11
Hello Roozbeh. Could you please elaborate on what was decided at the UTC? The
action item doesn't explain that.
Original comment by jamada...@gmail.com
on 16 May 2014 at 1:49
Nothing was decided. I was told to come up with a proposal that tells what
exactly needs to be changed in which parts of the standard. UTC will decide
what to do when they see the proposal.
Original comment by roozbeh@google.com
on 16 May 2014 at 2:01
Then what exactly was "fixed in r245"?
Original comment by samj...@gmail.com
on 16 May 2014 at 2:17
SHA+VIRAMA+RA+II now forms a ligature.
Original comment by roozbeh@google.com
on 16 May 2014 at 4:37
Hello Roozbeh, can you please file a bug for the same problem with Droid Sans
Tamil and fix it there too? (Sorry for putting it on you, but I'm really bogged
down here, which is why I didn't call in to the UTC either.) Thanks.
Original comment by jamada...@gmail.com
on 17 May 2014 at 5:01
Droid Sans Tamil is no longer supported. Only Noto is supported.
Original comment by roozbeh@google.com
on 17 May 2014 at 6:01
Thanks for following this up. I am unfamiliar with the Unicode process, but if
it's okay, could you please share your proposal when it's ready, so that we can
give feedback on it before it gets discussed at the UTC? Thanks.
Original comment by srik....@gmail.com
on 17 May 2014 at 12:35
Original issue reported on code.google.com by
srik....@gmail.com
on 23 Mar 2014 at 8:14