w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Tamil hyphenation isn't supported #79

Open r12a opened 4 years ago

r12a commented 4 years ago

None of the major browsers support Tamil hyphenation out of the box. This is a problem for text in narrow columns, because Tamil words tend to be long.

For Tamil, simple dictionary lookup is not enough: the language is highly inflected, so a significant amount of morphological analysis is needed, in addition to Tamil-specific orthographic rules for placing break opportunities.

Tamil also needs to hyphenate without adding a visual marker, as shown in the picture where yellow text indicates hyphenated words that have been wrapped.

[Image: hyphenation_ta]

Specs: css-text-3 provides the hyphens property, but browsers need to implement the actual mechanism for processing the text.

(It is possible to hyphenate manually, but given the number of words that need to be hyphenated in a typical Tamil text, this isn't going to be very useful. The HTML spec defines the wbr element, which could be used because it doesn't produce a hyphen mark; it only marks a potential break point. &shy; (the soft hyphen, U+00AD) does produce a hyphen, so it isn't helpful for Tamil.)

css-text-4 provides the hyphenate-character property, which should allow authors to indicate that Tamil should have no visible marker for hyphenation.
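As a rough sketch of what an author would request with these two properties, the TypeScript below sets them on an element. The function name requestMarkerlessHyphenation and the Styleable interface are hypothetical, and it assumes that an empty string is an acceptable hyphenate-character value meaning "no visible marker". It only declares the intent. Per the test results in this issue, browsers do not currently act on it for Tamil.

```typescript
// Minimal structural type so the sketch also runs outside a browser;
// a real HTMLElement satisfies it.
interface Styleable {
  style: { setProperty(name: string, value: string): void };
}

// Hypothetical helper: asks for automatic hyphenation with no visible marker.
function requestMarkerlessHyphenation(el: Styleable): void {
  // css-text-3: let the browser choose break points automatically.
  el.style.setProperty("hyphens", "auto");
  // css-text-4: assumption, an empty string means "draw nothing at the break".
  el.style.setProperty("hyphenate-character", '""');
}
```

In a page this would be called on an element whose content is marked as Tamil (for example with lang="ta"), since automatic hyphenation depends on the browser knowing the content language.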

Tests & results: interactive test, "hyphens:auto will produce hyphenation for Tamil words"

Gecko, Blink, and WebKit all fail to hyphenate the Tamil text when hyphens is set to auto.

Santhosh Thottingal has written a JavaScript-based tool for hyphenating Indian scripts, which mostly relies on syllable-breaks plus a few additional rules.
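To make the syllable-break idea concrete, here is a minimal TypeScript sketch. It is not Santhosh Thottingal's code, and it does none of the morphological analysis described earlier in this issue. The function names (isTamilBase, splitTamilSyllables, addBreakOpportunities) are hypothetical, and the single rule it applies is a simplification: start a new chunk before each Tamil vowel or consonant letter, unless that consonant carries a pulli (U+0BCD), in which case it stays with the preceding syllable.

```typescript
const PULLI = "\u0BCD"; // Tamil sign virama (pulli)

// True for Tamil independent vowels (U+0B85..U+0B94) and consonants (U+0B95..U+0BB9).
function isTamilBase(ch: string): boolean {
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x0b85 && cp <= 0x0bb9;
}

// Splits a word into rough orthographic syllables (simplified rule, see above).
function splitTamilSyllables(word: string): string[] {
  const chars = Array.from(word);
  const syllables: string[] = [];
  let current = "";
  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i] as string;
    // A base letter opens a new syllable unless it is a pure consonant (letter + pulli).
    const startsSyllable = isTamilBase(ch) && chars[i + 1] !== PULLI;
    if (startsSyllable && current !== "") {
      syllables.push(current);
      current = "";
    }
    current += ch; // vowel signs, pulli and other marks stay with their base
  }
  if (current !== "") syllables.push(current);
  return syllables;
}

// Joins the syllables with an invisible break opportunity.
// Default is ZERO WIDTH SPACE (U+200B); pass "<wbr>" when generating HTML.
function addBreakOpportunities(word: string, separator: string = "\u200B"): string {
  return splitTamilSyllables(word).join(separator);
}
```

Under these rules வணக்கம் is split as வ + ணக் + கம். Both U+200B and wbr allow a line break without drawing a hyphen, which matches the Tamil requirement for no visual marker. A practical tool would also need limits such as a minimum number of characters before and after a break.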

Browser bug reports: Chromium, WebKit, Mozilla

Priority: The impact of this is basic, because of the difficulties of handling text in narrow columns.

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: _Tamil_

xfq commented 2 years ago

Added links to bug reports.