w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Devanagari: 4.1 Lines and Paragraphs - review rules for line breaking - should line breaking be driven by breaking at end of each grapheme cluster as defined in Unicode UTR29? #40

Open alolita opened 5 years ago

alolita commented 5 years ago

https://w3c.github.io/iip/gap-analysis/deva-gap.html#linebreak

Is line breaking in Devanagari driven by breaking at word boundaries (white space) or at the end of a grapheme cluster (as defined by UTR 29)?

Are the segmentation rules in Unicode UTR 29 sufficient to handle all line-breaking use cases in Devanagari?

Do the scenarios for text boundaries need to be re-examined and enhanced?

See http://unicode.org/reports/tr29/
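
To make the two options in the question concrete, here is a minimal Python sketch that compares whitespace-delimited words with an approximation of grapheme clusters. The function `approx_clusters` is a hypothetical helper, not an implementation of UTR 29: it simply attaches combining marks (general categories Mn, Mc, Me) to the preceding character, and it ignores ZWJ/ZWNJ and the conjunct rule (GB9c) added in more recent versions of the Unicode text segmentation rules, under which a virama-joined conjunct stays in one cluster.

```python
import unicodedata

# "हिन्दी भाषा" written with escapes so the code points are unambiguous.
SAMPLE = "\u0939\u093f\u0928\u094d\u0926\u0940 \u092d\u093e\u0937\u093e"


def approx_clusters(text):
    """Approximate grapheme clusters: base character plus following marks.

    Assumption: this is a simplification for illustration only and does
    not implement the full UTR 29 rule set.
    """
    clusters = []
    for ch in text:
        # Mn = nonspacing mark, Mc = spacing mark, Me = enclosing mark
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def whitespace_words(text):
    """Units available when lines break only at white space."""
    return text.split()


if __name__ == "__main__":
    print(whitespace_words(SAMPLE))   # 2 units
    print(approx_clusters(SAMPLE))    # 6 units, counting the space
```

With this approximation the first word splits into three units, the middle one ending in a virama. A break after that virama would separate the two halves of a conjunct, which is the kind of case the question about cluster-based breaking needs to settle.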

r12a commented 5 years ago

I created a bunch of exploratory tests last week, which can be used to check current behaviour on browsers.

For Devanagari (Hindi) there is https://w3c.github.io/i18n-tests/css-text/line-breaking/exp-hi-line-break-000.html

Note that you can add any text you want lower down the page. You can also change the styling with CSS properties (if they are supported) via the controls at the bottom of the page. The idea is that you can play around with various bits and types of text and see what happens; if you spot irregularities, we can raise gap descriptions about them.

There are also tests for:

As a follow-up, I'd like to itemise the various combinations of characters in each script that typically occur, and write specific, individual tests for each to check that they do what we expect.

So please play with the tests and report anything you find of interest.

(There are also some tests related to line-breaking at https://www.w3.org/International/tests/#css3-text-line-breaks that test the behaviour of specific characters. The only one that seems relevant is by block, which has a test for Devanagari dandas.)