w3c / eurlreq

European language enablement

Catalan hyphenation details poorly supported #34

Open jmontane opened 1 year ago

jmontane commented 1 year ago

Catalan needs hyphenation. Hyphenation has a few complex rules (the Catalan L·L, compound words, prefixes, etc.).

Gecko doesn't provide break opportunities at the Catalan "L·L". E.g., "cancel·lar" can be hyphenated as "can-cel-lar": at a line break the "l·l" becomes "l-l", with the hyphen replacing the middle dot. Gecko doesn't do this.
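To make the expected behaviour concrete, here is a minimal sketch of the "l·l" break in Python. It is not how Gecko or any hyphenation library implements it; the function name `break_at_ela_geminada` is hypothetical, and it assumes the middle dot is encoded as U+00B7.

```python
from typing import Optional, Tuple

MIDDLE_DOT = "\u00b7"  # assumed encoding of the dot in the Catalan "l·l"


def break_at_ela_geminada(word: str) -> Optional[Tuple[str, str]]:
    """Split a word at its first "l·l".

    Returns (end_of_line_part, next_line_part), or None if the word
    contains no "l·l". The middle dot is dropped and a hyphen takes
    its place at the end of the first part.
    """
    idx = word.lower().find("l" + MIDDLE_DOT + "l")
    if idx == -1:
        return None
    first = word[: idx + 1] + "-"   # up to and including the first "l"
    second = word[idx + 2 :]        # from the second "l" onwards
    return first, second


print(break_at_ela_geminada("cancel\u00b7lar"))  # ('cancel-', 'lar')
```

The point of the sketch is only that the break changes the spelling: the dot disappears and a hyphen appears, rather than a hyphen simply being inserted.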

Another hyphenation issue is related to word boundaries and hyphenation rules. Some hyphenation rules apply only at the start of a word (such rules start with a dot, `.`) or only at the end of a word (such rules end with a dot, `.`). These rules are useful in Catalan for handling compound words, prefixes, and inflected verb forms, but they cannot be applied if the word has an article joined to it with an apostrophe, or if the word is a verb form with a pronoun attached by a hyphen. E.g.:
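The boundary problem can be sketched roughly as follows. This assumes a TeX-style matcher that wraps the whole token in dots before matching, and that the token is passed in with its article or pronoun still attached. The two patterns and the helper names are hypothetical illustrations, not entries from the real Catalan pattern set.

```python
def _letters(pattern: str) -> str:
    """Strip the priority digits from a TeX-style pattern, keeping letters and dots."""
    return "".join(ch for ch in pattern if not ch.isdigit())


def matches_at_start(pattern: str, token: str) -> bool:
    """True if a start-anchored pattern (leading '.') matches the dot-wrapped token."""
    return ("." + token.lower() + ".").startswith(_letters(pattern))


def matches_at_end(pattern: str, token: str) -> bool:
    """True if an end-anchored pattern (trailing '.') matches the dot-wrapped token."""
    return ("." + token.lower() + ".").endswith(_letters(pattern))


START_RULE = ".in1"   # hypothetical prefix rule
END_RULE = "1tar."    # hypothetical verb-ending rule

# The bare word matches the boundary rules ...
print(matches_at_start(START_RULE, "inhumà"))    # True
print(matches_at_end(END_RULE, "cantar"))        # True

# ... but the same word with an elided article or an attached pronoun does not,
# because the word boundary is no longer where the rule expects it.
print(matches_at_start(START_RULE, "l'inhumà"))  # False
print(matches_at_end(END_RULE, "cantar-lo"))     # False
```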

Tests & results: Catalan hyphenation is supported by Gecko, but not by Blink or WebKit. The hyphenation rules Gecko uses are the old ones, from TeX. Better, updated hyphenation rules are used by LibreOffice. Upstream is here.

More systematic tests are needed to ascertain whether Gecko handles everything for the Catalan language (such as the L·L mentioned above, or a word joined to an article).