Catalan needs hyphenation. A few of its hyphenation rules are complex (the Catalan L·L, compound words, prefixes, etc.).
Gecko doesn't provide break opportunities at the Catalan "L·L". E.g., "cancel·lar" could be hyphenated as "can-cel-lar", i.e., "l·l" can be hyphenated as "l-l", which Gecko doesn't do.
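The "l·l" to "l-l" behaviour described above can be sketched in a few lines of Python. This is only an illustration of the expected result, not how Gecko or any other engine implements it; the function name break_inside_ela_geminada is hypothetical.

```python
import re

def break_inside_ela_geminada(word):
    """Return (end of first line, start of next line) for a break inside "l·l".

    The middle dot is dropped and a hyphen takes its place at the end of
    the first line. Returns None if the word contains no "l·l".
    """
    match = re.search("l·l", word, re.IGNORECASE)
    if match is None:
        return None
    first_l_end = match.start() + 1          # keep the first "l" on line one
    second_l_start = match.end() - 1         # the second "l" opens line two
    return word[:first_l_end] + "-", word[second_l_start:]

print(break_inside_ela_geminada("cancel·lar"))  # ('cancel-', 'lar')
```

Note that this is a spelling change at the break, not just an inserted hyphen: the "·" disappears when the word is split.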
Another hyphenation issue relates to word boundaries and hyphenation rules. Some hyphenation rules apply only at the start of a word (such rules start with a dot .) or only at the end of a word (such rules end with a dot .). These rules are useful in Catalan for handling compound words, prefixes, and inflected verb forms, but they cannot be applied if the word has an article joined to it with an apostrophe, or if the word is a verb form with a pronoun attached by a hyphen. E.g.:
"inèdit" requires the special rule ".i4n3èdit" to get the proper breakpoints "in-è-dit". Good. But "inèdit" can be joined to an article, "l'inèdit", so if the hyphenation engine receives "l'inèdit" as a single word, it produces the wrong breakpoints "l'i-nè-dit". Of course, this can be patched by adding a twin rule ".l'i4n3èdit" to get the proper breakpoints "l'in-è-dit".
"conduint" requires the special rule "u1int." to get the proper breakpoints "con-du-int". Good. But "conduint" can have a pronoun attached with a hyphen, "conduint-ho", so if the hyphenation engine receives "conduint-ho" as a single word, it produces the wrong breakpoints "con-duint-ho". Of course, this can be patched by adding a twin rule "u1int-" to get the proper breakpoints "con-du-int-ho".
So these word segmentation cases can be fixed on the rule development side, but I note them here as documentation, to illustrate that the word segmentation applied before the hyphenation library is called can lead to incorrect breakpoints.
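The two examples above can be reproduced with a minimal sketch of TeX-style (Liang) pattern matching, written here in Python. The anchored rules ".i4n3èdit" and "u1int." and their twins are the ones quoted above; the generic patterns "1nè", "1di" and "1du" are assumptions added only so that the toy rule set produces the other breakpoints, and they are not taken from the real Catalan pattern file. The sketch also ignores minimum prefix and suffix lengths and expects lowercase input.

```python
def parse_pattern(pattern):
    """Split a pattern such as ".i4n3èdit" into its letters and the
    score that sits before each letter (plus one after the last)."""
    letters = ""
    scores = [0]
    for ch in pattern:
        if ch.isdigit():
            scores[-1] = int(ch)
        else:
            letters += ch
            scores.append(0)
    return letters, scores

def hyphenate(word, patterns):
    """Return the word with "-" at every position whose best score is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word + "."            # the dots mark the word boundaries
    best = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = table.get(padded[start:end])
            if scores:
                for offset, value in enumerate(scores):
                    best[start + offset] = max(best[start + offset], value)
    # best[i + 1] is the score of the gap just before word[i]
    cuts = [i for i in range(1, len(word)) if best[i + 1] % 2 == 1]
    pieces = [word[a:b] for a, b in zip([0] + cuts, cuts + [len(word)])]
    return "-".join(pieces)

# Toy rule set: the first three patterns are illustrative assumptions.
BASE = ["1nè", "1di", "1du", ".i4n3èdit", "u1int."]
TWINS = BASE + [".l'i4n3èdit", "u1int-"]

print(hyphenate("inèdit", BASE))         # in-è-dit
print(hyphenate("l'inèdit", BASE))       # l'i-nè-dit (anchored rule not applied)
print(hyphenate("conduint-ho", BASE))    # con-duint-ho (anchored rule not applied)
print(hyphenate("l'inèdit", TWINS))      # l'in-è-dit
print(hyphenate("conduint-ho", TWINS))   # con-du-int-ho
```

The anchored rules fail to fire only because the dot no longer sits next to "inèdit" or "uint" once the article or pronoun is part of the string handed to the matcher, which is why the outcome depends on how the caller segments words.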
Tests & results:
Catalan hyphenation is supported by Gecko, but not by Blink or WebKit.
The hyphenation rules used by Gecko are the old ones, from TeX.
Better, updated hyphenation rules are used by LibreOffice. Upstream is here.
More systematic tests are needed to ascertain whether Gecko handles everything for the Catalan language (such as the L·L mentioned above, or a word joined to an article).