sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Add support for Upper and Lower Sorbian, aka Wendish #1994

Open alerque opened 5 months ago

alerque commented 5 months ago

Interesting feedback here https://github.com/typst/typst/issues/3235#issuecomment-1924720389 adding (lower) Sorbian and Croatian to the list, and confirming Czech and Slovak.

Sorbian is a minority language (< 50000 people); it doesn't have a 2-letter language code. Unless I'm mistaken, the 3-letter codes are hsb (Upper Sorbian), dsb (Lower Sorbian), and wen (Sorbian or "Wendish" collectively).

Originally posted by @Omikhleia in https://github.com/sile-typesetter/sile/issues/1963#issuecomment-1925000544


While dealing with the explicit hyphen repetition handling, I skipped Sorbian (which we now know can use that alternative code) because we don't have a language support file or hyphenation patterns for it at all. Since I'm guessing we can probably apply some other language's patterns to it, this shouldn't be too hard to add. Maybe after BCP-47?

Omikhleia commented 5 months ago

Since I'm guessing we can probably apply some other language's patterns

You mean porting https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-hsb.tex for Upper Sorbian, I assume.

Omikhleia commented 5 months ago

This would be a good occasion to split the hyphenation patterns from the segmenter into separate files, and to record their origin details and the scripts used to build them in a Lua table, so we could check that they are up to date and have the tooling for regenerating them.
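
For illustration only, here is a rough sketch of what such regeneration tooling might look like. It assumes the upstream TeX file holds a single `\patterns{...}` block with `%` comments, and it emits a Lua table whose layout (`origin`, `patterns`) is made up for this example and is not SILE's actual language file format. The function names `extract_patterns` and `to_lua_module` are likewise hypothetical.

```python
import re


def extract_patterns(tex_source: str) -> list[str]:
    """Return the whitespace-separated entries of the first \\patterns{...} block.

    Assumption: '%' always starts a comment (no escaped '\\%' in the file).
    """
    # Drop TeX comments: everything from '%' to the end of each line.
    without_comments = re.sub(r"%.*", "", tex_source)
    match = re.search(r"\\patterns\s*\{([^}]*)\}", without_comments)
    if match is None:
        return []
    return match.group(1).split()


def lua_string(value: str) -> str:
    """Quote a value as a Lua string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_lua_module(patterns: list[str], origin: dict[str, str]) -> str:
    """Render patterns plus origin metadata as Lua source text.

    The table layout is hypothetical; keys of `origin` must be valid
    Lua identifiers.
    """
    lines = ["return {", "  origin = {"]
    for key in sorted(origin):
        lines.append(f"    {key} = {lua_string(origin[key])},")
    lines.append("  },")
    lines.append("  patterns = {")
    for pattern in patterns:
        lines.append(f"    {lua_string(pattern)},")
    lines.append("  },")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Feeding it the contents of `hyph-hsb.tex` along with an `origin` dict such as `{"language": "hsb", "source": "<upstream URL>"}` would give a standalone patterns file that could later be diffed against a fresh run to see whether the patterns are still up to date.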