sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

You put what in my canon(ical language)? #1368

Open alerque opened 2 years ago

alerque commented 2 years ago

Having introduced a CLDR database in #675, I started using it to verify language settings.

Then I spotted that the \font function does its own form of verification. I'll let it speak for itself, though, because I have no idea what it thinks it is doing:

$ sile
SILE v0.12.4 (Lua 5.4)
> icu = require("justenoughicu")
> icu.canonicalize_language("jaberwalkie")
en
> icu.canonicalize_language("ukuaoeu")
ukuaoeu

Cf. #1367 for related brain teasers about setting languages.

alerque commented 2 years ago

Also, we've short-circuited the distinction between languages and locales. To some extent we have scripts separated, but we sometimes cross-wire those with languages too.

Omikhleia commented 1 year ago

jaberwalkie is not a valid (parsable) locale format for ICU (there are constraints on field lengths), so it falls back to "en". ukuaoeu is valid: the language perhaps doesn't exist, but that is its canonical form. What was the expectation in this issue?
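
As a rough illustration of the field-length point, here is a minimal sketch. It assumes the BCP 47 rule that a primary language subtag is 2 to 8 ASCII letters; ICU's real parser handles far more than this, and `looks_like_language_subtag` is a hypothetical helper, not part of justenoughicu.

```lua
-- Hypothetical helper (not an ICU or justenoughicu function).
-- Checks only the *shape* of a primary language subtag, assuming the
-- BCP 47 limit of 2 to 8 ASCII letters. It says nothing about whether
-- the code is actually registered.
local function looks_like_language_subtag(tag)
  if type(tag) ~= "string" then return false end
  if #tag < 2 or #tag > 8 then return false end
  return tag:match("^[A-Za-z]+$") ~= nil
end

print(looks_like_language_subtag("jaberwalkie")) -- 11 letters, too long
print(looks_like_language_subtag("ukuaoeu"))     -- 7 letters, well formed
```

Under that assumption, the 11-letter input fails the shape check while the 7-letter one passes, even though neither names a real language.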

alerque commented 1 year ago

Interesting. I guess the expectation is that we always either get a KNOWN valid language code or "und" in the event it doesn't exist.
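
That expectation could be sketched roughly as follows. This is only an illustration: `known_languages` is a hypothetical hand-written table standing in for whatever the CLDR data would provide, and `known_or_und` is not an existing SILE or justenoughicu function.

```lua
-- Hypothetical set of known language codes. In practice this would
-- presumably be derived from the CLDR database, not written by hand.
local known_languages = { en = true, fr = true, ja = true, tr = true }

-- Return the lowercased primary language subtag if it is in the known
-- set, otherwise "und". Region and script subtags are dropped here to
-- keep the sketch short.
local function known_or_und(tag)
  if type(tag) ~= "string" then return "und" end
  local language = tag:match("^([A-Za-z]+)")
  if language and known_languages[language:lower()] then
    return language:lower()
  end
  return "und"
end

print(known_or_und("jaberwalkie")) -- not in the known set
print(known_or_und("ukuaoeu"))     -- well formed, but still unknown
print(known_or_und("en-US"))       -- primary subtag is known
```

The point of checking the input against a known set is that an unparsable value can no longer be silently turned into "en", and a well-formed but nonexistent code no longer passes through unchanged.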