Closed sgolovan closed 7 years ago
I would suggest raising this question on a mailing list: either tex-hyphen, kadingira, or some LuaLaTeX list. Arthur knows more, but I'm illiterate in LaTeX, and some feedback from the Babel developer would be welcome.
But I'm slightly confused. Doesn't LuaLaTeX use a different mechanism for loading the patterns? I thought it used plain text patterns directly. Then again, I did not follow the recent changes in Babel too closely.
AFAIK, LuaLaTeX can load patterns both ways: via a Lua hook, or the old way, by sourcing a file with hyphenation patterns in it. Either way, someone has to tell it which patterns to use for a given language. I know of two implementations: 1) polyglossia makes LuaTeX use the text patterns (hyph-ru.pat.txt for Russian), 2) Babel just loads loadhyph-ru.tex.
Legacy documents don't use Polyglossia, so only Babel needs both UTF-8 and 8-bit patterns.
I'll ask this question on a mailing list, thank you for the suggestion.
This is a Babel issue, since as you’re aware no patterns are dumped into the LuaLaTeX format (except for hyphen.tex), and it’s thus up to packages to decide what to do. Javier chose to use language.dat directly, without explaining why, and I think it would be good to discuss it with him. The Kadingira list is probably the best place.
Closing the issue.
Indeed. Please ask the author of Babel or ask for help on stackexchange. I know that XeTeX had a mechanism to map old fonts to the proper Unicode slots, which is much cleaner in the end; ConTeXt did something similar in the early days of Unicode. The patterns would be the wrong place for the fix, also because we don't know when the user might change the font and we don't really know the desired encoding. This needs proper support on the LaTeX/Babel end.
Hi!
There's a use case for the hyphenation patterns loader which isn't covered by the current code.
Sometimes I want to compile a legacy document using LuaLaTeX. The document (it's usually in Russian) uses the T2A font encoding and some input encoding (cp1251 or utf-8). After replacing inputenc by luainputenc, the relevant part of the preamble becomes the following:
The main problem with this setup is that babel dynamically loads the Russian hyphenation patterns using the usual language.dat, which in turn makes it source loadhyph-ru.tex. Since the TeX engine is Unicode-aware, that loader then reads hyph-ru.tex in UTF-8 encoding, so hyphenation is essentially switched off, because Russian letters in T2A encoding reside in different slots.
Locally, I use the following customized loadhyph-ru.tex, which checks whether the default font encoding (if it's set by fontenc.sty) is in the list of Russian encodings and, if so, loads the T2A patterns (designed for pTeX initially):
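The customized file itself is not shown here. As a rough illustration only, such a check might look like the sketch below. It assumes that the 8-bit patterns are available as hyph-ru.t2a.tex, and it tests only for T2A rather than a full list of Russian encodings; the helper macros \ruenc and \currenc are hypothetical names introduced for this sketch.

```tex
% Sketch of a font-encoding-aware loader (an illustration, not the actual local file).
% Assumption: the 8-bit T2A patterns are installed as hyph-ru.t2a.tex.
\begingroup
\def\ruenc{T2A}% hypothetical helper: the encoding we test for
\ifx\encodingdefault\undefined
  % Not running under LaTeX: fall back to the UTF-8 patterns.
  \message{UTF-8 Russian hyphenation patterns}
  \input hyph-ru.tex
\else
  % Expand \encodingdefault once so \ifx compares plain token lists.
  \edef\currenc{\encodingdefault}%
  \ifx\currenc\ruenc
    \message{T2A Russian hyphenation patterns}
    \input hyph-ru.t2a.tex
  \else
    \message{UTF-8 Russian hyphenation patterns}
    \input hyph-ru.tex
  \fi
\fi
\endgroup
```

This only helps if fontenc has already been loaded with T2A by the time babel sources the loader, and it cannot react to a later change of font encoding within the document.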
So I'd like to ask if this approach makes sense, and if it could be done for all the hyphenation loaders to make things work without local changes. Or maybe there's some other way.