latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

Warning about the non-existence of vietnamese hyphenation patterns #262

Closed: Rimole closed this issue 9 months ago

Rimole commented 9 months ago

From https://tex.stackexchange.com/questions/373049/babel-warning-about-the-non-existance-of-vietnamese-hyphenation-patterns

\documentclass{article}
\usepackage[vietnamese]{babel}
\begin{document}
\end{document}

Compiled with pdflatex, this prints the following in the log file:

 (babel.sty(txtbabel.def) (vietnamese.ldf

Package babel Warning: No hyphenation patterns were preloaded for
(babel)                the language 'Vietnamese' into the format.
(babel)                Please, configure your TeX system to add them and
(babel)                rebuild the format. Now I will use the patterns
(babel)                preloaded for \language=0 instead on input line 36.

Loading definitions for the Vietnamese font encoding (t5enc.def(t5enc.dfu))

Package babel Warning: No input encoding specified for Vietnamese on input line 146.

)) (babel-vietnamese.tex)

At StackExchange, egreg suggested to replace \usepackage[vietnamese]{babel} by

\usepackage[utf8]{inputenc}
\usepackage[vietnamese=nohyphenation]{hyphsubst}
\usepackage[vietnamese]{babel}

Elsewhere I read the suggestion \IfPackageLoadedTF{inputenc}{\addto\extrasvietnamese{\inputencoding{utf8}}}{}. It was also suggested that babel should have "empty" hyphenation rules for Vietnamese, because of this comment on the StackExchange question:

As a native speaker, I can confirm that all native Vietnamese words are monosyllabic. Hyphenation would only be useful for scientific words that are transcribed from another language, but since there are too many conventions to write them (I can recall at least 4), probably one should best do manual hyphenating. (McSinyx, Dec 2, 2017 at 13:01)

car222222 commented 9 months ago

But some sources suggest otherwise:

The Vietnamese language is predominantly a monosyllabic language in which the majority of words have one syllable. The language does have some disyllabic and polysyllabic words, however. Often a syllable is repeated with or without variation of either one sound or a tonal aspect (Thompson, 1965).

But of course this is independent of whether hyphenation should ever be used in Vietnamese typography (even for foreign words).

jbezos commented 9 months ago

Patterns are declared in TeX distributions, even if empty. For example, in TeXLive the file language.dat contains the line:

arabic zerohyph.tex

Very likely a similar line should be added for Vietnamese (and perhaps for some other languages).

Udi-Fogiel commented 9 months ago

Where, or whom, should I approach to request adding a pattern to language.dat? I imagine there is a memory limit on the number of patterns that can be loaded into the format; is the current situation anywhere close to it?

jbezos commented 9 months ago

Don’t worry. I’ll do it myself in the next few days.

Udi-Fogiel commented 9 months ago

Great, thanks! Just to make sure, can you do that for Hebrew as well?

jbezos commented 9 months ago

@Udi-Fogiel Sure.

Rimole commented 9 months ago

Karl Berry fixed it - thank you!

car222222 commented 9 months ago

Fixed what, and how/where?

jbezos commented 9 months ago

@car222222 In TeXLive: https://tug.org/pipermail/tex-live/2023-September/049480.html.

car222222 commented 9 months ago

Understood.

Is Karl aware that there are some (maybe older) typographic traditions used in Vietnam that do have some limited hyphenation?

jbezos commented 9 months ago

Hyphenation is maintained by @reutenauer. I don’t know whether he’s aware, but in principle Vietnamese isn’t hyphenated, and if a set of patterns is contributed, the only change needed is to replace zerohyph with the corresponding file.

The real issue here isn’t the warning, which is just noisy, but the fact that babel languages (in the actual sense, not \language) often share patterns, so a \hyphenation is applied to all of them (I presume because of the limitations of the original TeX). Furthermore, if babel doesn’t find a \l@language, it sets it to \language=0, i.e., English, which is a very bad idea, IMO (too late to change it, I think).
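To see which case applies on a given installation, a small diagnostic along these lines may help. It is only a sketch, assuming pdflatex and that babel has defined \l@vietnamese by the time the package has loaded (either from the format or through the fallback); the wording of the \typeout messages is made up for illustration.

```latex
\documentclass{article}
\usepackage[vietnamese]{babel}
\makeatletter
% \l@vietnamese holds the \language number babel associates with Vietnamese.
% A value of 0 suggests the fallback to the patterns of \language=0 was used.
\typeout{l@vietnamese = \number\l@vietnamese}
\makeatother
\begin{document}
% Language number in effect once Vietnamese is the active language.
\typeout{Active language number: \the\language}
\end{document}
```

The two lines appear in the log and on the terminal. A non-zero number means the format has its own entry for Vietnamese; language numbers are allocated differently under LuaLaTeX, so the result there may not be comparable.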

With this change \hyphenation can now be used to add hyphenation points to Vietnamese without touching other languages.
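A minimal sketch of what this could look like in a document, assuming a format rebuilt with the updated language.dat. It uses babel's \babelhyphenation with the optional language argument; the word and its break points are purely illustrative and do not follow any established Vietnamese convention.

```latex
\documentclass{article}
\usepackage[english,vietnamese]{babel}
% Illustrative exception only: restricts the break point to Vietnamese.
% If Vietnamese still shared \language=0 with English, the exception
% would effectively apply to English text as well.
\babelhyphenation[vietnamese]{elec-tron}
\begin{document}
electron
\end{document}
```

Without the optional argument, \babelhyphenation applies the exceptions to all languages, which is usually not what is wanted here.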