latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

Arabic: Kashidas displaced by ligatures #258

Open lueck opened 10 months ago


A whole class of errors in kashida justification results from ligatures. When two letters that allow a kashida between them form a ligature, the kashida is displaced to after the ligature, i.e. after the second (last) letter of the ligature. As a result, kashidas appear in positions where they are not allowed. Roughly 10 out of 100 words in my texts are affected by such errors.

I suggest adding an option to the babel package to turn off kashida insertion after ligatures.

Here is an MWE in which kashidas appear after ALEFs, at the ends of words, and in other places where they should not occur.

\documentclass{book}

\usepackage{luabidi}
\setRTLmain

\usepackage[english,bidi=basic]{babel}[2021/05/16]% version 3.59 or later
% see babel's change log: https://latex3.github.io/babel/#whats-new

\babelprovide[import,main,%
justification=kashida,%
transforms=kashida.plain%
]{arabic}

\babelfont{rm}[Scale=3]{ArabicTypesetting} % {FreeSerif}
% font source: https://arabicfonts.net/fonts/arabic-typesetting-regular

% output a test case with \case{NUMBER}{WORD}{EXPECTATION}
\newcommand*{\case}[3]{%
  \noindent #1 %
  \directlua{Babel.arabic.justify_enabled=false}%
  #2 %
  -- #3 %
  \directlua{Babel.arabic.justify_enabled=true}%
  \hfill%
  \fbox{\makebox[5em][s]{#2}}%
}

%% override default rule from kashida.plain
\babelprehyphenation{arabic}{()ل()[]*[اأإآ]}{kashida = 500}

\begin{document}

\case{1}{لا}{لا}%

\case{2}{بِأَبي}{بِـأَبي}%

\case{3}{بِيَ}{بِيَ}%

\case{4}{فكانَ}{فـكانَ}%

\case{5}{باخِلٌ}{باخِـلٌ}%

\case{6}{له}{له}%

%\case{e}{ل\/ه}{لـه}%

\end{document}

TEX engine: LuaHBTeX, Version 1.17.0 (TeX Live 2023)

babel version: 2023/08/09 v3.92.22182 The Babel package (from github)

Test case 1 is the sequence LAM + ALEF, for which (almost) every Arabic font provides a ligature. The MWE overrides a rule from the kashida.plain transform that excludes kashidas between LAM and ALEF.

[Image: kashida-ligature AT]

Above is the result with the font ArabicTypesetting (see the link in the comment in the MWE), which provides many ligatures. There is an error in each of the cases 1 to 6.

[Image: kashida-ligature FS]

Above is the result of the same MWE with the font FreeSerif, which provides only a few standard ligatures such as LAM+ALEF and therefore produces fewer errors (in fact, only in case 1).

As the comparison shows, the misplaced kashidas result from the ligatures.
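One way to check this inference on a single font is to load the font with its ligature features switched off and compare the output. The line below is a sketch meant to replace the \babelfont line of the MWE. It assumes that the font's ligatures sit in the OpenType features rlig and liga, which fontspec's Ligatures=RequiredOff and Ligatures=CommonOff disable. A font may place Arabic ligatures in other features, so this is not guaranteed to remove all of them.

```latex
% Diagnostic only: switching off required ligatures (e.g. LAM+ALEF)
% gives typographically wrong Arabic; use it just to compare kashida positions.
% Assumption: the ligatures of this font live in `rlig` and `liga`.
\babelfont{rm}[Scale=3, Ligatures={RequiredOff, CommonOff}]{ArabicTypesetting}
```

If the misplaced kashidas disappear with this setting, that would support the view that they are caused by the ligatures and not by the justification rules themselves.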

With an option for turning kashida insertion after ligatures on or off, we would gain:

  1. More sensible and transparent justification rules: IMO it is an odd workaround to need a rule that forbids kashidas between, e.g., LAM and ALEF just to turn off kashidas after the LAM+ALEF ligature.

  2. We would not have to clutter the set of justification/hyphenation rules with font-specific rules that account for the ligatures actually present in the font.

  3. It would conform to LuaTeX's idea about hyphenation: "whether or not hyphenation takes place should not depend on the current font, it is a language property" (LuaTeX Reference Manual, sec. 5.5, p. 76).

  4. If it can be turned on and off, nothing is lost for users who want a fine-grained rule set and need it when they already type ligatures into the TeX input file.
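To make the request concrete, here is a sketch of what such a switch could look like from the user's side. It is modelled on the Babel.arabic.justify_enabled flag that the MWE already toggles with \directlua. The field name kashida_after_ligature is purely hypothetical and is not taken from babel's documentation; it only illustrates the shape of the proposed interface.

```latex
% HYPOTHETICAL switch: `kashida_after_ligature` is an illustrative name for
% the requested option, not a documented babel setting.
% As written, this only sets a field in the Lua table Babel.arabic; it has an
% effect only if babel's justification code consults that field.
\directlua{Babel.arabic.kashida_after_ligature = false}
```

With a switch of this kind, the rule set could stay font-independent, and users who type ligatures directly into the input could set it back to true.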