latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

Too much space before a comma in maths if [italian] is used #287

Closed AlMa1r closed 3 months ago

AlMa1r commented 3 months ago

Feeding lualatex with

\documentclass[italian,ngerman]{article}
\usepackage[italian,main=ngerman]{babel}% let's insist that German is the main language
\usepackage{unicode-math}
\setmathfont{TeX Gyre Termes Math}
%\NoIntelligentComma% throws an error
\begin{document}\noindent
Sei \((T,{\le})\) eine halbgeordnete Menge.%% German for `Let \((T,{\le})\) be a poset.`. 
\end{document}

yields, after running diffpdf,

[Image: diffpdf comparison of the two outputs, with differences highlighted in pink]

To the left we see the output with babel 24.2 and italian 2024/01/03 v.1.5.00, compiled with the current TeX Live 2023.

To the right we see the output with babel 3.95 and italian 2022/03/27 v.1.4.07, compiled with a stock Debian TeX Live 2023 distribution. Alternatively, use TeX Live 2020 with babel 2021/03/03 3.55 and italian 2020/05/21 v.1.4.04.

The pink highlighting shows where diffpdf thinks the output (visually, in our case) differs. (While reproducing and switching between the TeX Live 2023 versions, remove *.aux and ~/.texlive* in between to avoid using cached data.) The left (newer) output has too much space between the letter 𝑇 and the comma. If we drop the nonmain language italian from the options, the issue disappears. Clearly, simply adding a nonmain language shouldn't influence the output (especially the math output) and, in particular, shouldn't make it worse.

The math-mode part should yield (𝑇, ≤), which stands for an ordered pair of a set and a binary relation. We expect a negative kern or no gap between the 𝑇 and the comma (and a small/thin space between the comma and ≤). As for the German standards, DIN 1302-1999 uses the tall comma (the old one on the right-hand side of the image) for the pair (𝑃, 𝑄), DIN 5473-1992 uses (as far as I can tell from the print) no extra space before the comma in pairs such as (𝑎, 𝑏) and (𝑥, 𝑥), and DIN 1338-1996 uses (as far as I can tell from the print) no extra space before the comma in pairs such as (𝑥, 𝑦, 𝑧) either. Whoever has access to newer versions of the DINs is welcome to comment. While I don't care about the kind of the comma used, I do care about not introducing space that shouldn't be there.

We kindly ask for a bugfix. The maintainer mentioned in italian.ldf has been informed.

P.S. I can't reproduce the good, tight spacing between the letter 𝑇 and the comma with stock Debian TeX Live 2023 on the machine on which I have a choice between the current TeX Live 2023 and the stock Debian TeX Live 2023. I don't know why (probably, lualatex employs more caches than I am aware of). I needed a separate machine with the old Debian TeX Live 2023 to get the good, tight spacing for the input as stated or switch to TeX Live 2020. On any machine, I also cannot reproduce the good, tight spacing between the letter 𝑇 and the comma when the options italian are removed.

jbezos commented 3 months ago

I’m running some tests with and without babel, and the result is always the same (the left image). If the pink rectangle is the bounding box, then the spacing is the same even in your images, except that the comma comes from different fonts, so we can expect the result to differ. Note there is a tiny space before the comma in the left image inside the rectangle. I’d say there was a bug in italian that was recently fixed, because the expected result is the left one.

AlMa1r commented 3 months ago

@jbezos Thank you for testing. In my understanding, the pink rectangle is probably not the bounding box (at least I have not added the pink highlighting via LaTeX commands). The pink highlighting is added by diffpdf to show where the output (visually, in our case) differs. Concerning the fonts: any idea which two fonts the two commas might come from? As for the expected result … The form of the letter 𝑇 suggests that whatever glyph is low and goes to the right of 𝑇 usually needs less gap than if the following glyph has the height of, say, 𝑥 or even ℎ. (Of course, this is just the usual case based on the form of the glyphs, whereas the semantics of what is being typeset may require adding or removing horizontal space.) To reproduce the tight spacing for the input stated, I needed an extra machine with a stock Debian TeX Live 2023 having babel 3.84 and italian 2022/03/07 v.1.4.07. Alternatively, use an older TeX Live (I tested 2020).

AlMa1r commented 3 months ago

After more testing attempts with \showoutput, I found out that the output produced with the stock Debian TeX Live 2023 (and hence slightly older packages) is wrong on the comma (it is wrongly taken from cmm) but, in the eyes of the user (and not necessarily in the eyes of lualatex), is correct on producing no kern between 𝑇 and the comma.

With the current up-to-date TeX Live 2023, it's the other way round: the comma is correctly taken from TeX Gyre Termes Math, but there is a useless \kern1.05 (italic) between 𝑇 and the comma. This seems to be independent of babel or babel-italian. If you also see it this way, please feel free to close the bug report. (After all, reproducibility on my side is not reliable, so I might be very wrong.)

FrankMittelbach commented 3 months ago

It comes down to a font design decision (or a font bug) as far as I can tell.

There is no "useless kern". What you see is the italic correction of the character T, which gets inserted due to rule 17 of math formula processing (TeXbook, p. 445): if the current math atom is a simple symbol (here, T) and \fontdimen2 of the font is 0pt (which it is), then an italic correction is added. That is the kern you see, hence labeled (italic) by LuaTeX. If you use $TT,$ you see the same kern after the first T as well.
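A minimal sketch for inspecting this yourself, assuming LuaLaTeX with unicode-math and TeX Gyre Termes Math installed as in the original report. \showoutput writes the box structure of each shipped-out page to the log, where the kern after each T should carry the (italic) label; the exact kern value may depend on the font version.

```latex
% Compile with lualatex, then search the .log file for "(italic)".
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{TeX Gyre Termes Math}
\showoutput % LaTeX kernel command: log the full box structure of every page
\begin{document}
\noindent $TT,$
\end{document}
```

Removing the two font-related preamble lines gives the corresponding trace for the default math fonts.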

Now if you drop all the font settings (and unicode-math) and use the default fonts (Computer Modern math), the situation changes slightly. You then see

....\OML/cmm/m/it/10 T
....\kern1.3889 (italic)
....\OML/cmm/m/it/10 T
....\kern1.3889 (italic)
....\kern-0.55556 (font)
....\OML/cmm/m/it/10 ;

i.e., there is an extra negative kern labeled (font) that undoes part of the italic correction (about 40% of it). That appears to be a kerning specification in cmmi10.tfm, and in fact it is. We do find there:

(CHARACTER C T
   (CHARWD R 0.584376)
   (CHARHT R 0.683332)
   (CHARIC R 0.13889)
   (COMMENT
      (KRN O 75 R -0.027779)
      (KRN O 73 R -0.055555)
      (KRN O 72 R -0.055555)
      (KRN O 177 R 0.083336)
      )
   )

Octal 73 is our ",", 72 would be ".", and 75 a "/".

Such a kerning correction is not present in TeX Gyre Termes, hence the wider spacing. It is also not present in Latin Modern Math, and one could argue that this is a font bug, as Latin Modern attempts to provide an appearance similar to that of Computer Modern.
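The numbers in the trace can be recovered from the property list. This is a sketch that assumes a design size of 10pt (as the name cmmi10 and the "10" in the trace suggest), since R values in a property list are multiples of the design size. The block needs amsmath for align* and \text.

```latex
\begin{align*}
\text{italic correction of } T &= 0.13889 \times 10\,\mathrm{pt} = 1.3889\,\mathrm{pt},\\
\text{kern between } T \text{ and the comma} &= -0.055555 \times 10\,\mathrm{pt} \approx -0.5556\,\mathrm{pt},\\
\text{net space after } T &\approx 1.3889\,\mathrm{pt} - 0.5556\,\mathrm{pt} = 0.8333\,\mathrm{pt}.
\end{align*}
```

With a font that has no such kern pair, the full italic correction remains, which matches the wider gap reported for TeX Gyre Termes Math.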

AlMa1r commented 3 months ago

… \fontdimen2 of the font is 0pt (which it is)

It is not:

\documentclass[italian,ngerman]{article}
\usepackage[italian,main=ngerman]{babel}
\usepackage{unicode-math}
\setmathfont{TeX Gyre Termes Math}
\begin{document}
\((T
\the\fontdimen2\font
,{\le})\)
\end{document}

yields “(𝑇 3.33𝑝𝑡, ≤)”, and 3.33pt ≠ 0.

FrankMittelbach commented 3 months ago

You can't query the current font this way in a math formula. What you test there is really the \fontdimen2 of lmr. At the point where \the\fontdimen2 is executed, the math formula has not yet been converted from atoms to font glyphs, so it is the outer font that you see. Try \showthe\font at this point. Alternatively, put

\fontencoding{OML}\fontfamily{cmm}\selectfont

before the whole formula and see what is happening.
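A minimal sketch of that second suggestion, assuming the default Computer Modern fonts rather than unicode-math, so that the OML/cmm font from the trace above is the one being queried. \typeout writes to the terminal and the log, which avoids typesetting the value inside a formula. The message labels are illustrative, and the explicit \fontseries and \fontshape are added here only to avoid a font substitution warning.

```latex
\documentclass{article}
\begin{document}
% Outside math, \font refers to the current text font.
\typeout{text font: \the\fontdimen2\font}
% Select the math italic font as if it were a text font, then query it
% in the same way. The group keeps the font change local.
{\fontencoding{OML}\fontfamily{cmm}\fontseries{m}\fontshape{it}\selectfont
 \typeout{OML/cmm font: \the\fontdimen2\font}}
\noindent \(T,\)
\end{document}
```

Per the explanation above, the first message should report the nonzero interword space of the text font and the second should report 0pt.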

AlMa1r commented 3 months ago

@FrankMittelbach Thx; I see!

As for adapting to the italic correction, if we repair the font locally with fontforge (say, for \mathitalicsmode=2), what would good pairwise kerns for TeX Gyre Termes Math be? -28 for “𝑇/”, -56 for “𝑇,”, -56 for “𝑇.”, and 83 for “𝑇⁀”? I have no intuition on how to choose the values exactly.
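These candidate values correspond to the Computer Modern kerns from the property list above, rescaled to font design units. The sketch assumes 1000 units per em for the OpenType font, which is not stated anywhere in this thread, and needs amsmath.

```latex
\begin{align*}
-0.027779 \times 1000 &\approx -28 && (T,\ \text{slash, octal 75})\\
-0.055555 \times 1000 &\approx -56 && (T,\ \text{comma, octal 73; period, octal 72})\\
0.083336 \times 1000 &\approx 83 && (T,\ \text{character at octal 177})
\end{align*}
```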

jbezos commented 3 months ago

I’m closing this issue because it’s not directly related to babel. By the way, \the\fontdimen2\font returns the value of the current text font.

FrankMittelbach commented 3 months ago

> @FrankMittelbach Thx; I see!
>
> As for adapting to the italic correction, if we repair the font locally with fontforge (say, for \mathitalicsmode=2), what would good pairwise kerns for TeX Gyre Termes Math be? -28 for “𝑇/”, -56 for “𝑇,”, -56 for “𝑇.”, and 83 for “𝑇⁀”? I have no intuition on how to choose the values exactly.

This is really a font design question and also a matter of taste. In my opinion, this would be best addressed by the TeX Gyre designers rather than adjusted individually. So I suggest you bring it to their attention and see if they are willing to make adjustments.

Please also note that, given the license of the fonts, you should probably not alter them without renaming them. Even if you do it just for yourself, such changes tend to spread over time, even if this wasn't the original intention, and it is just not good if different installations produce different results just because some font metrics have been altered.