jgm / pandoc

Universal markup converter
https://pandoc.org

Soul underlining gobbles some characters if --pdf-engine=xelatex #9920

Open kopeckyf opened 4 days ago

kopeckyf commented 4 days ago

Using \ul from soul for underlining is problematic when the PDF output is generated with XeLaTeX (even with recent releases from 2024). The font that soul uses internally lacks some characters outside the base Latin character set, such as the Hungarian ő (Latin small o with double acute) or letters with stacked diacritics like ȭ (o with tilde and macron). Stacked diacritics appear in many languages around the world. Characters that are missing from soul's font are gobbled, that is, silently dropped from the output.

LaTeX's primitive \underline as well as \uline from ulem do not have these issues.

There are several solutions I could imagine:

kopeckyf commented 4 days ago

The issue extends to other commands supplied by soul: \st{ȭ} outputs nothing, whereas \sout{ȭ} from ulem produces the desired struck-through o with tilde and macron.

(I tested this on XeTeX, Version 3.141592653-2.6-0.999996 (TeX Live 2024) loading soul 2023-06-14 v3.1)
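
For anyone who wants to check this locally, here is a minimal test file sketching the comparison described above. The file name and the sample word "hősök" are illustrative assumptions, not part of the original report, and the expected behaviour (soul dropping the ő, the other commands keeping it) is what the report describes rather than something verified here.

```latex
% Minimal test file; compile with: xelatex soul-test.tex
% (file name and sample word are illustrative, not from the report)
\documentclass{article}
\usepackage{fontspec}          % default font is Latin Modern, which contains ő
\usepackage{soul}              % provides \ul and \st
\usepackage[normalem]{ulem}    % provides \uline and \sout; normalem keeps \emph italic
\begin{document}
soul underline: \ul{hősök}

soul strikeout: \st{hősök}

TeX underline: \underline{hősök}

ulem underline: \uline{hősök}

ulem strikeout: \sout{hősök}
\end{document}
```

If the report holds, only the two soul lines lose the ő.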

kopeckyf commented 4 days ago

ulem was originally replaced by soul (https://github.com/jgm/pandoc/commit/144bf90ab92b517dd721baf80f121f86187ccd61) to enable hyphenation and optimise line breaking in underlined or struck-through text (#8411). So it is questionable whether reinstating ulem is the best approach.

Maybe it should be fixed within soul instead? But there has already been an attempt at making it Unicode-friendly, which does not seem to have solved all issues...

kopeckyf commented 4 days ago

The soul documentation provides more details about the cause of the problem:

The soul-ori package uses the ectt1000 font while it analyzes the syllables. This font is used, because it has 256 mono-spaced characters without any kerning. It belongs to Jörg Knappen’s EC-fonts, which should be part of every modern TeX installation. If TeX reports “I can’t find file ‘ectt1000’” you don’t seem to have this font installed. It is recommended that you install at least the file ectt1000.tfm which has less than 1.4 kB. Alternatively, you can let the soul-ori package use the cmtt10 font that is part of any installation, or some other mono-spaced font

I'm not sure whether soul will ever switch to a default other than ectt1000 so that this works in XeLaTeX.

Instead, I wonder whether pandoc could intelligently change soul's font. A very quick approach could be

$if(strikeout)$
$-- also used for underline
\ifLuaTeX
  \usepackage{luacolor}
  \usepackage[soul]{lua-ul}
\else
  \usepackage{soul}
  \makeatletter % needed because the name \SOUL@tt contains @
  \let\SOUL@tt\normalfont % <---- proposed change
  \makeatother

One could also do something like this:

$if(mainfont)$
\ifXeTeX
\makeatletter % needed because the name \SOUL@tt contains @
\setfontfamily\SOUL@tt{$mainfont$}[$for(mainfontoptions)$$mainfontoptions$$sep$,$endfor$]
\makeatother
...

This would solve part of the issue. But a more intelligent setting would be needed when an underlined phrase cannot be printed in the document's $mainfont$ (suppose I'm writing an English text but want to underline a phrase in a non-Latin script).

jgm commented 3 days ago

Not sure. Have you tried contacting soul's maintainer about the issue for suggestions? The font changing idea seems promising but I am too ignorant about what is going on in soul to make the change confidently.

For the near future, I think a viable workaround is to use lualatex if you're writing Hungarian. (Does --pdf-engine=lualatex work?)
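
To try the LuaLaTeX route outside of pandoc, a small standalone file along these lines might help. It loads the same two packages that the \ifLuaTeX branch of the template excerpt quoted earlier uses. The file name and sample word are illustrative assumptions, and whether the output is actually correct is the open question here, so treat this as a test sketch rather than a confirmed fix.

```latex
% Minimal test file; compile with: lualatex lua-ul-test.tex
% (file name and sample word are illustrative assumptions)
\documentclass{article}
\usepackage{fontspec}        % default font is Latin Modern, which contains ő
\usepackage{luacolor}        % loaded alongside lua-ul in the template excerpt
\usepackage[soul]{lua-ul}    % the soul option provides soul-style \ul and \st
\begin{document}
lua-ul underline: \ul{hősök}

lua-ul strikeout: \st{hősök}
\end{document}
```

If both lines keep the ő, running pandoc with --pdf-engine=lualatex should be a usable workaround for the time being.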