latex3 / luaotfload

OpenType font loader for LuaTeX

c2sc OpenType feature typesets improper codepoints #273

Open YellowJacketLinux opened 8 months ago

YellowJacketLinux commented 8 months ago

I first reported this bug to fontspec https://github.com/latex3/fontspec/issues/497 but was told it's an engine issue.

I hope this is the right place.

I personally don't consider this high priority because things at least visually work.

The c2sc OpenType feature is supposed to render an upper-case letter with the small-caps glyph (the small-caps variant of the corresponding lower-case letter), but the underlying Unicode code point should remain that of the upper-case letter, so that copy and paste still produces an upper-case letter regardless of the font features in the document being pasted into.

See the MWE and copy/paste the strings into a text editor.

In the MWE overline example, I didn't use Greek for c2sc because TeX Gyre Termes doesn't have small-caps for Greek.

But the overline example shows how much better it is typographically to use c2sc with a nomen sacrum (especially if the small caps are actually a little taller than the x-height, although that's not shown).

If using a font with small caps for Greek, one could even use the U+0305 combining character to make the overline, so that even the overline itself is copied and pasted. (Note that TeX Gyre also doesn't have U+0305, but some Greek/Coptic fonts do, as both scripts historically use it frequently.) But since LuaLaTeX is using lower-case code points with c2sc, what gets pasted would be lower-case letters and not the upper-case letters that nomina sacra traditionally use.

Even though things visually work, it's possible that the engine using lower-case code points also creates an issue for screen readers. In this use case (abbreviations) a text alt-tag should probably be used anyway, so perhaps it's not an accessibility issue here, but in some other use cases it might be.

The MWE:

\RequirePackage{fontspec}
\documentclass[letterpaper,fontsize=14pt]{scrarticle}

\setmainfont
  [ Ligatures   = TeX ,
    Extension   = .otf ,
    UprightFont = *-regular ,
    BoldFont = *-bold ,
    ItalicFont = *-italic ,
    BoldItalicFont = *-bolditalic ]
  {texgyretermes}
\setsansfont
  [ Ligatures   = TeX ,
    Extension   = .otf ,
    UprightFont = *-regular ,
    BoldFont = *-bold ,
    ItalicFont = *-italic ,
    BoldItalicFont = *-bolditalic ]
  {texgyreheros}
\setmonofont
  [ Ligatures   = NoCommon ,
    Extension   = .otf ,
    UprightFont = *-regular ,
    BoldFont = *-bold ,
    ItalicFont = *-italic ,
    BoldItalicFont = *-bolditalic ]
  {texgyrecursor}

\makeatletter
\newcommand*{\textoverline}[1]{$\overline{\hbox{#1}}\m@th$}
\makeatother
% \symbol{"0305}

\usepackage[colorlinks=true]{hyperref}

\begin{document}
\section{Stuff}
Herod the Great died at around
4~{\fontspec[Letters=UppercaseSmallCaps]{texgyretermes-regular.otf}B.C.E.}\
but Quirinius did not become governor of Syria until
6~{\fontspec[Letters=UppercaseSmallCaps]{texgyretermes-regular.otf}C.E.}\
which means the tax of Quirinius did not happen until at least ten years after Herod the Great died.

This paragraph shows how the bug impacts my intended purpose of using the c2sc feature, which has to
do with Byzantine-era Greek and \textit{nomina sacra}---the practice of abbreviating holy names. Compare
\textoverline{ΔΑΔ} with
\textoverline{\fontspec[Letters=UppercaseSmallCaps]{texgyretermes-regular.otf}DAD}
and notice the text overline on the first is much closer to the text in the line above it, creating a
visual typography issue hence the need for c2sc.

\end{document}
zauguin commented 8 months ago

This is basically what I commented on at https://tex.stackexchange.com/questions/707772/xelatex-fontspec-stylisticset-changes-underlying-unicode-characters-in-the-t#comment1759919_707772 recently. This is rather hard to avoid unless we very fundamentally change how we output mappings to Unicode like we do in harf mode. We might have to consider doing that though, then we might want to move it out of the mode specific part and make parts of it generic. This will probably require rather heavy patching of the ConTeXt fontloader. @u-fischer I'm guessing these things will become rather important from a tagpdf point of view?

@YellowJacketLinux For now you can avoid the issue by using HarfBuzz mode (by adding Renderer=HarfBuzz in fontspec).
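
A minimal sketch of that workaround, assuming the same TeX Gyre Termes file used in the MWE above is installed and the document is compiled with lualatex. The only change from the MWE's inline font call is the added Renderer=HarfBuzz key; the surrounding document is reduced for illustration and is not the original test file.

```latex
% Compile with lualatex; Renderer=HarfBuzz selects luaotfload's harf mode.
\documentclass{article}
\usepackage{fontspec}

\setmainfont
  [ Ligatures      = TeX ,
    Extension      = .otf ,
    UprightFont    = *-regular ,
    BoldFont       = *-bold ,
    ItalicFont     = *-italic ,
    BoldItalicFont = *-bolditalic ,
    Renderer       = HarfBuzz ]
  {texgyretermes}

\begin{document}
% Letters=UppercaseSmallCaps enables the c2sc feature, as in the MWE.
% Renderer=HarfBuzz is repeated here because \fontspec loads the font anew.
Herod the Great died at around
4~{\fontspec[Letters=UppercaseSmallCaps,Renderer=HarfBuzz]{texgyretermes-regular.otf}B.C.E.}

% Copying "B.C.E." from the resulting PDF should give upper-case letters.
\end{document}
```

To check the result, copy the small-caps string from the PDF into a text editor, as described in the original report.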

YellowJacketLinux commented 8 months ago

I can confirm the issue does not exist with Renderer=HarfBuzz.

Thank you.