schlcht / microtype

The microtype package
https://ctan.org/pkg/microtype
LaTeX Project Public License v1.3c

Microtype tracking inserts spaces in small-caps text representation #41

Open Marcool04 opened 3 days ago

Marcool04 commented 3 days ago

Description

It appears that when tracking is switched on, the text representation of small capitals gets extra spaces inserted inside words. This breaks search and copy-paste.

Tested with pdfLaTeX and LuaLaTeX (XeLaTeX doesn't provide the tracking feature). Tested in several viewers too (the screenshots below are from pdf.js in Firefox, as its "highlight all" search feature shows the problem nicely).

I'm on Arch Linux using texlive-bin version 2024.2-3.

At first I thought this was a conflict with biblatex/biber as that is where I had small caps, and it was not occurring for me outside of the bibliography. But after I posted a question on tex.stackexchange.com, another user commented that they observe the breakage even outside of a bibliography. I then tested it in Overleaf, and I've arrived at the following conclusions:

              Paragraph        Biblio
pdfLaTeX          ❌             ❌
LuaLaTeX          ✔️             ❌

Minimal example demonstrating the issue

\documentclass{scrbook}

\usepackage[english]{babel}
\usepackage[%
  babel=true,%
  tracking=true
]{microtype}
\usepackage{csquotes}
\usepackage[%
  backend=biber,
]{biblatex}
\begin{filecontents}[overwrite]{biblio.bib}
@book{saussureCourseGeneralLinguistics1959,
  title = {Course in General Linguistics},
  author = {family=Saussure, given=Ferdinand, prefix=de, useprefix=false},
  date = {1959},
  publisher = {Philosophical Library},
  location = {New York},
  isbn = {978-0-231-15726-1},
  langid = {english},
}
\end{filecontents}
\addbibresource{biblio.bib}

\renewcommand*{\mkbibnamefamily}[1]{\textsc{#1}}

\begin{document}
\nocite{saussureCourseGeneralLinguistics1959}

An ordinary small-caps word with tracking on: \textsc{Saussure}

\vspace{1em}

The bibliography with tracking on:

\printbibliography[heading=none]{}

\vspace{1em}

\microtypesetup{tracking=false}

An ordinary small-caps word with tracking off: \textsc{Saussure}

The bibliography with tracking off:
\printbibliography[heading=none]{}

\end{document}

Using LuaLaTeX: image

Using pdfLaTeX: image

Marcool04 commented 2 days ago

OK, so a bit of a follow-up after I learned a lot from the discussion on Stack Exchange about how the internal PDF representation of text works: it would appear that PDF renderers (all of them, presumably) use some kind of heuristic to decide what is and what is not a space. I must say I'm flabbergasted by this: I thought that the "text layer" (I know that's not exactly a correct metaphor, but anyhow) was exactly what had been typed, and that the "visible" layer, with things like font decorations, ligatures, etc., was somewhat independent of it. That, apparently, is not the case.

So this leaves us, as far as this issue goes, with the question: what should microtype do? I feel it's at the very least something that should be mentioned in the docs under the tracking heading (if you like, I could try to make a PR for that). In the longer term, I think it might be a good idea to run a few tests with different PDF viewers to determine the cutoff at which two letters separated by tracking get rendered as having a space between them, and then to set microtype's default tracking amount somewhere below that, to be safe. I'm not sure how that would look typographically (I get that the whole point of tracking is that the tracked letters, small caps in particular, are supposed to look better that way). I guess the tradeoff is then between visual appearance and "copy-pasteability" or "searchability"...
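A viewer test like the one suggested above could be sketched as follows. This is only an illustration: the amounts are arbitrary picks, not values taken from microtype's documentation, and the cutoff will likely differ between viewers, fonts and sizes. It uses microtype's `\textls[<amount>]{...}`, where the amount is given in thousandths of an em.

```latex
\documentclass{article}
\usepackage{microtype}

\begin{document}
% Same small-caps word letterspaced by increasing amounts
% (in 1/1000 em). The chosen values are arbitrary test points.
\textsc{\textls[20]{Saussure}} (20)\par
\textsc{\textls[40]{Saussure}} (40)\par
\textsc{\textls[60]{Saussure}} (60)\par
\textsc{\textls[80]{Saussure}} (80)\par
\textsc{\textls[100]{Saussure}} (100)\par
\textsc{\textls[150]{Saussure}} (150)\par
\end{document}
```

Searching for "Saussure" in each viewer, or copying the lines into a text editor, should then show from which amount onwards that viewer starts to insert spaces.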

Would love to hear your thoughts @schlcht

schlcht commented 1 day ago

Yes, I saw your question on tex.sx (I'm the "Robert" who commented there). PDF is primarily designed to provide a visual representation, a "page description", whereas the structure of the text is somewhat more of an afterthought (which, however, has recently gained some attention with the possibilities of tagging). Therefore, spaces in PDF are only relevant in terms of the shift seen on the "paper", not in their semantic qualities (whether it's just kerning, or a single interword space, two spaces, a tab, whatever...). And that's why all viewers have to apply these heuristics, which inevitably may fail, as they do in your case. (BTW, I've tried with a couple of viewers here (on a Mac), and the only one that gets tricked by these inner spaces is Firefox's PDF viewer -- but it's nevertheless easy to reproduce, I just had to increase the letterspacing amount.)

As I mentioned on tex.sx, you could use the accsupp package, but that, too, would be viewer-dependent, as many viewers simply ignore alternate text. So I'm afraid there's really no satisfactory solution, other than reducing the letterspacing amount and hoping for the best...
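For reference, a minimal sketch of the accsupp approach, assuming plain ASCII text and a viewer that honours `ActualText`. The helper `\searchablesc` is a hypothetical name made up for this example; it is not provided by microtype or accsupp.

```latex
\documentclass{article}
\usepackage[tracking=true]{microtype}
\usepackage{accsupp}

% \searchablesc is a hypothetical helper (not part of any package).
% It typesets its argument in tracked small caps and attaches the
% unspaced string as ActualText, which some viewers use for search
% and copy-paste. method=escape is meant for plain ASCII input;
% accented names would need a different method.
\newcommand*{\searchablesc}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}%
  \textsc{#1}%
  \EndAccSupp{}%
}

\begin{document}
An ordinary small-caps word with tracking on: \searchablesc{Saussure}
\end{document}
```

Whether this helps depends entirely on the viewer, so it is worth checking the result in the same viewers that show the problem.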

With regard to microtype, I think I will indeed change the default letterspacing. The number 100 has always been just an arbitrary number that I've never been happy with. Something like 40 or 50 max would make much more sense. So thanks for finally giving me a good reason to change this default. ;-)
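Until the default changes, the amount can be lowered per document with `\SetTracking` in the preamble. A minimal sketch, assuming only small caps should be tracked and taking 40 from the range mentioned above (the amount is in thousandths of an em):

```latex
\documentclass{article}
\usepackage[tracking=true]{microtype}

% Track small caps in all encodings by 40/1000 em instead of the
% default 100. The value 40 is one choice from the suggested range,
% not a tested threshold for any particular viewer.
\SetTracking{encoding=*, shape=sc}{40}

\begin{document}
An ordinary small-caps word with reduced tracking: \textsc{Saussure}
\end{document}
```

A smaller amount makes it less likely that a viewer's heuristics mistake the gaps for word spaces, but it does not guarantee it.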

PS. While experimenting with your MWE, I found a bug in pdftex, where even the engine itself is tricked into thinking that the interletter spaces from letterspacing are "real" ones.

Marcool04 commented 1 day ago

Oh hi @schlcht / Robert :) Sorry for duplicating the discussion then. Fascinating stuff. Good catch with pdftex! Slowly, slowly, maybe some day PDF will gain a "true" UTF-8 text representation alongside the visual "layer"? Who knows. Anyhow, thanks for your input and quick response, and feel free to close this or leave it open until you decide on a new default, whichever feels best to you. I'm grateful for your time.

Best,

Mark.