Open mbertucci47 opened 4 months ago
Not sure what we can do here. pdftex doesn't insert space chars at font switches:
\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon
text text {\test cmss cmss} text text
\bye
Not sure if this is a bug or a restriction, so I sent a message to the pdftex mailing list. One can insert the space chars manually with \pdffakespace, but it wouldn't be easy to detect in font commands whether a space is wanted or not (and even more difficult if font switches are used).
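For illustration, here is a minimal sketch of the manual workaround, based on the test file above. Where the \pdffakespace calls are placed is an assumption about where the space chars go missing (on both sides of the font switch); the primitive is only meant to add a space char to the PDF, while the visible spacing still comes from the ordinary interword glue.

```tex
\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon
% \pdffakespace is a pdfTeX primitive. TeX swallows the space typed after
% a control word, so the ordinary interword glue comes from the space
% typed in front of each \pdffakespace.
text text \pdffakespace{\test cmss cmss} \pdffakespace text text
\bye
```

This shows why an automatic fix is hard: the author knows a word space is intended at these two places, but a font command has to guess.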
Can't test right now, but what happens to 'text \mbox{}' ? Is that space also lost?
A useful question, but what should happen to an empty mbox in the PDF: should it be treated exactly as a word?
Extra information request:
What happens if there is a font change without a group?
text \test cmss cmss
text \mbox{}. % space char between text and .
text \mbox{text} % space char between text and text
text \mbox{\itshape text} % no space char between text and text
text \mbox{} \par % no space char after text
text text \test cmss cmss % no space char between text and cmss
So there is definitely a problem with losing spaces when there is a font change.
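The cases listed above could be collected in one test file along the following lines. This is only a sketch: it assumes the tests can be run with pdflatex and \pdfinterwordspaceon alone, without the full tagging setup, and it does not reproduce the file that was actually used. The results reported above were not re-run with it.

```latex
\documentclass{article}
\pdfcompresslevel=0      % uncompressed streams, so the PDF can be inspected
\pdfobjcompresslevel=0
\pdfinterwordspaceon     % let pdfTeX insert real space chars
\font\test=cmss10        % same test font as in the plain TeX example
\begin{document}
text \mbox{}.

text \mbox{text}

text \mbox{\itshape text}

text \mbox{} \par

text text \test cmss cmss
\end{document}
```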
Maybe try putting in other "non-text stuff" to see whether the problem is more general than just font changes.
Example:
{ cmss \count 234=32 text }
Also, is it only pdftex, or do the spaces also get lost using luatex?
The example with \par is really strange compared with the other \mbox examples!
Not really. Obviously the \mbox is irrelevant; only the text and the font changes matter, and so text \mbox{}\par is no different from text \par.
Also, is it only pdftex, or do the spaces also get lost using luatex?
luatex is not affected; there the space chars are inserted not by the engine but by our Lua code.
Aha! I had not known that "we" do not use luatex to make these magic space chars.
@u-fischer What does the tagging code do that makes the spaces disappear with it but appear without it? In copy/paste, I mean
@mbertucci47 Well, this is a question for the maintainer of the reader. But imho, without tagging the reader will use a heuristic to insert spaces between words: if the distance is large enough, it will guess that this is a word space. This normally works quite well, but can fail if the word spaces are small. With tagging, the real space chars are what counts: they decide whether there is a word space or not.
@mbertucci47 For tagged PDF, interword spaces must appear in the stream as actual space characters (U+0020). Classically, TeX does not do this: it just places the words by coordinate and leaves the spaces implicit.
pdftex has a built-in mechanism to "overprint" the word spaces with a space character while preserving the implicit spacing. But as this is primitive pdftex behaviour, when (as you show) it misses some word spaces there is not a lot latex can do about it (other than report the problem upstream).
I see, thanks for the info
well, while it can be argued to be a bug in pdftex (and perhaps that needs followup) we can probably get it fixed with something like
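\def \DeclareTextFontCommand #1#2{%
  \DeclareRobustCommand#1[1]{%
    \ifmmode
      \nfss@text{#2##1}%
    \else
      \hmode@bgroup
      % new: compare the interword space of the current font with the
      % glue just before the command and add a fake space char
      % (as written, the test also succeeds when \lastskip is 0pt)
      \unless\ifdim \the\fontdimen2\font < \lastskip
        \pdffakespace
      \fi
      \text@command{##1}%
      #2\check@icl ##1\check@icr
      \expandafter
      \egroup
    \fi
  }%
}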
and some check in \maybe@ic to see if the following char is a space and in that case also add a fake space.
Doing something similar for straight font changes using switches, e.g., ...\itshape .... \rmshape..., could be possible too but is probably more fragile.
@FrankMittelbach \xspace strikes again :-) We could catch some cases that way, but @u-fischer's tests such as
text \mbox{\itshape text} % no space char between text and text
show the difficulty of picking this up at the macro layer: I don't see how \itshape can look back and fix the space outside the current box.
Well, \mbox can perhaps do that in that case. As far as I can see, having unnecessary \pdffakespace calls around (several in a row) doesn't matter (or does it?), and if not, \mbox could make the same test and insert such a faked space in front of itself if it is preceded by a space.
However, I think it is really something to ask Thanh if it can't be fixed in pdfTeX proper.
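As a rough sketch of the \mbox idea (not the actual proposal): the definition below assumes the kernel form \leavevmode\hbox{#1} and ignores any patches the tagging code may apply to \mbox. The \lastskip test is a simplified stand-in for "is preceded by a space", not the same test as in the \DeclareTextFontCommand code above.

```latex
\makeatletter
% Sketch only: the \ifdim test is the new part, the rest mirrors the
% kernel definition \leavevmode\hbox{#1}.
\DeclareRobustCommand\mbox[1]{%
  \leavevmode
  \ifdefined\pdffakespace      % the primitive exists in pdfTeX only
    \ifdim\lastskip>\z@        % glue (e.g. a word space) precedes the box
      \pdffakespace
    \fi
  \fi
  \hbox{#1}}
\makeatother
```

With this, text \mbox{\itshape text} would get a fake space char in front of the box, whereas text\mbox{text} would not, since no glue precedes the box there.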
Did anyone else see a whole slew of irrelevant ideas from me?? Not sure where they came from, or how they got posted here!
Removed now, I hope permanently.
you mean that "Hello!recall some of the many deficiencies..."? Yes that showed up in my inbox. Or anything else?
Yes, Frank, that one!
With pdflatex, spaces are lost around \text<xx> commands at the pdf level, by which I mean in the tags and when copying/pasting. Here's an example.
When you copy and paste the text from the pdf you get
With lualatex this does not occur, nor does it occur if tagging code is not loaded.