Closed dbitouze closed 3 years ago
If you copy & paste from an untagged PDF, the reading order is decided by the PDF viewer with some heuristics. In this case your viewers probably take the alignment as an indication of two-column mode and so re-sort the text.
As I don't want to remove the nice alignment, you will have to wait for the progress of the tagged PDF project to get a better result here ;-). But even with a fully tagged PDF I wouldn't fully trust copy & paste. IMHO PDF viewers don't care much about code and so sometimes drop spaces or remove newlines.
Do you mean you expect the readers of (LaTeX) documentation to type by hand all the source code they want to test (with, among other things, the risk of misspellings), in the above example 350 characters?! This is not very engaging ;)
I expect readers of LaTeX documentation to know that there is a source ... ;-)
But besides this: IMHO it is more reliable to embed/attach code as a file, so in a tagged PDF I would try to add it as an associated file.
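A minimal sketch of the "attach the code as a file" idea, under stated assumptions: it uses the embedfile package to add a plain file attachment, which is not the same thing as an associated file in a tagged PDF, and the file name example-code.tex is purely hypothetical. The attached file can be saved from the viewer's attachment panel instead of being copied from the page.

```latex
% Sketch only: the embedfile package and the file name example-code.tex
% are assumptions for illustration, not something used in this thread.
% Write the code sample to an external file (needs LaTeX 2019-10-01 or newer
% for the optional argument of filecontents*).
\begin{filecontents*}[overwrite]{example-code.tex}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{filecontents*}
\documentclass{article}
\usepackage{listings}
\usepackage{embedfile}
\begin{document}
% Attach the file to the PDF so readers can extract it unchanged.
\embedfile[desc={Source of the code example}]{example-code.tex}
% Typeset the same file, so the printed code and the attachment cannot diverge.
\lstinputlisting[basicstyle=\ttfamily]{example-code.tex}
\end{document}
```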
BTW, the trouble can be avoided with the package listings (and the columns=flexible option):
\documentclass{article}
\usepackage{listings}
\lstset{basicstyle=\ttfamily,columns=flexible}
\begin{document}
\begin{verbatim}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{verbatim}
\begin{lstlisting}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{lstlisting}
\end{document}
Yes, but this destroys the alignment. That was what I meant above.
"(Tested on Linux with several PDF readers: Zathura, Okular and Evince.)"
This shows that all the readers you use apply strange heuristics. Here is what I get (pdfexpert is not perfect, but reasonable):
\pdfdict_new:n {l_my_action_dict} \pdfdict_put:nnn {l_my_action_dict}{Type}{/Action} \pdfdict_put:nnn {l_my_action_dict}{S}{/URI} \pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\pdfannot_dict_put:nnn {link/URI} { C } {[1~0~0]} %red border
\pdfannot_link:nxn { URI }
{ /A <<\pdfdict_use:n{l_my_action_dict}>> } { link text }
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\pdfannot_dict_put:nnn
{link/URI} { C } {[1~0~0]} %red border
\pdfannot_link:nxn { URI }
{
/A <<\pdfdict_use:n{l_my_action_dict}>>
}
{ link text }
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\pdfannot_dict_put:nnn
{link/URI} { C } {[1~0~0]} %red border
\pdfannot_link:nxn { URI }
{
/A <<\pdfdict_use:n{l_my_action_dict}>>
}{
link text }
"all the readers you use apply strange heuristics"
If I'm right, all of them are poppler based. I'll open a bug report there.
Considered as a priori non-fixable by a poppler developer ☹️
Well, as Albert Astals Cid wrote:
"getting text from PDF files is a guessing game".
This applies both locally (what characters to paste) and globally (what to include and in what order).
I have a large menagerie of strange examples of "interesting results from copy-and-paste".
In what format do you have them? They could be valuable as test cases.
Interesting, given that all other viewers get their heuristics right. But he is right, of course, in the sense that without appropriate internal tagging, guessing will always have edge cases where it fails.
On 6/23/21 10:21 AM, Frank Mittelbach wrote:
"If I'm right, all of them are poppler based. I'll open a bug report there. Considered as a priori non-fixable by a poppler developer <https://gitlab.freedesktop.org/poppler/poppler/-/issues/1093#note_968375> ☹️"
Interesting, given that all other viewers get their heuristics right. But he is right, of course, in the sense that without appropriate internal tagging, guessing will always have edge cases where it fails.
What's an appropriate tagging for this? I tried simply adding some \pdffakespace to the definition of @.***, with the example code from the pdftex documentation, and this still failed to copy-paste properly.
Bruno
@blefloch something like this should work, at least for the reading order (but I can't test whether the affected PDF viewers actually understand tagged PDF):
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{listings}
\lstset{basicstyle=\ttfamily,columns=flexible}
\usepackage{tagpdf}
\tagpdfsetup{activate-all,interwordspace=true,paratagging}
\begin{document}
\tagstructbegin{tag=Document}
\tagstructbegin{tag=Code}
\begin{verbatim}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{verbatim}
\tagstructend
\tagstructbegin{tag=Code}
\begin{lstlisting}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{lstlisting}
\tagstructend
\tagstructend
\end{document}
On Linux with Zathura, Okular and Evince, still:
@dbitouze then you could open a new bug report and ask whether they support tagged PDF (ideally compile with lualatex, and at least twice, just to be sure the structure is right and there ...). And you need a current LaTeX; paratagging works only with it.
I'll do it and let you know here.
"And you need a current LaTeX; paratagging works only with it."
Is it OK with:
$ lualatex test
This is LuaHBTeX, Version 1.13.2 (TeX Live 2021)
restricted system commands enabled.
(./test.tex
LaTeX2e <2021-06-01> patch level 1
L3 programming layer <2021-06-18>
and:
*File List*
pdfmanagement-testphase.sty 2021-06-14 v0.95e LaTeX PDF management testphase bundle
pdfmanagement-testphase.ltx 2021-06-14 v0.95e PDF management code (testphase)
l3bitset.sty 2021-05-27 L3 Experimental bitset support
expl3.sty 2021-06-18 L3 programming layer (loader)
l3backend-luatex.def 2021-05-07 L3 backend support: PDF output (LuaTeX)
l3backend-testphase-luatex.def 2021-06-14 LaTeX PDF management testphase bun
dle backend support:PDFoutput(LuaTeX)
l3ref-tmp.sty 2020-10-09 L3 Experimental cross-referencing
pdfmanagement-firstaid.sty 2021-06-14 v0.95e LaTeX PDF management testphase
bundle / firstaid-patches
article.cls 2021/02/12 v1.4n Standard LaTeX document class
size10.clo 2021/02/12 v1.4n Standard LaTeX file (size option)
listings.sty 2020/03/24 1.8d (Carsten Heinz)
keyval.sty 2014/10/28 v1.15 key=value parser (DPC)
lstmisc.sty 2020/03/24 1.8d (Carsten Heinz)
listings.cfg 2020/03/24 1.8d listings configuration
tagpdf.sty 2021-06-14 v0.82 A package to experiment with pdf tagging
etoolbox.sty 2020/10/05 v2.5k e-TeX tools for LaTeX (JAW)
tagpdf-luatex.def 2021-06-14 v0.82 tagpdf driver for luatex
tagpdf-checks-code.sty 2021-06-14 v0.82 part of tagpdf - code related to che
cks and messages
tagpdf-user.sty 2021-06-14 v0.82 tagpdf - user commands
tagpdf-tree-code.sty 2021-06-14 v0.82 part of tagpdf - code related to writi
ng trees and dictionaries to the pdf
tagpdf-roles-code.sty 2021-06-14 v0.82 part of tagpdf - code related to role
s and structure names
tagpdf-attr-code.sty 2021-06-14 v0.82 part of tagpdf - code related to attri
butes and attribute classes
tagpdf-mc-code-shared.sty 2021-06-14 v0.82 part of tagpdf - code related to
marking chunks - code shared by generic and luamode
tagpdf-mc-code-lua.sty 2021-06-14 v0.82 tagpdf - mc code only for the luamod
e
tagpdf-struct-code.sty 2021-06-14 v0.82 part of tagpdf - code related to sto
ring structure
tagpdf-space-code.sty 2021-06-14 v0.82 part of tagpdf - code related to real
space chars
ts1cmr.fd 2019/12/16 v2.5j Standard LaTeX font definitions
***********
Interestingly, the test file provided by Ulrike fails as I said, since it is pasted as:
\pdfdict_new:n
\pdfdict_put:nnn
\pdfdict_put:nnn
\pdfdict_put:nnn
{l_my_action_dict}
{l_my_action_dict}{Type}{/Action}
{l_my_action_dict}{S}{/URI}
{l_my_action_dict}{URI}{(https://www.latex-project.org)}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
but the following one:
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{listings}
\lstset{basicstyle=\ttfamily,columns=flexible}
\usepackage{tagpdf}
\tagpdfsetup{activate-all,interwordspace=true,paratagging}
\begin{document}
\section{First code} % <-- Here is the 1st difference
\tagstructbegin{tag=Document}
\tagstructbegin{tag=Code}
\begin{verbatim}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{verbatim}
\tagstructend
\section{Second code} % <-- Here is the 2nd difference
\tagstructbegin{tag=Code}
\begin{lstlisting}
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
\end{lstlisting}
\tagstructend
\tagstructend
\end{document}
is less wrong:
1 First code
\pdfdict_new:n
{l_my_action_dict}
\pdfdict_put:nnn
{l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn
{l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn
{l_my_action_dict}{URI}{(https://www.latex-project.org)}
2 Second code
\pdfdict_new:n {l_my_action_dict}
\pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
\pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org)}
"is less wrong:"
Well... only with Okular (still fails with Zathura and Evince).
"Is it OK with:"
Yes, that should be fine. You could also add the option "paratagging-show"; if you get lots of small red numbers, paratagging works. Or you could check whether there are tags at https://www.ngpdf.com/.
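For illustration, a reduced variant of the test file from this thread with "paratagging-show" added might look as follows. This is only a sketch, assuming the 2021 versions of tagpdf and pdfmanagement-testphase listed above and assuming the key can be given without a value like "paratagging"; key names may differ in later releases, and the paragraph text is a placeholder.

```latex
% Sketch only: assumes tagpdf v0.82 and pdfmanagement-testphase v0.95e
% as in the file list above; the paragraph text is a placeholder.
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf}
% Same setup as the test file in this thread, plus paratagging-show.
\tagpdfsetup{activate-all,interwordspace=true,paratagging,paratagging-show}
\begin{document}
\tagstructbegin{tag=Document}
A paragraph of ordinary text. If paratagging is active, small red
numbers should appear around it.
\tagstructend
\end{document}
```

As noted earlier in the thread, compiling with lualatex at least twice helps make sure the structure is actually written to the PDF.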
"is less wrong:"
Well, heuristics are heuristics. The readers are trying to guess whether it is a two-column document or not, and naturally everything on the page is taken into account.
I don't know whether pdfmanagement-testphase is the culprit or not ;) but, anyway: copying the code between "Or through a dictionary:" and "Or if you want to exclude the possibility [...]" on page 3 of the l3pdfannot module's documentation:
is pasted as:
(Tested on Linux with several PDF readers: Zathura, Okular and Evince.)