Closed TCWORLD closed 3 years ago
First, a warning: as the documentation says, tagpdf is an experimental package. And I mean this. Don't expect interfaces or behaviour to be stable. Current development is happening in the splitting branch (which I will merge back into develop at some point). This branch requires the splitting branch of the pdfresources project in the latex github, along with the newest latex (and some code at the beginning of the document to activate the pdfresource management).
That said, I'm always interested in feedback about what works and what doesn't.
I would say that you found a problem with \pdfinterwordspaceon. If you try an example without tagpdf, e.g.
\RequirePackage{l3pdf}
\ExplSyntaxOn
\pdf_uncompress:
\ExplSyntaxOff
\documentclass{article}
\pdfglyphtounicode{space}{0020}
\begin{document}
\pdfinterwordspaceon
\noindent abc cde
\end{document}
and look in the PDF, you can see that a space ( ) is inserted before the main displacement 133.768 707.125 Td.
BT
/F21 9.9626 Tf/F30 1 Tf( )Tj/F21 9.9626 Tf 133.768 707.125 Td [(ab)-28(c)]TJ
....
I found no sensible way to avoid this. It happens even after a \noindent or \leavevmode\pdfinterwordspaceon. It only worked if some other character is printed first.
With lualatex (which uses a quite different method) there is no problem.
Thanks for the quick response. I fully understand it's experimental and have no expectation of stability.
I've switched over the document to using LuaLaTeX (only required changing a couple of package includes, so not as bad as expected) and indeed it works perfectly with interwordspace on.
Once I've got what I need working, I'll share the wrapper I've written. It's a bit clunky, but it might give some useful feedback on how the package is being used.
@u-fischer Is it clear to you why/how such a misplaced space character leads to this exact wrong behaviour of the 'bounding box' as it appears in the reader?
@car222222 Sure. The space char is in the lower left corner of the page, so the reader is quite right to be confused by it (Acrobat Pro seems to ignore it).
Aha yes! And it is also 'within the paragraph'. I can picture it in my mind now. Thanks.
Interesting that Pro appears to interpret it differently. Should we tell someone about that difference?
A fix has been added to the pdftex sources. I tested with the updated binaries from w32tex.org, and it seems to work fine now.
I've been playing with tagpdf to see if we can make our LaTeX produced lab notes accessible, and I've been getting along generally OK with getting most of the parts tagged correctly (have written a load of wrappers for tagging various bits).
One problem I have noticed is that when I select a different font (e.g. the helvet package with all fonts set to sans), Adobe Reader sounds a bit like a Dalek when it reads out the text. Presumably due to font kerning, the PDF output seems to split words up into chunks, e.g. (some) becomes (som)(e) in the PDF stream. I will probably try a different font to see if I can avoid that, but investigating it flagged up another problem.
One way I've found to fix the weird splitting of words is to set interwordspace=on in the \tagpdfsetup, which indeed results in all the words being read properly. However, it introduces a problem: the bounding box of the paragraph highlighted by the reader changes. An example is shown in the image below (the LaTeX code to produce it is at the end of the issue):
Notice how in the left example, when no interwordspace argument is passed to \tagpdfsetup, the bounding box correctly positions itself around the paragraph to be spoken.
Now when I pass in interwordspace=on (in fact, interwordspace= with any value), the bounding box of the paragraph suddenly changes to start at the bottom left corner of the page.
While this does not cause an issue for reading itself, it does mean that whenever you click on a paragraph to read it, Adobe Reader scrolls down to the bottom of the page, which is not ideal.
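For readers following along, the kind of setup described above can be sketched roughly as follows. This is not the reporter's original example; it is a minimal illustration that assumes the tagpdf interface as it stood around the time of this thread (the activate-all and uncompress keys and the \tagstructbegin/\tagmcbegin commands). Since the package is experimental, newer releases may need extra setup, such as activating the PDF resource management first, and the sample text is made up.

```latex
% Minimal sketch (assumed setup, not the original minimal example).
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{helvet}                        % Helvetica clone as the sans font
\renewcommand{\familydefault}{\sfdefault}  % set all text in sans
\usepackage{tagpdf}
% interwordspace=on is the setting discussed in this issue;
% uncompress makes the PDF content stream readable in a text editor.
\tagpdfsetup{activate-all, uncompress, interwordspace=on}
\begin{document}
\tagstructbegin{tag=Document}
  \tagstructbegin{tag=P}   % structure element for the paragraph
    \tagmcbegin{tag=P}     % marked content holding the actual text
      Some sample text to be read aloud by the PDF reader.
    \tagmcend
  \tagstructend
\tagstructend
\end{document}
```

Compiling this with pdflatex and then with lualatex, and comparing the start of the text object in the uncompressed content stream, is one way to check whether a stray ( )Tj appears before the first Td.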
Is this an issue with tagpdf itself? Or something to do with the \pdfinterwordspaceon primitive?
Minimum Example Code: