latex3 / tagpdf

Tagging support code for LaTeX

first observations with tagging (meta) #57

Closed: hpvd closed this 1 year ago

hpvd commented 2 years ago

Being very curious about tagging and accessibility of PDFs, I just activated

\DocumentMetadata
{
  testphase = tagpdf, % load + activate
}

with everything else at the default settings, using the latest MiKTeX.

-> the document compiled without any problems :-)
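A complete test document along the lines of this setup might look like the sketch below. It assumes only what the thread states (the testphase = tagpdf key, compiled with LuaLaTeX); the structural point worth knowing is that \DocumentMetadata has to come before \documentclass. The accepted testphase values have changed across LaTeX releases, so treat the value as a snapshot from the time of this thread and check the documentation shipped with your LaTeX installation.

```latex
% Sketch of a complete test document; compile with lualatex.
% \DocumentMetadata must come first, before \documentclass.
\DocumentMetadata{
  testphase = tagpdf, % load + activate (value as used in this thread)
}
\documentclass{article}
\begin{document}
Some text to tag.
\end{document}
```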

Two observations (meta):

1. the size of the resulting pdf increases
2. compilation takes longer

Are these points that could/will be addressed as the code matures?

Of course I'm aware that more functionality always "costs" :-) IMHO the points above are not minor but rather important for future adoption by users (if one wants to set a new standard in pdf quality: all pdfs made by LaTeX are tagged!)

FrankMittelbach commented 2 years ago

Some aspects are in the nature of the beast and others depend a little on the engine you use. And, of course, right now the focus is mainly on functionality, premature optimization being the root of all evil :-).

First of all, producing tagged pdf means you add a lot of additional structure to the pdf, so obviously its size increases. With pdftex as the engine the situation is made worse by the fact that TeX normally does not put spaces into its output but moves the writing position instead, whereas valid tagged and accessible pdf really needs real spaces (instead or in addition). To enable that, pdftex offers a somewhat strange mechanism using a dummy font containing only a space char, and that alone blows up the pdf considerably. LuaTeX has a better way, and there you see much less growth.
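To see the size effect described here, one could compile the same small document once with pdflatex and once with lualatex and compare the resulting file sizes. The sketch below assumes tagpdf's interwordspace setup key, which switches on the insertion of real space characters; whether it is already active by default depends on the tagpdf version and testphase in use, so setting it explicitly is only a precaution.

```latex
% Sketch: compile once with pdflatex and once with lualatex, then compare
% the pdf sizes. Assumes tagpdf's "interwordspace" key, which requests real
% space characters between words (pdflatex uses the dummy space font
% mentioned above; lualatex handles this differently).
\DocumentMetadata{
  testphase = tagpdf,
}
\documentclass{article}
\tagpdfsetup{interwordspace=true}
\begin{document}
Words separated by real spaces rather than only by moved positions.
\end{document}
```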

Concerning speed: yes, some improvements are certainly possible (but right now this is not our focus). Again, this partly depends on the engine, here with the downside that LuaTeX again does better but is slower to begin with.

hpvd commented 2 years ago

Many thanks for giving these details!

The most important point for me was that these two topics are on the list.

For my example, I was already using lualatex...

Maybe one key is setting adequate defaults for which things will be tagged (and which not). If users change these defaults, they are probably aware that this may come at some cost.

E.g. maybe the defaults could prevent things like this: when using \usepackage{showhyphenation}, the size of the document increases by a factor of 2.5 and compilation time even more... probably because all these tiny hyphenation markers are analysed and tagged... (yeah, of course this is only used during visual debugging, so only speed is relevant...)

u-fischer commented 2 years ago

> Maybe one key is setting adequate defaults which things will be tagged (and which not)

A tagged pdf must be tagged everywhere. You cannot tag only one page and leave another out.

> when using \usepackage{showhyphenation} the size of the document increases by a factor of 2.5 and compilation time even more...probably because all these tiny hyphenation markers were analysed and tagged... (yeah of course this is only used during visual debugging, so only speed is relevant...)

Well, yes. All the hyphenation markers are artifacts and so split the paragraph into lots of small chunks. The first paragraph of an English lipsum text then has 69 chunks instead of one.
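The observation can be reproduced with a small test file like the sketch below: compile it with lualatex (showhyphenation requires LuaLaTeX) once as is and once with the showhyphenation line commented out, then compare file size and run time. The chunk count quoted above refers to an English lipsum text; the exact figures from this file will depend on the text and the hyphenation patterns.

```latex
% Sketch: run lualatex twice, with and without the showhyphenation line,
% and compare pdf size and compilation time.
\DocumentMetadata{
  testphase = tagpdf,
}
\documentclass{article}
\usepackage{lipsum}
\usepackage{showhyphenation} % comment out for the baseline run
\begin{document}
\lipsum[1-5]
\end{document}
```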

car222222 commented 2 years ago

@hpvd wrote: setting adequate defaults which things will be tagged (and which not)

Please can you suggest examples of the types of things that you think should "not be tagged by default"? Also, are you aware that ISO standards are currently being developed to mandate or recommend what must be tagged?

hpvd commented 2 years ago

> @hpvd wrote: setting adequate defaults which things will be tagged (and which not)
>
> Please can you suggest examples of the types of things that you think should "not be tagged by default"? Also, are you aware that ISO standards are currently being developed to mandate or recommend what must be tagged?

I was thinking about not tagging things by default which

- have a very special purpose, like the hyphenation markers from the example above

but only if this really helps with compilation speed and document size...

Thanks for pointing to the upcoming ISO standard. I was only aware of some standards in the field of website accessibility which partly point in the same direction (a11y, ADA, WAI...). Looking into more details, like the different resources on https://www.pdfa.org/resource/tagged-pdf-best-practice-guide-syntax/, yeah, it is really a complex topic... sorry, I only looked at it from the perspective of a standard user...

u-fischer commented 2 years ago

> has very special purpose, like the hyphenation markers from the example above

They are not "tagged", only marked as artifacts. But they have an effect nevertheless. Without the markers a paragraph can basically be tagged as one large chunk:

<p> ... paragraph ....</p>

With the marker one gets lots of small pieces:

<p>para</p> marker <p>graph</p> ....

And all these pieces must be managed and recorded.