pdf created by axessibility doesn't pass preflight test

u-fischer commented 6 years ago

When I compile the following with pdflatex

\documentclass{article}
\usepackage{axessibility}
\begin{document}
\[ a= b\]
\end{document}

I get a pdf which doesn't pass a preflight test due to syntax errors:

[screenshot "axesserrors": preflight report listing the syntax errors]

Looking in the pdf it is quite clear what is wrong: the style creates a BDC operator with three arguments instead of two:

/S/Span<</ActualText(\040\040a=\040b)>>BDC

I don't quite understand why the /S key is inserted here; this is normally a key for a StructElem object.
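For comparison, here is a minimal sketch of the two-operand form, using the unpatched accsupp package directly. The key values are chosen for illustration only. With pdflatex this should write a `/Span<</ActualText(...)>>BDC ... EMC` pair, that is, a tag plus a property dictionary and no extra /S operand:

```latex
\documentclass{article}
\usepackage{accsupp}
\begin{document}
\[
  % Opens a marked-content sequence: /Span <</ActualText(a=b)>> BDC
  \BeginAccSupp{method=escape,ActualText={a=b}}%
  a = b
  % Closes it again: EMC
  \EndAccSupp{}%
\]
\end{document}
```

This only illustrates the valid syntax; it says nothing about whether a given screen reader actually reads the ActualText.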

integr-abile commented 6 years ago

Thank you for catching the arguments issue. As for /S, it is there because of an issue with screen readers: without it, the ActualText is not read.

ozross commented 6 years ago

Can you try instead: /S MP/Span << .... which is valid PDF. (MP operator is "marked point") Does this trigger the screen reader for you?
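A minimal sketch of how this variant could be tried by hand with pdflatex, without patching any package. The use of `\pdfliteral page`, the spacing inside the literals, and the sample ActualText string are assumptions made for the test; the point is only that `/S MP` is a complete operator by itself, so the following BDC again has two operands:

```latex
\documentclass{article}
\begin{document}
\[
  % Marked point with tag /S, then a regular two-operand BDC
  \pdfliteral page{/S MP /Span <</ActualText(a=b)>> BDC}%
  a = b
  % End of the marked-content sequence
  \pdfliteral page{EMC}%
\]
\end{document}
```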

Furthermore, since the content is mathematics, it would be better to use /Formula instead of /Span, and to duplicate the /ActualText string as /Alt text as well. Judging from the NVDA source code, it should be sufficient to use /Alt alone, once the validity of the PDF syntax has been sorted out. Would you please try this?

Hope this helps. Ross Moore
(Director, TeX Users Group)

u-fischer commented 6 years ago

If the suggestions of Ross work, it would be better to implement them without patching and changing internal commands of Accsupp, as this can introduce incompatibilities with other uses of Accsupp.

MicheleBerra commented 6 years ago

Dear @u-fischer, about your last comment: would this be ok?

\newcommand*{\BeginAxessible}[1]{%
  \begingroup
    \setkeys{ACCSUPP}{#1}%
    \edef\ACCSUPP@span{%
     /S/Formula<<%
        \ifx\ACCSUPP@Alt\relax
        \else
          /Alt\ACCSUPP@Alt
        \fi
        \ifx\ACCSUPP@ActualText\relax
        \else
          /ActualText\ACCSUPP@ActualText
        \fi
      >>%
    }%
    \ACCSUPP@bdc
    \ACCSUPP@space
  \endgroup
}

and

\newcommand*{\EndAxessible}{%
  \begingroup
    \ACCSUPP@emc
  \endgroup
}

The wrapper becomes:

\long\def\wrap#1{
\BeginAxessible{method=escape,ActualText=\detokenize\expandafter{#1}, Alt=\detokenize\expandafter{#1} }
 #1
\EndAxessible%
}

If so, we will run some tests tomorrow and update the code accordingly.

Thanks for your suggestion!

PS. We also included @ozross idea of having both /Alt and /ActualText. Unfortunately, the proposed suggestion /S MP/Span or /S MP/Formula does not trigger the screen reader.

u-fischer commented 6 years ago

That's certainly better than patching accsupp, but you are still producing invalid pdf.

I made a few short tests and can confirm the odd behaviour (also with the text-to-speech engine of Adobe): the latex /Alt-Text is only read when you create a faulty pdf.

Reading the reference and after some more tests, I suspect that the /Alt-Text is used only if the document is fully tagged.

MicheleBerra commented 6 years ago

Dear @u-fischer, we are working on the matter. We will keep you posted. Our main goal was to make the PDF readable to visually impaired people, and that (even with the error) works. We will use this issue as a feature request (or even as the starting point of a brand new implementation). Thanks a lot!

u-fischer commented 5 years ago

I made a few tests with /Alt and /ActualText and text-to-speech software. A summary is here https://github.com/u-fischer/tagpdf/blob/master/source/examples/structure/ex-alt-actualtext.tex (it needs the new tagpdf.sty I uploaded yesterday to CTAN to compile, but the resulting pdfs from pdflatex and lualatex are also in the github folder).

ozross commented 5 years ago

Hi Ulrike,

On 8 Aug 2018, at 7:55 am, u-fischer wrote:

> I made a few tests with /Alt and /ActualText and text-to-speech software. A summary is here https://github.com/u-fischer/tagpdf/blob/master/source/examples/structure/ex-alt-actualtext.tex (it needs the new tagpdf.sty I uploaded yesterday to CTAN to compile, but the resulting pdfs from pdflatex and lualatex are also in the github folder).

One of the comments that I read in the source file(s) was that some math is not encoded properly. Try also using one or other (or both) of

\input glyphtounicode-cmr.tex (from the pdfx package)

and

\usepackage{mmap}

The latter package is quite old, but it provides CMap resources for OT1 and T1-encoded math fonts. I have a later version which I can send you if needed. I’m working on a further update, hopefully to be usable also with XeTeX.

These packages and files were developed specifically to overcome problems encountered when generating documents to be compliant with PDF/A, particularly for the PDF/A-2 and PDF/A-3 levels, using pdfTeX.
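A minimal preamble sketch combining both suggestions for pdflatex. The load order, the extra `\input{glyphtounicode}`, and the `\pdfgentounicode=1` setting are assumptions on top of what is written above, and `glyphtounicode-cmr.tex` is only found if pdfx is installed:

```latex
\documentclass{article}
\usepackage{mmap}          % CMap resources for OT1/T1 math fonts; load early
\input{glyphtounicode}     % standard glyph-name to Unicode table
\input{glyphtounicode-cmr} % additional mappings shipped with pdfx
\pdfgentounicode=1         % let pdfTeX write ToUnicode maps from the tables
\begin{document}
\[ \sum_{i=1}^n (2^x+2) = \sqrt{abc} \]
\end{document}
```

Either half can be dropped to test "one or other": remove the `mmap` line, or remove the two `\input` lines together with `\pdfgentounicode`.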

Hope this helps.

Ross

u-fischer commented 5 years ago

@ozross I know about mmap but didn't care about it for these tests. But besides this: one problem with mmap is that it doesn't work with other math fonts, e.g. newtxmath. The other is that it only improves copy&paste of symbols. E.g.

\[\sum_{i=1}^n (2^x+2)  = \sqrt{abc}\]

copies as

\sum n
i=1
(2x + 2) =
\surd
abc

I don't see much use in getting it working with xelatex. One can simply use unicode-math. Then one can copy Unicode symbols (but one doesn't get structure either):

𝑛Σ
𝑖=1
(2𝑥 + 2) =
√
𝑎𝑏𝑐
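For reference, a minimal sketch of the unicode-math setup, to be compiled with xelatex or lualatex. The default math font is assumed, and the formula is the one used in the test above:

```latex
\documentclass{article}
\usepackage{unicode-math} % OpenType math font; glyphs carry Unicode code points
\begin{document}
\[ \sum_{i=1}^n (2^x+2) = \sqrt{abc} \]
\end{document}
```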

ozross commented 5 years ago

Hi Ulrike,

On 08/08/2018, at 19:07, "u-fischer" wrote:

> @ozross I know about mmap but didn't care about it for these tests. But besides this: one problem with mmap is that it doesn't work with other math fonts, e.g. newtxmath.

I can produce the cmap files for this. Just as I am doing now for MathTimePro-2.

> The other is that it only improves copy&paste of symbols. E.g.
>
> \[\sum_{i=1}^n (2^x+2) = \sqrt{abc}\]
>
> copies as
>
> \sum n i=1 (2x + 2) = \surd abc

There are multiple modes. This one using macro names is just one of the modes.

> I don't see much use in getting it working with xelatex. One can simply use unicode-math.

People like mixing some of the features of XeTeX with legacy math fonts, such as Lucida and MathTime. There should be no reason to not do this, and such documents should be able to be made to satisfy PDF/A, and perhaps also PDF/UA.

> Then one can copy Unicode symbols (but one doesn't get structure either):
>
> 𝑛Σ 𝑖=1 (2𝑥 + 2) = √ 𝑎𝑏𝑐

Sure. Structure in math is much, much harder. It should be done using MathML tagging. But that is really hard, at present.

Cheers,

Ross

ozross commented 5 years ago

Hi again,

On 09/08/2018, at 0:09, "Ross Moore" wrote:

> On 08/08/2018, at 19:07, "u-fischer" wrote:
>
> > @ozross I know about mmap but didn't care about it for these tests. But besides this: one problem with mmap is that it doesn't work with other math fonts, e.g. newtxmath.

Presumably the glyphs are named correctly, so the automated methods of \pdfglyphtounicode should work for this, and other modern fonts, except for maybe a few characters not yet listed in glyphtounicode.tex. So mmap.sty doesn't have much to add for recent fonts.

> I can produce the cmap files for this. Just as I am doing now for MathTimePro-2.

MathTimePro, on the other hand, is much older. It does not name glyphs according to the actual character, so automated methods do not work. Instead it needs a CMap constructed specially, to correctly identify each character. And it needs a method to associate the CMap with the font dictionary within the PDF. This is the case with any driver: pdfTeX, XeTeX, or LuaTeX as well as dvips.

> > The other is that it only improves copy&paste of symbols. E.g.
> >
> > \[\sum_{i=1}^n (2^x+2) = \sqrt{abc}\]
> >
> > copies as
> >
> > \sum n i=1 (2x + 2) = \surd abc
> >
> > I don't see much use in getting it working with xelatex. One can simply use unicode-math.

> People like mixing some of the features of XeTeX with legacy math fonts, such as Lucida and MathTime. There should be no reason to not do this, and such documents should be able to be made to satisfy PDF/A, and perhaps also PDF/UA.

> > Then one can copy Unicode symbols (but one doesn't get structure either):
> >
> > 𝑛Σ 𝑖=1 (2𝑥 + 2) = √ 𝑎𝑏𝑐

> Sure. Structure in math is much, much harder. It should be done using MathML tagging. But that is really hard, at present.

You have seen some of my earlier work producing documents in which each symbol is tagged with both /Alt and /ActualText. It was done in such a way as to build a natural language alternative description of mathematical expressions. This requires external software (e.g. Perl scripts) to combine a MathML translation capturing the structure, with the original TeX source of each math expression. This is work that I hope to be able to return to, for supporting PDF 2.0, and PDF/UA-2.

When we have this, blind users will be able to use MathPlayer as PDF browsing software.

Cheers, (from the sky above Helsinki)

Ross

integr-abile / axessibility

pdf created by axessibility doesn't pass preflight test #1