Open NSoiffer opened 5 years ago
cc: @michaelDCurran and @seanbudd in case there are plans to improve math reading with NVDA, yet another use case.
FYI: an update...
There is a project that Adobe has funded for the last few years to get pdftex to produce tagged PDF, including MathML in an associated file. This is being done by rewriting the core of the main TeX implementation to pass structural information through to the stage where the PDF gets generated. The latest I saw was that the math part is to be worked on in March 2024. It is among the last things that they are doing. That will produce a lot of PDFs with MathML in them. Their goal is to be nearly 100% backwards compatible, so old PDFs just need to be regenerated from unmodified TeX/LaTeX to get well-tagged PDF.
AFAIK, Adobe has yet to update their API. However, I have been talking with Foxit and they are working on providing access to the associated file in their PDF viewer. They showed me an alpha version that handles associated files. If they implement my suggestion of changing the role in that version from ROLE_SYSTEM_TEXT to ROLE_SYSTEM_EQUATION, I think it will take 5-10 lines of code in _getNodeMathMl in adobeAcrobat.py to get NVDA to read the math.
If that works out, I hope Adobe follows their lead and does it the same way. No interface change and minimal NVDA changes and math in PDF becomes accessible. Fingers crossed...
Another update: it turns out it was four lines of code to make this work. It would have been three, but there is a bug in what they did so I need to do a bit of surgery on the MathML they generated; for readability, I split it onto another line.
If Foxit decides this is an approach they like, I'll try to find some people at Adobe and see if they are willing to expose the associated file in the same way. If so, then I'll do a PR.
In case it wasn't clear: this PR would work for MathPlayer, MathCAT, Access8Math, and any other math provider.
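To make the role-based idea in these updates concrete, here is a rough, self-contained sketch. It is not the actual change to _getNodeMathMl: `FakeNode` and `get_node_mathml` are hypothetical names, and the sketch assumes the viewer hands back the associated file's MathML as a plain string on the node, which this thread does not confirm. The two role values are the standard MSAA constants.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Standard MSAA role constants (as defined in oleacc.h).
ROLE_SYSTEM_TEXT = 0x2A
ROLE_SYSTEM_EQUATION = 0x37

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


@dataclass
class FakeNode:
    """Hypothetical stand-in for an accessible node exposed by a PDF viewer."""
    role: int
    value: Optional[str] = None  # assumption: MathML arrives here as a string


def get_node_mathml(node: FakeNode) -> Optional[str]:
    """Return the node's MathML if it is an equation node with a <math> root."""
    # Only nodes that the viewer marks as equations are candidates.
    if node.role != ROLE_SYSTEM_EQUATION:
        return None
    text = (node.value or "").strip()
    if not text:
        return None
    # Reject anything that is not well-formed XML.
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    # Accept <math> with or without the MathML namespace.
    if root.tag not in ("math", f"{{{MATHML_NS}}}math"):
        return None
    return text
```

Because the function only returns a MathML string, whatever math provider is installed can consume the result unchanged. Any viewer-specific cleanup of the MathML would go just before the parse step.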
There are several PDF files demonstrating associated-file MathML tagging on the LaTeX Project page.
Feature Request
Many years ago, NVDA added support for reading math in PDF documents. Unfortunately, the mechanism that PDF described for adding MathML to a PDF is difficult for software to generate, so other than test documents and sample hand tagging, there are not many PDFs around that tag the math in this manner.
In PDF v2 (ISO 32000-2), a much simpler method was added to tag math: associated files. This loses a little functionality (synchronized highlighting becomes much harder for AT that wants to do that), but it makes it much easier to add MathML. This request is for NVDA to add to its existing MathML functionality the ability to get the math from the associated file.
Because NVDA already has code to pass MathML to an application that can produce braille or speech for it (e.g., MathPlayer), the work required here is to additionally look in the associated file for MathML. Sadly, Adobe has not updated their accessibility interface to v2, so getting that info requires diving into (I think) the PDSEdit layer. Doing so is not rocket science, but it is obviously more work than a few more PDomNode calls.
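As an illustration of what "look in the associated file for MathML" amounts to, here is a minimal sketch over a simplified in-memory model. It does not use the Acrobat API. `AssociatedFile`, `StructElem`, and `mathml_from_associated_files` are hypothetical names, and the fields loosely mirror a structure element's AF array and each file's AFRelationship and MIME subtype. The UTF-8 decoding is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MATHML_MIME = "application/mathml+xml"


@dataclass
class AssociatedFile:
    """Hypothetical, simplified view of one associated file on a structure element."""
    mime_type: str     # MIME type of the embedded file
    relationship: str  # AFRelationship value, e.g. "Supplement"
    data: bytes        # raw contents of the embedded file


@dataclass
class StructElem:
    """Hypothetical, simplified structure element (e.g. a Formula tag)."""
    tag: str
    associated_files: List[AssociatedFile] = field(default_factory=list)


def mathml_from_associated_files(elem: StructElem) -> Optional[str]:
    """Return the first associated file that is MathML, decoded as text."""
    for af in elem.associated_files:
        if af.mime_type.strip().lower() == MATHML_MIME:
            # Assumption: the embedded MathML is UTF-8 encoded.
            return af.data.decode("utf-8", errors="replace")
    return None


formula = StructElem(
    "Formula",
    [
        AssociatedFile("text/plain", "Source", b"x^2"),
        AssociatedFile(MATHML_MIME, "Supplement", b"<math><mi>x</mi></math>"),
    ],
)
print(mathml_from_associated_files(formula))
```

The string this returns could then go to the existing MathML code path the same way as MathML found through the older tagging mechanism.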
PDF Details
Spec
Section 14.13 of the ISO 32000-2 spec discusses associated files. Here are some relevant quotes from the spec:
Other potential places in the spec for info:
Acrobat API
The overview of the Acrobat API is found here. I believe the relevant interface to access is PDSElement. This provides access to the structure tree. Potentially the COS layer is involved to access the dictionary structure.
Since I was looking, it might save someone a minute to know that the MathML code for Acrobat is in NVDAObjects/IAccessible/adobeAcrobat.py.