metanorma / metanorma-iso

Metanorma processor for ISO standards
BSD 2-Clause "Simplified" License

(URGENT) PDF accessibility: math not showing up in PDF content text #871

Open ronaldtse opened 1 year ago

ronaldtse commented 1 year ago

In PDF diffs it is important that the content does not introduce systematic changes from the original PDFs.

In the Metanorma-generated PDFs, the math content is missing from the content text. This is a sample from ISO 10303-50: (https://github.com/metanorma/iso-10303-detached-docs/tree/main/sources/iso-10303-50)

[Screenshots: "Screen Shot 2022-12-01 at 12 44 07 PM", "Screen Shot 2022-12-01 at 1 11 16 PM"]

In the generated PDFs, all the formula contents are "missing" from the content text. Since we can insert AsciiMath for them, doing so would eliminate many of these false positives.

Intelligent2013 commented 1 year ago

In the content tree there is text for x and y: [image]

Copy-pasted text also contains them: "arguments y and x, which".

Currently, mn2pdf inserts the hidden math as transparent text (https://github.com/metanorma/metanorma-bipm/issues/188). It looks like Acrobat doesn't see such text in its comparison feature.

I'll try temporarily removing the transparency mode locally and test the comparison result.

ronaldtse commented 1 year ago

Thank you for investigating! Indeed this is strange. I can verify in Preview that I can copy and paste this text.

Intelligent2013 commented 1 year ago

I've tested different combinations of color + transparency for the hidden text:

That is, if the text is white and/or transparent, Acrobat ignores it in the Compare feature, although it remains available for copy-paste. I haven't figured out what workaround could be applied...
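
For anyone who wants to reproduce the color/transparency combinations outside mn2pdf, here is a minimal sketch that writes a one-page PDF by hand. `build_test_pdf` is a hypothetical helper, not part of mn2pdf, and it does not reflect how mn2pdf actually emits the hidden text; it only assumes the standard PDF operators `rg` (fill color) and `gs` with an `/ExtGState` entry whose `/ca` is 0 (fully transparent fill). How Acrobat treats these files in Compare is not something the sketch can tell you; it only produces the inputs.

```python
# Hypothetical helper (not part of mn2pdf): builds a tiny one-page PDF whose
# text is drawn with a chosen fill color and fill opacity.

def build_test_pdf(text: str, white: bool, transparent: bool) -> bytes:
    """Return a one-page PDF with `text` drawn in the requested style."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    ops = []
    if transparent:
        ops.append("/GS0 gs")  # select the graphics state with fill alpha 0
    ops.append("1 1 1 rg" if white else "0 0 0 rg")  # white or black fill
    ops.append(f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET")
    # Assumption: the text fits in Latin-1 (true for plain AsciiMath).
    stream = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> "
        b"/ExtGState << /GS0 << /Type /ExtGState /ca 0 >> >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


if __name__ == "__main__":
    # Write one file per combination; the file names are illustrative only.
    for white in (False, True):
        for transparent in (False, True):
            name = f"hidden_white-{white}_transparent-{transparent}.pdf"
            with open(name, "wb") as handle:
                handle.write(build_test_pdf("x, y", white, transparent))
```

Comparing each generated file against the plain black, opaque variant should show which of the two properties makes the text drop out of the comparison.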

ronaldtse commented 1 year ago

Then let's just keep it as it is for now. I don't think we do comparisons too often...

ronaldtse commented 1 year ago

This is an interesting topic that @stuartgalt would be interested in as the PDF guru...

ronaldtse commented 1 year ago

Similar to #870, this is a problem with Adobe Acrobat's Compare PDF feature. Letting @stuartgalt know in case the PDF TC has (or plans to have) specs for Compare PDF.