plone / volto

React-based frontend for the Plone Content Management System
https://demo.plone.org/
MIT License

A11Y: Check uploaded PDFs for accessibility #1506

Open tisto opened 4 years ago

tisto commented 4 years ago

This is rather a backend issue. Though Volto would have to show possible warnings.

Resources:

mgifford commented 3 years ago

Not sure why Joe Clark's article was pulled from A List Apart, other than that it is old: https://web.archive.org/web/20190702113639/https://alistapart.com/article/pdf_accessibility/

Also worth checking out: https://github.com/OpenConceptConsulting/perception which is leveraging PyPDF2

This is a Node tool to leverage the eiii work above https://github.com/zrrrzzt/node-wcag-pdf

This is an old effort https://github.com/keoliva/Accessibility-Checker

I think this was originally from NASA - https://github.com/OmkarKirpan/T-ENTacle - based on https://code.nasa.gov/

Here's another https://blogs.swarthmore.edu/its/2017/07/18/pdf-accessibility-checker-announced/ - https://github.com/Swarthmore/filescan-server - which seems to be updated more regularly

This seems to be a Moodle Plugin https://github.com/dslab-epfl/accessibility-checker

tisto commented 3 years ago

I checked out the pdfwam library. It is based on Python 2, which is a problem in the long run, since Python 2 reached its end of life on January 1, 2020 (the final release, 2.7.18, came out in April 2020).

The library uses pypdf to read the PDF and extract basic information. The core of the pdfwam library is in this file:

https://gitlab.tingtun.no/eiii_source/pdfwam/-/blob/master/pdfwcag.py

where it runs those WCAG 2.0 PDF tests:

One option would be to take this library, migrate it to Python 3, and publish it properly on PyPI. Rewriting the library from scratch might also be an option, since the code is outdated and its overall quality is not great.
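
To give a sense of what a from-scratch Python 3 version could start from, here is a minimal sketch. It is not pdfwam's code and does not reproduce its test list; it only looks at four document-level properties that are commonly associated with accessible PDFs (tagged flag, structure tree, default language, title). The names `check_catalog` and `check_pdf` are hypothetical. The `check_pdf` wrapper assumes a current pypdf release with `PdfReader`, `reader.trailer["/Root"]`, and `reader.metadata.title`; the core check itself only needs a dict-like catalog.

```python
from typing import Any, List, Mapping, Optional


def _lookup(mapping: Mapping[str, Any], key: str, default: Any) -> Any:
    """Index the mapping directly instead of calling .get().

    Assumption: on pypdf dictionary objects, indexing resolves indirect
    references, so we prefer it over .get().
    """
    return mapping[key] if key in mapping else default


def check_catalog(catalog: Mapping[str, Any], title: Optional[str] = None) -> List[str]:
    """Return warnings for a PDF document catalog (hypothetical helper).

    These four checks are illustrative only; they are not the pdfwam tests.
    """
    warnings: List[str] = []

    # Tagged PDFs declare /MarkInfo << /Marked true >> in the catalog.
    mark_info = _lookup(catalog, "/MarkInfo", {})
    marked = _lookup(mark_info, "/Marked", False)
    # Accept a plain bool or a wrapper object exposing .value.
    marked = getattr(marked, "value", marked)
    if not marked:
        warnings.append("Document is not marked as tagged.")

    # The structure tree carries headings, lists, alt text, and so on.
    if "/StructTreeRoot" not in catalog:
        warnings.append("Document has no structure tree.")

    # A default language lets screen readers pick the right voice.
    if not str(_lookup(catalog, "/Lang", "")).strip():
        warnings.append("Document has no default language.")

    if not (title or "").strip():
        warnings.append("Document has no title.")

    return warnings


def check_pdf(path: str) -> List[str]:
    """Run check_catalog on a file; requires the third-party pypdf package."""
    from pypdf import PdfReader  # imported lazily so the core has no dependency

    reader = PdfReader(path)
    metadata = reader.metadata
    return check_catalog(reader.trailer["/Root"], metadata.title if metadata else None)
```

The backend could run something like `check_pdf` on upload and return the list of strings, and Volto would then only need to render them as warnings next to the file field.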