psu-libraries / scholarsphere

Penn State's next generation institutional repository
MIT License

Automated Accessibility Checking #1602

Open ajkiessl opened 21 hours ago

ajkiessl commented 21 hours ago

From the epic (#1580):

If feasible, the first step in this process would be an automated check of each document that is uploaded to ScholarSphere during the workflow for depositing a work. This check would use a third-party tool to analyze and evaluate each document in terms of accepted accessibility standards. We've begun to describe this step in https://github.com/psu-libraries/scholarsphere/issues/1580. We would want this automated check to be triggered by each document upload and to run asynchronously in the background. The result of the check should be a pass/fail evaluation of each document along with a list of any detected accessibility issues (and suggested remediations) for each document.

We have decided to use Adobe's PDF Services API to do our accessibility checking. Refer to https://github.com/psu-libraries/scholarsphere/issues/1580#issuecomment-2359209601 for some links to documentation. We will need to build our own wrapper that returns a pass/fail analysis and parses out any issues and remediations given to us by the API. Accessibility checking should kick off in an async Sidekiq job for each uploaded file after the file upload step in the submission workflow.
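As a starting point for the wrapper, here is a minimal sketch of the pass/fail parsing step. It assumes the checker hands back a JSON report with a "Detailed Report" object whose sections each hold a list of rules with "Rule", "Status", and "Description" keys. That layout is an assumption and needs to be checked against a real response from Adobe. `AccessibilityResult` and `AccessibilityReportParser` are hypothetical names, not existing ScholarSphere classes. A Sidekiq job could call something like this once the report has been fetched.

```ruby
require "json"

# Hypothetical value object: overall verdict plus the list of detected issues.
AccessibilityResult = Struct.new(:passed, :issues, keyword_init: true) do
  def passed?
    passed
  end
end

# Hypothetical wrapper around the checker's JSON report.
# ASSUMPTION: the report looks like
#   { "Detailed Report" => { "<section>" => [ { "Rule", "Status", "Description" }, ... ] } }
# Verify this against the JSON that Adobe actually returns before relying on it.
class AccessibilityReportParser
  FAILING_STATUSES = ["Failed"].freeze

  def initialize(report_json)
    @report = JSON.parse(report_json)
  end

  # A document passes only when no rule in any section has a failing status.
  def result
    issues = failed_rules
    AccessibilityResult.new(passed: issues.empty?, issues: issues)
  end

  private

  # Collects every failing rule, tagged with the section it came from.
  def failed_rules
    sections = @report.fetch("Detailed Report", {})
    sections.flat_map do |section, rules|
      rules
        .select { |rule| FAILING_STATUSES.include?(rule["Status"]) }
        .map do |rule|
          { section: section, rule: rule["Rule"], description: rule["Description"] }
        end
    end
  end
end
```

Whether statuses other than "Failed" (for example, rules that need a manual check) should also count against a document is an open question. `FAILING_STATUSES` is kept as a constant so that decision is easy to change.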

There may be a way to pass signed URLs from ScholarSphere's AWS S3 bucket to Adobe for checking, so let's investigate that before considering other options. Here's the documentation for passing signed URLs: https://developer.adobe.com/document-services/docs/overview/pdf-services-api/howtos/pdf-external-storage-sol/
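To make the signed-URL idea concrete, here is a rough sketch of generating the two URLs we would likely need per file: a GET URL for Adobe to read the uploaded PDF and a PUT URL for it to write the report back. `SignedUrlPair`, the report key suffix, and the 15-minute expiry are all hypothetical choices. The `presigner` argument is expected to behave like `Aws::S3::Presigner` from the aws-sdk-s3 gem, which exposes `presigned_url(method, params)`. Whether Adobe accepts these URLs for the accessibility checker is exactly what the investigation needs to confirm.

```ruby
# Hypothetical helper: builds the pair of signed URLs for one uploaded file.
# `presigner` is assumed to behave like Aws::S3::Presigner (aws-sdk-s3 gem),
# i.e. it responds to presigned_url(method, bucket:, key:, expires_in:).
class SignedUrlPair
  DEFAULT_TTL_SECONDS = 900 # assumption: 15 minutes is long enough for one check

  def initialize(presigner:, bucket:, expires_in: DEFAULT_TTL_SECONDS)
    @presigner = presigner
    @bucket = bucket
    @expires_in = expires_in
  end

  # Returns a read URL for the PDF and a write URL for the report.
  # The ".accessibility-report.json" suffix is a made-up naming convention.
  def urls_for(key)
    {
      input_url: sign(:get_object, key),
      output_url: sign(:put_object, "#{key}.accessibility-report.json")
    }
  end

  private

  def sign(method, key)
    @presigner.presigned_url(method, bucket: @bucket, key: key, expires_in: @expires_in)
  end
end
```

In the app this would be constructed with something like `SignedUrlPair.new(presigner: Aws::S3::Presigner.new, bucket: bucket_name)`, where `bucket_name` stands in for however ScholarSphere already looks up its bucket.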

ajkiessl commented 20 hours ago

Part of #1580