wojtekmaj / react-pdf

Display PDFs in your React app as easily as if they were images.
https://projects.wojtekmaj.pl/react-pdf
MIT License
9.44k stars, 886 forks

Match accessibility features offered by pdfjs viewer #1494

Closed. MattL75 closed this issue 1 year ago

MattL75 commented 1 year ago

Description

This feature was previously sort of reported here: https://github.com/wojtekmaj/react-pdf/issues/831 but that issue was closed without a proper conclusion.

The idea is to provide an accessibility layer that matches what is offered in the pdfjs viewer. Here's an example: [screenshot]

In a nutshell, it makes use of the PDF tagging feature and exposes semantic information that screen readers can tap into.

<canvas>
  <span role="heading" aria-level="1" aria-owns="heading_id"></span>
  <span aria-owns="some_paragraph"></span>
</canvas>

In the text layer:
<span id="heading_id">Some Heading</span>
<span id="some_paragraph">Hello world!</span>

Proposed solution

The solution isn't actually super difficult to implement. I have a working PoC locally with our fork (react-pdf v6) and it doesn't have a huge amount of changes. I will be implementing this properly in a sprint soon and I'd like to contribute it upstream to react-pdf.

The key is calling the following function with the includeMarkedContent option: getTextContent({includeMarkedContent: true})

We can then use this automatically by passing the result to the renderTextLayer function from pdfjs.

In my tests on an older pdfjs version, it was required to use the STREAM of the text content in the renderTextLayer function to obtain a properly-formatted DOM.
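To illustrate what the marked content adds, here is a minimal sketch of how the items returned with includeMarkedContent could be grouped by marked-content id. The item types are local stand-ins written for this example rather than imports from pdfjs, groupByMarkedContent is a hypothetical helper, and the ids simply reuse the ones from the markup above. It is not the code from my PoC.

```typescript
// Local stand-ins for the item shapes returned by
// getTextContent({ includeMarkedContent: true }); declared here (an assumption)
// so the sketch runs without pdfjs.
type TextItem = { str: string };
type MarkedContentItem = {
  type: "beginMarkedContent" | "beginMarkedContentProps" | "endMarkedContent";
  id?: string;
};
type ContentItem = TextItem | MarkedContentItem;

// Hypothetical helper: concatenates the text found inside each marked-content id.
function groupByMarkedContent(items: ContentItem[]): Map<string, string> {
  const groups = new Map<string, string>();
  const stack: (string | undefined)[] = [];
  for (const item of items) {
    if ("str" in item) {
      // Attribute the text to the innermost enclosing marker that has an id.
      const id = stack.slice().reverse().find((s) => s !== undefined);
      if (id !== undefined) {
        groups.set(id, (groups.get(id) ?? "") + item.str);
      }
    } else if (item.type === "endMarkedContent") {
      stack.pop();
    } else {
      // beginMarkedContent carries no id; beginMarkedContentProps does.
      stack.push(item.id);
    }
  }
  return groups;
}

// Illustrative input, shaped like the heading/paragraph example above.
const exampleItems: ContentItem[] = [
  { type: "beginMarkedContentProps", id: "heading_id" },
  { str: "Some Heading" },
  { type: "endMarkedContent" },
  { type: "beginMarkedContentProps", id: "some_paragraph" },
  { str: "Hello " },
  { str: "world!" },
  { type: "endMarkedContent" },
];
const grouped = groupByMarkedContent(exampleItems);
```

In the real text layer the grouping is done by pdfjs itself when it builds the DOM; the sketch only shows why the ids in the text layer can later be referenced through aria-owns.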

We can then have a component which recursively renders the struct tree associated with the marked content.
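A rough sketch of that recursive rendering, written as a plain function that emits an HTML string instead of a React component. The node types are local stand-ins for the struct tree shape (a role plus children, with leaf entries that point at a marked-content id), and both ROLE_ATTRS and renderStructTree are hypothetical names made up for this example. The role mapping is deliberately tiny and would need many more entries in practice.

```typescript
// Local stand-ins for the struct tree shape (an assumption for this sketch).
type StructContent = { type: "content"; id: string };
type StructNode = { role: string; children: (StructNode | StructContent)[] };

// Hypothetical, intentionally incomplete mapping from PDF structure roles to ARIA attributes.
const ROLE_ATTRS: Record<string, string> = {
  H1: 'role="heading" aria-level="1"',
  H2: 'role="heading" aria-level="2"',
};

// Recursively renders a struct tree node as nested spans.
// Leaf content entries become ids in the parent's aria-owns attribute.
function renderStructTree(node: StructNode): string {
  const attrs: string[] = [];
  const aria = ROLE_ATTRS[node.role];
  if (aria) attrs.push(aria);

  const owned: string[] = [];
  let inner = "";
  for (const child of node.children) {
    if ("type" in child) {
      owned.push(child.id); // leaf: refers to an element in the text layer
    } else {
      inner += renderStructTree(child); // branch: recurse
    }
  }
  if (owned.length > 0) attrs.push(`aria-owns="${owned.join(" ")}"`);

  const open = "<span" + attrs.map((a) => " " + a).join("") + ">";
  return open + inner + "</span>";
}

// Illustrative tree matching the heading/paragraph example above.
const exampleTree: StructNode = {
  role: "Root",
  children: [
    { role: "H1", children: [{ type: "content", id: "heading_id" }] },
    { role: "P", children: [{ type: "content", id: "some_paragraph" }] },
  ],
};
const exampleMarkup = renderStructTree(exampleTree);
```

For the example tree this yields an outer span wrapping the same two spans shown in the description: one with role="heading", aria-level="1" and aria-owns="heading_id", and one with aria-owns="some_paragraph".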

Alternatives

No response

Additional information

As mentioned above, I have most of this figured out locally in a rough PoC.

@wojtekmaj Is this something you feel is within the scope of react-pdf, or rather something I should keep within my fork? No problems either way :)

wojtekmaj commented 1 year ago

HELL TO THE YEAH, accessibility isn't a feature, it's a foundation. If there's any feature implemented in React-PDF that provides subpar experience for folks using assistive technologies, I'd consider it a bug.

Please go ahead!

Show me what you got

rihards-simanovics commented 1 year ago

Hey guys, I'm actually working on implementing this package on a semi-governmental website, and I really like that you provide a semi-accessible text layer (presumably the idea was to make text selectable for ease of copy and paste). I can see this also being useful for the narrator; however, rendering all of the text as spans isn't helpful. Are there any React hooks or any other way to make this customisable, say with a content hierarchy of h1, h2, h3, ..., p, b, etc.?

Edit: @MattL75 I just read the rest of the issue. If you can share here when you make a pull request, that would be awesome, and I can test it on my side as well! Oh, and thanks for the great work!

Edit 2: guess you already did #1498

MattL75 commented 1 year ago

@rihards-simanovics Yes, as you saw, I have made the PR. If you can test it out, that would be great. Keep in mind the documents still need to be tagged (using Acrobat) for the semantic tags to show up. Without a tagged document, there is no way to show accessible semantics :)