w3c / publ-a11y

Accessibility related discussions of the Publishing@W3C Groups

Inferring accessibility metadata when none is present #191

Open GeorgeKerscher opened 1 year ago

GeorgeKerscher commented 1 year ago

On the October 26, 2023 call, the question came up of what to do when little or no accessibility metadata is present. Should this group provide guidance on inferring metadata when it is not present? In the past we have said that a reflowable EPUB with a detailed nav doc is normally very accessible. It is also possible to examine the EPUB for accessibility features. So the issue is: what guidance should we provide to, for example, a distributor that wants to add to its catalogue the accessibility metadata that can be inferred by examining the title?
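As a rough illustration of "examining the title", here is a minimal sketch that assumes the relevant facts (layout, presence of a toc nav and a page-list nav in the nav doc, alt text counts) have already been extracted from the package. The `EpubFacts` type, its fields and `infer_accessibility_features` are hypothetical names; the output strings are terms from the schema.org `accessibilityFeature` vocabulary, but the mapping itself is an assumption for discussion, not a rule set this group has agreed on.

```python
from dataclasses import dataclass


@dataclass
class EpubFacts:
    """Hypothetical summary of what examining an EPUB found."""
    reflowable: bool       # rendition:layout is not pre-paginated
    has_toc_nav: bool      # nav doc contains a toc nav
    has_page_list: bool    # nav doc contains a page-list nav
    image_count: int
    images_with_alt: int   # images with non-empty alt text


def infer_accessibility_features(facts: EpubFacts) -> list[str]:
    """Map examined facts to schema.org accessibilityFeature values.

    Illustrative mapping only; real guidance would need agreed rules.
    """
    features = []
    if facts.has_toc_nav:
        features.append("tableOfContents")
    if facts.has_page_list:
        features.append("printPageNumbers")
    if facts.reflowable:
        # Assumption: a reflowable layout is treated as restylable text.
        features.append("displayTransformability")
    # Only claim alternative text when every image has some.
    if facts.image_count > 0 and facts.images_with_alt == facts.image_count:
        features.append("alternativeText")
    return features


if __name__ == "__main__":
    facts = EpubFacts(reflowable=True, has_toc_nav=True, has_page_list=False,
                      image_count=3, images_with_alt=3)
    print(infer_accessibility_features(facts))
```

Note how conservative the `alternativeText` check is: a single image without alt text means the feature is not claimed, which is the kind of threshold decision shared guidelines would need to settle.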

chrisONIX commented 1 year ago

Hadrien Gardeur of De Marque gave a presentation about the lack of accessibility metadata at the EDItEUR Supply Chain Conference at the Frankfurt Book Fair, and the slides are available to all on the EDItEUR website here: https://editeur.org/3/Events/Event-Details/667

gautierchomel commented 1 year ago

The Readium Go toolkit's inferred metadata work (in progress) explores this path. We'd be happy to discuss the subject collectively.

gregoriopellegrino commented 1 year ago

I think this is an important issue. I see different organizations moving in this direction, with the risk of different interpretations of how to infer metadata. I think joint work will be needed to define high-level guidelines on how to analyze the code to extract metadata consistently across different implementations.

In terms of UX guidelines, we will have to consider whether to indicate to the end user that a piece of metadata comes from the content creator or from an inferring algorithm. We should also decide whether to add this information at all, and at what level of granularity.
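One way to keep both options open is to record provenance per value and let the UI decide how much to show. The sketch below is hypothetical (the `Source` labels, class names and the "(inferred)" suffix are made up for illustration): it keeps per-value provenance, lets a publisher declaration override an inferred duplicate, and exposes both a coarse flag and a fine-grained per-value display.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    """Where a metadata value came from (hypothetical labels)."""
    PUBLISHER = "publisher"  # declared by the content creator
    INFERRED = "inferred"    # produced by an inferring algorithm


@dataclass
class MetadataValue:
    value: str
    source: Source


@dataclass
class AccessibilityRecord:
    # Per-value provenance is the finest granularity; a UI can collapse it.
    features: list[MetadataValue] = field(default_factory=list)

    def add(self, value: str, source: Source) -> None:
        """Add a value; a publisher declaration wins over an inferred duplicate."""
        for existing in self.features:
            if existing.value == value:
                if source is Source.PUBLISHER:
                    existing.source = Source.PUBLISHER
                return
        self.features.append(MetadataValue(value, source))

    def has_inferred(self) -> bool:
        """Coarse granularity: does any value come from inference?"""
        return any(v.source is Source.INFERRED for v in self.features)

    def display_lines(self) -> list[str]:
        """Fine granularity: label each inferred value individually."""
        return [
            v.value + (" (inferred)" if v.source is Source.INFERRED else "")
            for v in self.features
        ]


if __name__ == "__main__":
    record = AccessibilityRecord()
    record.add("tableOfContents", Source.PUBLISHER)
    record.add("alternativeText", Source.INFERRED)
    record.add("tableOfContents", Source.INFERRED)  # already declared, ignored
    print(record.display_lines())
    print(record.has_inferred())
```

A reading app could show only the `has_inferred()` flag as a single notice, or the per-value labels in a detailed view; storing the finer granularity makes either choice possible later.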

rickj commented 1 year ago

Thank you for the link to the slides, @gautierchomel. Interesting to see what Readium has seen.

A look at our titles:

HadrienGardeur commented 1 year ago

As mentioned by @chrisONIX, @gautierchomel and @gregoriopellegrino we've been working on various things over the last 18 months at De Marque:

The data covered in my presentation in Frankfurt comes primarily from trade publishing, which is probably quite a different dataset from what @rickj has on his side.

The logic for our inference rules is entirely open source, but I can summarize it here:

Here's the list of current rules:

For the table of contents and page list, this could be refined by:

That said, we've seen EPUBs that had good reasons for having only a small table of contents or a partial page list, so it's hard to define a rule that works across all publications.
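To illustrate why a single rule is hard to define, here is a hypothetical threshold check; the ratios, thresholds and function names are invented for this example and are not De Marque's actual rules. It also assumes the print page count is known from elsewhere, for example the print edition's catalogue data.

```python
def toc_looks_complete(toc_entries: int, spine_items: int,
                       min_ratio: float = 0.5) -> bool:
    """Hypothetical rule: the TOC should reference at least
    min_ratio of the spine items."""
    if spine_items <= 0:
        return False
    return toc_entries / spine_items >= min_ratio


def page_list_looks_complete(page_list_entries: int, print_page_count: int,
                             min_ratio: float = 0.9) -> bool:
    """Hypothetical rule: the page list should cover most print pages."""
    if print_page_count <= 0:
        return False
    return page_list_entries / print_page_count >= min_ratio


if __name__ == "__main__":
    # A novel split into 40 chapter files whose TOC deliberately lists
    # only its 5 parts fails the check, even if that choice is intentional.
    print(toc_looks_complete(5, 40))
    print(page_list_looks_complete(300, 320))
```

Any fixed `min_ratio` will flag some publications whose navigation is appropriate, which is the difficulty described above.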