Implementing accessibility metadata

llemeurfr commented 5 years ago

During the 24/04/2019 call, the discussion led to:

We will use the EPUB OPF a11y metadata (or W3C Web Publications metadata) as a source
The RWPM will express each a11y metadata as arrays of strings
These metadata will be handled via the Readium accessibility mechanism
Each codebase will define helpers to “extract” a11y metadata from the object
These helpers will follow the Benetech UI recommendations (link below), i.e. we will have:
-- ScreenReaderFriendly() returning yes / no / unknown
-- Audiobook() returning yes / no
-- etc.

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/principles/
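
To make the helper idea concrete, here is a minimal JavaScript sketch. The function name, the shape of the `metadata` object (with `accessModeSufficient` as an array of arrays of strings) and the decision rule are all assumptions for illustration; the rule is a deliberate simplification, not the exact logic of the UX guide.

```javascript
// Hypothetical helper: `metadata` is assumed to be a plain object whose
// `accessModeSufficient` member is an array of arrays of tokens.
function screenReaderFriendly(metadata) {
    const sufficient = metadata.accessModeSufficient || [];

    // At least one "sufficient" set is purely textual: report "yes".
    if (sufficient.some((set) => set.length === 1 && set[0] === "textual")) {
        return "yes";
    }
    // Sets are declared, but none of them is purely textual: report "no".
    if (sufficient.length > 0) {
        return "no";
    }
    // Nothing declared: we cannot tell.
    return "unknown";
}
```

Returning a third "unknown" state keeps missing metadata distinct from a publication that explicitly declares non-textual access modes only.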

Can we agree this is the way to go?

HadrienGardeur commented 5 years ago

I think that accessibilitySummary should either be a string or a localized string rather than an array of strings.

accessModeSufficient needs to be expressed as an array of arrays of strings (🙄).

JayPanoz commented 5 years ago

accessModeSufficient → this one is even mega super confusing as an author.

Had to use it a few weeks ago, in my very last e-production gig and I was like “WTF‽”

Quite frankly, I hope that they redesign it at some point. Usage makes it even more difficult to understand what the definition is in the first place. 😫

danielweck commented 4 years ago

Review of @JayPanoz 's current draft: https://github.com/JayPanoz/architecture/blob/a11y-metadata-parsing/streamer/parser/a11y-metadata-parsing.md

Correction: "The array is created from the meta elements whose property attribute has the value ..." => not just EPUB3 meta + property, but also EPUB2 meta + name
Missing: a11y:certifierCredential is a meta + name in EPUB2, but in EPUB3 can be meta + property, or alternatively link + property (in which case the value is expected to be a URL)
Missing: a11y:certifierReport is a meta + name in EPUB2, but in EPUB3 it cannot be meta + property, it must be a link + property (the value must be a URL)
Correction: dcterms:conformsTo not link + rel, but link + property (in EPUB3), or meta + name in EPUB2.
Correction: the set of values for schema:accessibilityFeature is actually open-ended, due to the possible displayTransformability suffixes which map to CSS rules (typically: /font-size, /font-family, /line-height, /word-spacing, /letter-spacing, /color, /background-color, etc.). Also, note the missing highContrastAudio suffixes (/noBackground, /reducedBackground and /switchableBackground)
Clarification: although dcterms:conformsTo is strictly-speaking an open-ended choice of arbitrary URLs, it is likely one of: http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-a, http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa, http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aaa
Clarification: schema:accessMode, schema:accessibilityFeature, schema:accessibilityHazard and schema:accessibilitySummary are "required" properties (in terms of validation against the a11y conformance rules)
Cardinality: there is some ambiguity about which accessibility metadata can be repeated. For example, it does not make sense for schema:accessibilitySummary to repeat, yet the specification isn't clear about that, so there can potentially be several properties with the same name/property in the EPUB package *.opf XML (a bit like dc:title). I think the R2 model should store them all, and it is the responsibility of the processor / consumer to figure out what to do with them (e.g. a reading system can display the first one only, or a concatenation). The clearly repeatable properties are: schema:accessMode, schema:accessibilityFeature, schema:accessibilityHazard, schema:accessModeSufficient (the only one which allows comma-separated values from the enumerated list of tokens), schema:accessibilityAPI (although currently likely just ARIA), and schema:accessibilityControl. I guess it makes sense for these to be repeatable as well: dcterms:conformsTo, a11y:certifiedBy, and a11y:certifierCredential, but it would seem that a11y:certifierReport should be unique ... but then again, the R2 models should be ready for the possibility of several occurrences, I think.
To be debated: schema:accessModeSufficient can be repeated, and each occurrence is itself a comma-separated list of tokens from the enumeration. The current draft proposes to store these individual values as an array of tokens, rather than as the original linearized string. I am not so sure about this approach (I speak based on my own experience having implemented an editor for accessibility metadata), I think we should just naively preserve the original string value, with all its potential "weirdness" (e.g. insignificant whitespaces - or lack thereof - between tokens and comma separators, token ordering, duplicates, etc.)
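
The review points above (EPUB 2 meta + name versus EPUB 3 meta + property, link + property for some values, and storing every occurrence) can be sketched roughly as follows. This is not the r2-shared-js code: the `elements` input is a hypothetical, already-parsed list of plain objects, and `collectA11yMetadata` is an illustrative name. The sketch is lenient (it does not reject an EPUB 3 meta + property for a11y:certifierReport) and keeps each value exactly as authored.

```javascript
// Properties of interest (assumed list, taken from the review above).
const A11Y_PROPERTIES = [
    "schema:accessMode", "schema:accessModeSufficient",
    "schema:accessibilityFeature", "schema:accessibilityHazard",
    "schema:accessibilitySummary", "schema:accessibilityAPI",
    "schema:accessibilityControl", "dcterms:conformsTo",
    "a11y:certifiedBy", "a11y:certifierCredential", "a11y:certifierReport",
];
// Properties that may be expressed as <link property="..." href="...">.
const LINK_PROPERTIES = [
    "dcterms:conformsTo", "a11y:certifierReport", "a11y:certifierCredential",
];

// `elements` is a hypothetical pre-parsed form of the OPF metadata children:
// { tag, name, content, property, text, href }.
function collectA11yMetadata(elements) {
    const result = {};
    const push = (key, value) => {
        if (!A11Y_PROPERTIES.includes(key)) return;
        if (typeof value !== "string" || value.trim() === "") return;
        // Every occurrence is kept, in document order, as authored.
        (result[key] = result[key] || []).push(value);
    };
    for (const el of elements) {
        if (el.tag === "meta" && el.property) {
            push(el.property, el.text);    // EPUB 3: property + text content
        } else if (el.tag === "meta" && el.name) {
            push(el.name, el.content);     // EPUB 2: name + content
        } else if (el.tag === "link" && LINK_PROPERTIES.includes(el.property)) {
            push(el.property, el.href);    // value is expected to be a URL
        }
    }
    return result;
}
```

Because every property maps to an array, a consumer can decide for itself whether to display only the first schema:accessibilitySummary or a concatenation.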

References:

danielweck commented 4 years ago

Note that r2-shared-js implements the above (nothing fancy, just boring repetitive parsing code), with careful handling of EPUB 2 name + content versus EPUB 3 property metadata, and of course special handling of metadata link + property for dcterms:conformsTo, a11y:certifierReport and optionally a11y:certifierCredential.

Code references:

https://github.com/readium/r2-shared-js/blob/77348ed92bdfdbf0e28573379d094a17297afc50/src/models/metadata.ts#L66-L217

https://github.com/readium/r2-shared-js/blob/77348ed92bdfdbf0e28573379d094a17297afc50/src/parser/epub.ts#L501-L710

danielweck commented 4 years ago

Side note: I do not know what the W3C webpub accessibility-report is, in relation to the specs linked above.

https://www.w3.org/TR/pub-manifest/#accessibility-report

danielweck commented 4 years ago

* **To be debated**: `schema:accessModeSufficient` can be repeated, and each occurrence is itself a comma-separated list of tokens from the enumeration. The current draft proposes to store these individual values as an array of tokens, rather than as the original linearized string. I am not so sure about this approach (I speak based on my own experience having implemented an editor for accessibility metadata), I think we should just naively preserve the original string value, with all its potential "weirdness"  (e.g. insignificant whitespaces - or lack thereof - between tokens and comma separators, etc.)

Note that the W3C draft spec. breaks down individual tokens in the linearized comma-separated enumeration for the accessModeSufficient property:

https://www.w3.org/TR/pub-manifest/#accessibility https://www.w3.org/TR/pub-manifest/#webidl-wpm

https://www.w3.org/TR/pub-manifest/#example-19-setting-accessiblity-metadata-for-a-publication-that-provides-alternative-text-and-long-descriptions-appropriate-for-each-image-enabling-it-to-be-read-in-purely-textual-form:

{
    …
    "accessMode"              : ["textual", "visual"],
    "accessibilityFeature"    : ["alternativeText", "longDescription"],
    "accessModeSufficient"    : [
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual", "visual"]
        },
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual"]
        }
    ],
    …
}
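
For comparison, converting an "array of arrays of tokens" into the ItemList form shown in the W3C example could look like the sketch below. The function name `toItemLists` is hypothetical, and the input shape is an assumption about how the tokens are held in memory.

```javascript
// Hypothetical converter: wraps each inner array of tokens in an object
// shaped like the "ItemList" entries of the W3C example.
function toItemLists(accessModeSufficient) {
    return accessModeSufficient.map((tokens) => ({
        type: "ItemList",
        itemListElement: [...tokens], // copy, so the input is not shared
    }));
}
```

Calling `toItemLists([["textual", "visual"], ["textual"]])` yields two ItemList objects, one per inner array, in the same order.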

danielweck commented 4 years ago

The current draft proposes to store these individual values (schema:accessModeSufficient) as an array of tokens, rather than as the original linearized string. I am not so sure about this approach ...

So, in r2-shared-js I added a convenient utility helper function to decompose and normalize the original/authored AccessModeSufficient string (i.e. raw linearized comma-separated value, when parsed from EPUB) into a canonical "array-of-(array-of-(string))" form, with removed insignificant whitespace, eliminated duplicates, and preserved order (the duplicates are removed on the trailing edge of the matching iteration).

Unfortunately due to a limitation in the declarative JSON (de)serialization library used for the R2 models, I was not able to directly implement array-of-array (array-of-object works fine, we use it a lot, but because of how prototypal class inheritance works in Javascript, array-of-array seems a no-go) ... thus the convenient, but separate helper.

Thorium / readium-desktop will invoke this utility function as needed, in order to present the accessibility metadata as per the standard UX guidelines: https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

PS Javascript code:

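// AccessModeSufficient is an array of raw comma-separated strings.
const AccessModeSufficientParsed = AccessModeSufficient.map((ams) =>
                ams.split(",").
                map((token) => token.trim()).              // drop insignificant whitespace
                filter((token) => token.length).           // drop empty tokens
                reduce((pv, cv) => pv.includes(cv) ? pv : pv.concat(cv), [])). // dedupe, keep first
                filter((arr) => arr.length);               // drop empty sets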

Example input/output: ["", " visual , textual ,, visual ", "auditory, auditory,,"] => [["visual","textual"],["auditory"]]

HadrienGardeur commented 4 years ago

Aside from purely parsing and representing these metadata, I think that the real question remains: what can we actually use them for?

IMO the community around EPUB has failed so far to build compelling use cases of how these various properties can be leveraged.

I'd rather have less metadata and know what to actually make of them.

danielweck commented 4 years ago

the real question remains: what can we actually use them for?

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

JayPanoz commented 4 years ago

@danielweck thanks for the review.

I must admit that I wasn’t particularly confident/comfortable with this draft, as accessibility metadata in EPUB isn’t necessarily my forte. And well, that was an external contribution in Blitz whose default was modified later, as having everything by default instead of a reasonable subset might well have produced unreliable a11y metadata. So I’m indeed expecting quite a lot of massive changes to this draft.

HadrienGardeur commented 4 years ago

the real question remains: what can we actually use them for?

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

Sure that's better than nothing, but beyond displaying these metadata, how can we truly leverage them?

llemeurfr commented 4 years ago

Sure that's better than nothing,

translate: this is already great :-)

beyond displaying these metadata, how can we truly leverage them?

Using them (I mean the mapped information, e.g. "Screen reader friendly") as filters in reading app bookshelves is the next step.
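
A rough sketch of such a bookshelf filter, assuming each publication exposes a `metadata.accessModeSufficient` array of arrays of tokens. Both function names and the "purely textual set" rule are illustrative assumptions, not an existing Readium API.

```javascript
// Hypothetical predicate: true when at least one "sufficient" set of
// access modes is purely textual.
function isScreenReaderFriendly(publication) {
    const metadata = publication.metadata || {};
    const sets = metadata.accessModeSufficient || [];
    return sets.some((set) => set.length === 1 && set[0] === "textual");
}

// Hypothetical bookshelf filter: keeps publications matching the predicate.
function filterBookshelf(publications, predicate) {
    return publications.filter(predicate);
}
```

Keeping the predicate separate from the filter means that other mapped values (for instance an audiobook check) could be plugged in the same way.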

danielweck commented 4 years ago

Unfortunately due to a limitation in the declarative JSON (de)serialization library used for the R2 models, I was not able to directly implement array-of-array (array-of-object works fine, we use it a lot, but because of how prototypal class inheritance works in Javascript, array-of-array seems a no-go) ... thus the convenient, but separate helper.

This is now fixed properly, so that the JSON syntax is optimal without the need of a helper function.

danielweck commented 4 years ago

Another point of interest, cross-walk project (EPUB, Schema.org and ONIX):

http://www.a11ymetadata.org/the-specification/metadata-crosswalk/

https://docs.google.com/spreadsheets/d/e/2PACX-1vTBWK6YwcDNYQTjE5dodNsMaIqRDUWu9SLsNwiaAZIrGn3BKa7iVlnTM6Nw5aU_qFKMUBcThEXlQAds/pubhtml

Summary of various useful references thus far:

http://kb.daisy.org/publishing/docs/metadata/schema-org.html

http://kb.daisy.org/publishing/docs/metadata/evaluation.html

https://www.w3.org/wiki/WebSchemas/Accessibility

https://www.w3.org/TR/pub-manifest/#accessibility

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

PS: I am not sure about the accessibility-report link, which seems close to a11y:certifierReport? https://www.w3.org/TR/pub-manifest/#accessibility-report

readium / architecture

Implementing accessibility metadata #94