readium / architecture

📚 Documents the architecture of the Readium projects
https://readium.org/architecture/
BSD 3-Clause "New" or "Revised" License

Implementing accessibility metadata #94

Open llemeurfr opened 5 years ago

llemeurfr commented 5 years ago

During the 24/04/2019 call, the discussion led to:

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/principles/

Can we agree this is the way to go?

HadrienGardeur commented 5 years ago

I think that accessibilitySummary should either be a string or a localized string rather than an array of strings.

accessModeSufficient needs to be expressed as an array of arrays of strings (🙄).
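To make these suggested shapes concrete, here is a minimal sketch in plain JavaScript. It assumes a localized string follows the Readium Web Publication Manifest convention as I understand it (an object mapping BCP 47 language tags to strings). The example values and the names `resolveLocalized` and `isArrayOfStringArrays` are hypothetical, not part of any Readium API.

```javascript
// Hypothetical example values (not taken from a real publication).
// accessibilitySummary: plain string OR localized string { languageTag: text }.
// accessModeSufficient: array of arrays of access-mode tokens.
const example = {
  accessibilitySummary: {
    en: "This publication provides alternative text for all images.",
    fr: "Cette publication fournit un texte alternatif pour toutes les images.",
  },
  accessModeSufficient: [["textual", "visual"], ["textual"]],
};

// Resolve a string-or-localized-string value for display.
// Falls back to the first available translation when `lang` is absent.
function resolveLocalized(value, lang) {
  if (typeof value === "string") return value;
  if (value !== null && typeof value === "object") {
    if (typeof value[lang] === "string") return value[lang];
    return Object.values(value).find((v) => typeof v === "string");
  }
  return undefined;
}

// Check the "array of arrays of strings" shape for accessModeSufficient.
function isArrayOfStringArrays(value) {
  return (
    Array.isArray(value) &&
    value.every((set) => Array.isArray(set) && set.every((t) => typeof t === "string"))
  );
}
```

Storing the array of arrays directly avoids re-parsing comma-separated strings at display time; the trade-off is that the exact formatting of the authored string is lost.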

JayPanoz commented 5 years ago

accessModeSufficient → this one is even mega super confusing as an author.

Had to use it a few weeks ago, in my very last e-production gig and I was like “WTF‽”

Quite frankly, I hope that they redesign it at some point. Usage makes it even more difficult to understand what the definition is in the first place. 😫

danielweck commented 4 years ago

Review of @JayPanoz's current draft: https://github.com/JayPanoz/architecture/blob/a11y-metadata-parsing/streamer/parser/a11y-metadata-parsing.md

References:

danielweck commented 4 years ago

Note that r2-shared-js implements the above (nothing fancy, just boring, repetitive parsing code), with careful handling of EPUB 2 `name` + `content` versus EPUB 3 `property` metadata, and of course special handling of metadata `link` + `property` for `dcterms:conformsTo`, `a11y:certifierReport` and, optionally, `a11y:certifierCredential`.

Code references:

https://github.com/readium/r2-shared-js/blob/77348ed92bdfdbf0e28573379d094a17297afc50/src/models/metadata.ts#L66-L217

https://github.com/readium/r2-shared-js/blob/77348ed92bdfdbf0e28573379d094a17297afc50/src/parser/epub.ts#L501-L710
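The EPUB 2 versus EPUB 3 distinction described above can be illustrated with a simplified sketch. This is not the r2-shared-js code linked above: it assumes the OPF has already been parsed into plain objects (`{ property, text }` for EPUB 3 `<meta>`, `{ name, content }` for EPUB 2 `<meta>`, `{ rel, href }` for `<link>`), and the function name `collectA11yMetadata` is hypothetical.

```javascript
// Hypothetical sketch, NOT the r2-shared-js implementation.
// EPUB 3: <meta property="schema:accessMode">textual</meta>
// EPUB 2: <meta name="schema:accessMode" content="textual"/>
// Links:  <link rel="a11y:certifierReport" href="report.html"/>
// Simplification: every schema:/a11y: property is treated as accessibility metadata.
const A11Y_PREFIXES = ["schema:", "a11y:"];
const LINK_RELS = ["dcterms:conformsTo", "a11y:certifierReport", "a11y:certifierCredential"];

function collectA11yMetadata(metas, links) {
  const result = {};
  const push = (key, value) => {
    if (!result[key]) result[key] = [];
    result[key].push(value);
  };

  for (const meta of metas) {
    // EPUB 3 uses `property` + text content; EPUB 2 uses `name` + `content`.
    const isEpub3 = typeof meta.property === "string";
    const key = isEpub3 ? meta.property : meta.name;
    const value = isEpub3 ? meta.text : meta.content;
    if (typeof key !== "string" || typeof value !== "string") continue;
    const isA11y = key === "dcterms:conformsTo" || A11Y_PREFIXES.some((p) => key.startsWith(p));
    if (isA11y) push(key, value.trim());
  }

  for (const link of links) {
    // A rel attribute may hold several space-separated values.
    const rels = typeof link.rel === "string" ? link.rel.trim().split(/\s+/) : [];
    for (const rel of rels) {
      if (LINK_RELS.includes(rel) && typeof link.href === "string") push(rel, link.href);
    }
  }
  return result;
}
```

Both serializations of the same property end up under one key, so downstream code does not need to know which EPUB version the metadata came from.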

danielweck commented 4 years ago

Side note: I do not know what the W3C webpub accessibility-report is, in relation to the specs linked above.

https://www.w3.org/TR/pub-manifest/#accessibility-report

danielweck commented 4 years ago
* **To be debated**: `schema:accessModeSufficient` can be repeated, and each occurrence is itself a comma-separated list of tokens from the enumeration. The current draft proposes to store these individual values as an array of tokens, rather than as the original linearized string. I am not so sure about this approach (I speak from my own experience of implementing an editor for accessibility metadata). I think we should just naively preserve the original string value, with all its potential "weirdness" (e.g. insignificant whitespace, or lack thereof, between tokens and comma separators).

Note that the W3C draft spec breaks down the individual tokens of the linearized, comma-separated enumeration for the accessModeSufficient property:

https://www.w3.org/TR/pub-manifest/#accessibility https://www.w3.org/TR/pub-manifest/#webidl-wpm

https://www.w3.org/TR/pub-manifest/#example-19-setting-accessiblity-metadata-for-a-publication-that-provides-alternative-text-and-long-descriptions-appropriate-for-each-image-enabling-it-to-be-read-in-purely-textual-form:

```json
{
    …
    "accessMode"              : ["textual", "visual"],
    "accessibilityFeature"    : ["alternativeText", "longDescription"],
    "accessModeSufficient"    : [
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual", "visual"]
        },
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual"]
        }
    ],
    …
}
```

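For comparison with the EPUB form, the following minimal sketch converts linearized `schema:accessModeSufficient` strings into the `ItemList` objects shown in the W3C example, and back. The helper names `toItemLists` and `fromItemLists` are hypothetical, not part of any Readium or W3C API.

```javascript
// Hypothetical helpers, not part of any Readium or W3C API.
// EPUB form: one comma-separated string per schema:accessModeSufficient occurrence.
// W3C Publication Manifest form: { type: "ItemList", itemListElement: [...] }.
function toItemLists(linearized) {
  return linearized
    .map((s) => s.split(",").map((t) => t.trim()).filter((t) => t.length > 0))
    .filter((tokens) => tokens.length > 0) // skip empty occurrences
    .map((tokens) => ({ type: "ItemList", itemListElement: tokens }));
}

// Re-linearize for EPUB output. Note: the authored whitespace is not restored.
function fromItemLists(itemLists) {
  return itemLists.map((list) => list.itemListElement.join(","));
}
```

For instance, `toItemLists(["textual, visual", "textual"])` yields the two `ItemList` objects of the example above. A round trip through `fromItemLists` turns `" textual ,visual "` into `"textual,visual"`, which is exactly the loss of authored formatting raised in the "to be debated" point.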
danielweck commented 4 years ago

> The current draft proposes to store these individual values (schema:accessModeSufficient) as an array of tokens, rather than as the original linearized string. I am not so sure about this approach ...

So, in r2-shared-js I added a convenient utility helper function that decomposes and normalizes the original/authored AccessModeSufficient strings (i.e. the raw linearized, comma-separated values parsed from EPUB) into a canonical "array-of-(array-of-(string))" form: insignificant whitespace is removed, empty tokens and duplicates are eliminated, and order is preserved (when a token is repeated, its first occurrence is kept and later ones are dropped).

Unfortunately, due to a limitation in the declarative JSON (de)serialization library used for the R2 models, I was not able to directly implement array-of-array (array-of-object works fine, and we use it a lot, but because of how prototypal class inheritance works in JavaScript, array-of-array seems to be a no-go)... hence the convenient but separate helper.

Thorium / readium-desktop will invoke this utility function as needed, in order to present the accessibility metadata as per the standard UX guidelines: https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

PS: JavaScript code:

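```javascript
const AccessModeSufficientParsed = AccessModeSufficient // raw strings parsed from EPUB
  .map((ams) =>
    ams
      .split(",")                        // split on comma separators
      .map((token) => token.trim())      // remove insignificant whitespace
      .filter((token) => token.length)   // drop empty tokens (e.g. ",,")
      .reduce((pv, cv) => (pv.includes(cv) ? pv : pv.concat(cv)), []) // dedupe, keep first
  )
  .filter((arr) => arr.length);          // drop entries that ended up empty
```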

Example input/output: ["", " visual , textual ,, visual ", "auditory, auditory,,"] => [["visual","textual"],["auditory"]]

HadrienGardeur commented 4 years ago

Aside from purely parsing and representing these metadata, I think that the real question remains: what can we actually use them for?

IMO, the community around EPUB has so far failed to build compelling use cases for how these various properties can be leveraged.

I'd rather have less metadata and know what to actually make of them.

danielweck commented 4 years ago

> the real question remains: what can we actually use them for?

https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

JayPanoz commented 4 years ago

@danielweck thanks for the review.

I must admit that I wasn’t particularly confident or comfortable with this draft, as accessibility metadata in EPUB isn’t really my forte. The a11y metadata support in Blitz was an external contribution, and its defaults were later changed, because enabling everything by default instead of a reasonable subset might well have produced unreliable a11y metadata. So I’m indeed expecting a lot of substantial changes to this draft.

HadrienGardeur commented 4 years ago

> the real question remains: what can we actually use them for?
>
> https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

Sure that's better than nothing, but beyond displaying these metadata, how can we truly leverage them?

llemeurfr commented 4 years ago

> Sure that's better than nothing,

translate: this is already great :-)

> beyond displaying these metadata, how can we truly leverage them?

Using them (I mean the mapped information, e.g. "Screen reader friendly") as filters in reading app bookshelves is the next step.
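As an illustration of bookshelf filtering, here is a minimal sketch. The rule used (a publication counts as "Screen reader friendly" if at least one `accessModeSufficient` set is `textual` alone) is a simplifying assumption, not the mapping defined in the UX guide. `BOOKSHELF_FILTERS` and `applyFilter` are hypothetical names, and publications are assumed to carry `accessModeSufficient` as an array of arrays.

```javascript
// Hypothetical heuristic, NOT the UX guide's exact rule: treat a publication as
// "Screen reader friendly" when some sufficient set consists of "textual" alone.
function isTextualOnlySufficient(publication) {
  const metadata = publication.metadata || {};
  const sets = Array.isArray(metadata.accessModeSufficient) ? metadata.accessModeSufficient : [];
  return sets.some((set) => Array.isArray(set) && set.length === 1 && set[0] === "textual");
}

// Map user-facing filter labels to predicates over publications.
const BOOKSHELF_FILTERS = {
  "Screen reader friendly": isTextualOnlySufficient,
};

// Return the publications matching a label; unknown labels leave the shelf unfiltered.
function applyFilter(publications, label) {
  const predicate = BOOKSHELF_FILTERS[label];
  return predicate ? publications.filter(predicate) : publications;
}
```

Keeping the label-to-predicate mapping in one table means further labels derived from the UX guide can be added without touching the filtering code.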

danielweck commented 4 years ago

> Unfortunately due to a limitation in the declarative JSON (de)serialization library used for the R2 models, I was not able to directly implement array-of-array (array-of-object works fine, we use it a lot, but because of how prototypal class inheritance works in Javascript, array-of-array seems a no-go) ... thus the convenient, but separate helper.

This is now properly fixed, so the JSON syntax is optimal (a native array of arrays) without the need for a helper function.

danielweck commented 4 years ago

Another point of interest, the crosswalk project (EPUB, Schema.org and ONIX): http://www.a11ymetadata.org/the-specification/metadata-crosswalk/ https://docs.google.com/spreadsheets/d/e/2PACX-1vTBWK6YwcDNYQTjE5dodNsMaIqRDUWu9SLsNwiaAZIrGn3BKa7iVlnTM6Nw5aU_qFKMUBcThEXlQAds/pubhtml

Summary of various useful references thus far:

* http://kb.daisy.org/publishing/docs/metadata/schema-org.html
* http://kb.daisy.org/publishing/docs/metadata/evaluation.html
* https://www.w3.org/wiki/WebSchemas/Accessibility
* https://www.w3.org/TR/pub-manifest/#accessibility
* https://w3c.github.io/publ-a11y/UX-Guide-Metadata/techniques/schema-org.html

PS: I am not sure about the accessibility-report link: is it close to a11y:certifierReport? https://www.w3.org/TR/pub-manifest/#accessibility-report