Structural semantics vocabularies for education

dauwhe commented 7 years ago

The EPUB for Education spec refers to the EPUB for Education Structural Semantics document. How does this relate to the DPUB-ARIA roles?

And how might we best present this kind of information to the poor user who just wants to know which value to use?

TzviyaSiegman commented 7 years ago

We discussed future extensions of ARIA with the ARIA WG or DPUB-ARIA 2.0. We should be able to create additional roles as long as they do not require mappings to accessibility APIs. This would mean that the roles would validate and the native semantics of the HTML elements would be preserved. It is worth checking with ARIA WG to confirm.

WSchindler commented 7 years ago

This possibility of future extensions of ARIA would also be crucial for "porting" other important IDPF specs such as EPUB Indexes and EPUB DICT (from epub:type to ARIA role) so that they could be used in WP/PWP/EPUB 4. Of course, this would also be an issue for future versions of epubcheck!

mattgarrish commented 7 years ago

To this day I wonder why indexes and dictionaries weren't done as vocabularies for use with RDFa/microdata/JSON. I don't see them fitting well into the role model with their ancestor/descendant requirements and implied usage. The role attribute is not intended for complex data modelling, just as epub:type was pulled away from its original use as a tool for simple semantic inflection.

mattgarrish commented 7 years ago

I'd be interested to hear @iherman 's take on this, though.

iherman commented 7 years ago

@WSchindler @TzviyaSiegman the Publishing Working Group's charter includes DPUB-ARIA 2.0 as part of its deliverables. It does not list which terms would be added; that decision will be up to the WG, but the EDUPUB vocabulary (or part thereof) is obviously a good candidate. The same goes for the indexes and EPUB DICT. Let us hope we can start the real work in the WG soon :-)

iherman commented 7 years ago

@mattgarrish, per RDF: I think there are/were two separate issues.

1. By going with ARIA, we get accessibility mapping for free, so to say. I would expect that the new vocabulary may not require any further AAM spec, only a vocabulary; but by using role & Co the connection to A11y is there. I believe that was a very strong motivation two years ago to go down that line.
2. RDF has lots of baggage; we cannot just define something that looks like RDF, but is not really RDF. I.e., we must be precise on what we mean in an RDF sense.

RDF has subjects, objects, and predicates. If used for a structural vocabulary, the predicate and the object are clear. But what is the subject? To be more precise, what is the URI of the subject? (Remember that an RDF subject must be, to be precise, an IRI.) If this is not clearly specified in the document then, when using, say, RDFa, it is the default subject, i.e., the document itself. That is not, semantically, what we want, though; i.e., the subject must be explicitly specified. In microdata you have to place an extra @itemscope somewhere, in RDFa you do the same using, e.g., @about. But even that is not enough because, semantically, you actually want to use these attributes to denote the textual content of the element (e.g., when you specify that this is the "abstract"). What is the URI of that thing? Some sort of a complicated XPATH expression? Something based on the Annotation selectors?

There is a middle solution, actually: there is a specification to generate RDF based on the @role attribute, although it is not widely known and certainly not widely implemented (my RDFa Distiller does it, though, and I believe Gregg Kellogg’s similar tool does it as well). The conversion is based on using the @id attribute to set the subject for the triples or, in its absence, using a blank node. RDF purists may get into a long discussion whether this is good or not (does an @id really reflect the semantics that I referred to above?), but it may be good enough. At least with an @id; I believe using a blank node for the purposes of a DPUB ARIA term is meaningless.
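To make the @id-or-blank-node conversion concrete, here is a minimal Python sketch. It is not the RDFa Distiller's code, only an illustration under stated assumptions: the predicate and the base for role terms are both assumed to live in the XHTML vocabulary namespace, and the names `RoleTripleExtractor`, `role_triples`, and the example.org base IRI are hypothetical.

```python
from html.parser import HTMLParser
from itertools import count

# Assumption: the role predicate and the role terms both expand against
# the XHTML vocabulary namespace.
XHV = "http://www.w3.org/1999/xhtml/vocab#"


class RoleTripleExtractor(HTMLParser):
    """Emit one (subject, predicate, object) triple per role value."""

    def __init__(self, base_iri):
        super().__init__()
        self.base_iri = base_iri
        self.triples = []
        self._bnodes = count(1)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        roles = (attr.get("role") or "").split()
        if not roles:
            return
        if attr.get("id"):
            # @id present: the subject is the document IRI plus a fragment.
            subject = f"<{self.base_iri}#{attr['id']}>"
        else:
            # No @id: fall back to a fresh blank node.
            subject = f"_:b{next(self._bnodes)}"
        for role in roles:
            self.triples.append((subject, f"<{XHV}role>", f"<{XHV}{role}>"))


def role_triples(html, base_iri):
    parser = RoleTripleExtractor(base_iri)
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = (
        '<section id="abs" role="doc-abstract"><p>Summary</p></section>'
        '<section role="doc-index"></section>'
    )
    for triple in role_triples(sample, "https://example.org/book.html"):
        print(" ".join(triple), ".")
```

The second section in the sample has no @id, so its triple hangs off a blank node that nothing else can refer to, which is the case questioned above.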

Whatever we choose, however, even if we rely on the mapping defined in the @role module, this means that we require each element that carries a @role value to include an @id (or an @about). I do not believe it is realistic to expect that; 90% of our users will forget this…
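If such a requirement were adopted, a checker could at least flag the omissions. The following is a rough sketch only, not a feature of epubcheck or any existing validator; `MissingSubjectFinder` and `roles_without_subject` are hypothetical names.

```python
from html.parser import HTMLParser


class MissingSubjectFinder(HTMLParser):
    """Collect start tags that carry @role but neither @id nor @about."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        has_subject = attr.get("id") or attr.get("about")
        if attr.get("role") and not has_subject:
            line, _column = self.getpos()
            self.missing.append((line, tag, attr["role"]))


def roles_without_subject(html):
    finder = MissingSubjectFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    sample = (
        '<section id="abs" role="doc-abstract"></section>\n'
        '<section role="doc-index"></section>\n'
        '<aside about="#tip1" role="doc-tip"></aside>\n'
    )
    for line, tag, role in roles_without_subject(sample):
        print(f"line {line}: <{tag} role={role!r}> has no @id or @about")
```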

(B.t.w., this was the topic of a long discussion for the ITS specification. That document ended up defining an informal mapping to RDF, and the result shows the complexity. Whether it is worth having that for DPUB-ARIA is a major question.)

(Sorry for the long text; you asked for it:-)

mattgarrish commented 7 years ago

But indexes and dictionaries define complex relationships. A book has an index. The index has one or more entries, each of which is composed of individual units of information.
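As a rough illustration of that nesting (book, index, entries, units of information), here is a small Python sketch. The class and field names are hypothetical and are not taken from EPUB Indexes or EPUB DICT; the point is only that the data forms a tree of related objects rather than a set of independent labels.

```python
from dataclasses import dataclass, field


@dataclass
class Locator:
    """One unit of information in an entry: a pointer into the content."""
    target: str  # hypothetical, e.g. "chapter3.xhtml#p12"


@dataclass
class IndexEntry:
    term: str
    locators: list = field(default_factory=list)    # list of Locator
    subentries: list = field(default_factory=list)  # list of IndexEntry


@dataclass
class Index:
    entries: list = field(default_factory=list)     # list of IndexEntry


@dataclass
class Book:
    title: str
    index: Index


def find_targets(index, term):
    """Return the locator targets of every top-level entry matching term."""
    return [
        locator.target
        for entry in index.entries
        if entry.term == term
        for locator in entry.locators
    ]


if __name__ == "__main__":
    book = Book(
        title="Example",
        index=Index(entries=[
            IndexEntry("semantics", [Locator("chapter1.xhtml#p4")]),
        ]),
    )
    print(find_targets(book.index, "semantics"))
```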

As I understand these two specifications, they exist with an expectation that the reading system (or even an operating system) can extract and compile the information from the source (for querying, etc.). That's not a function of the role attribute; it sounds to me more like a job for RDF graphs.

But both specifications would need a rethink whichever approach is chosen, as they depend on additional external metadata in the package to unite the content when it is broken across multiple documents. That doesn't really work with either model.

The implied semantics would likely also have to be surfaced and made explicit. They make me wonder what pushback we'd get on validation (w3c/validator.nu) using roles if we persist with such a model.

w3c / publ-cg

Structural semantics vocabularies for education #4