w3c / epub-specs

Shared workspace for EPUB 3 specifications.

Restrict epub:type use on HTML header elements #1241

Closed mattgarrish closed 5 years ago

mattgarrish commented 5 years ago

Per the specification, the attribute has to be ignored if used in the header:

As the [HTML] head element contains metadata for the document, structural semantics expressed on this element or any descendant of it have no meaning. Reading Systems MUST ignore such semantics.

But it's not clear why the attribute is allowed there at all. Also per the specification:

The inflected semantic MUST express a subclass of the semantic of the carrying element.

But what possible structural subclasses are there for header elements, as they aren't even exposed?

The attribute was originally modelled on the ARIA role attribute, which is also technically allowed anywhere but is functionally restricted from use in the header because header elements have no explicit roles.

Allowing the attribute in the header only has the potential to confuse its use with semantic enrichment techniques like RDFa and microdata (or with native attributes).

It would probably be best just to make clear there are no applications for the attribute in the header and disallow it.
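For anyone wanting to check their own content for this pattern, the following is a minimal Python sketch (not part of the specification or of epubcheck) that lists epub:type attributes carried by the head element or its descendants. The function name, the sample document, and the example-creator value are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
# ElementTree exposes namespaced attributes in "{namespace}local" form.
EPUB_TYPE = f"{{{EPUB_NS}}}type"


def epub_types_in_head(xhtml_source: str) -> list[tuple[str, str]]:
    """Return (element name, epub:type value) pairs found on head or below it."""
    root = ET.fromstring(xhtml_source)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    found = []
    # Element.iter() yields the head element itself, then every descendant.
    for element in head.iter():
        value = element.get(EPUB_TYPE)
        if value is not None:
            local_name = element.tag.split("}")[-1]
            found.append((local_name, value))
    return found


# Hypothetical sample: the epub:type value on meta is invented for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>Sample</title>
    <meta name="dcterms.creator" content="A. Author" epub:type="example-creator"/>
  </head>
  <body>
    <section epub:type="chapter"><h1>One</h1></section>
  </body>
</html>"""

print(epub_types_in_head(SAMPLE))
```

Run against the sample, this reports only the meta element; the epub:type on the section in the body is not flagged.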

rdeltour commented 5 years ago

@doktorchen raises a good point in w3c/epubcheck#986 that this would create another little backward incompatibility. I think I'd be more in favor of keeping this as a note and educating via best-practice guidelines.

mattgarrish commented 5 years ago

I'd be interested to hear if anyone else has used it in that fashion, though, as it was an attempted workaround for the semantic enrichment attributes. It's not easily accomplished by the average author.

There also aren't any semantics in the structural vocabulary for use in the header, and I can't say I've seen any extensions of the vocabulary beyond a few attempts to integrate the richer Z39.98 vocabulary.

Given the existing advice that you do it knowing it will result in nothing on the reading system end, adding a warning not to use it seems more beneficial than harmful.

rdeltour commented 5 years ago

I'd be interested to hear if anyone else has used it in that fashion, though

If only we had access to usage data… 😁

Doktorchen commented 5 years ago

Because book stores and distributors rely on epubcheck messages, this can result in problems. Due to the property problem of epubcheck, there are books in these stores that validate as EPUB 3.0 with the current epubcheck version.

I know of a few books already in stores using it, but my guess is that it is not often used.

Usually, many staff members of those stores or distributors do not really know what they are doing. In the end this means endless discussions, depending on the version of epubcheck they use. If in the future they use a version that accepts the RDFa property but disallows OPS type, one can switch in theory, but they do not all use the same epubcheck version at the same time.

Without a new version number for EPUB, such an incompatible change is just a waste of time for authors and for the staff members of those shops or distributors.

The initial drafts for the HTML:role attribute were closely aligned to the RDFa property, but the meaning changed over time to represent only ARIA, so it is not of much use to authors who produce only accessible documents right from the start.

Ok, the XHTML:head element has not much substructure, but of course its content can have semantic meaning that is not sufficiently defined by HTML5, so additional attributes indicating the semantic meaning are relevant. This OPS type is a simple extension to do it. Why not?

For example, Mozilla/Gecko or Pale Moon/Goanna expose metadata from the XHTML:head on demand. Extensions for them (Lucifox, EPUBReader 1.x) therefore expose this in the same way. If you know how to use your program, you have access to this information.

mattgarrish commented 5 years ago

Ok, the XHTML:head element has not much substructure, but of course its content can have semantic meaning

And that's exactly why there is a distinction in EPUB between semantic inflection - done by epub:type - and semantic enrichment - done by RDFa and microdata. The mechanisms aren't interchangeable.

Doktorchen commented 5 years ago

I still think it fits. Indicating that a meta element provides information about the creation date, the author, or other Dublin Core terms, for example, provides a subclass of meta information about the document. The name attribute already indicates some kind of meaning, refined with Dublin Core terms. HTML5 defines only roughly a few values for the name attribute; others are possible. Therefore one typically needs a refinement.

http://www.idpf.org/epub/301/spec/epub-contentdocs.html#sec-xhtml-semantic-markup : 'Semantic inflection is the process of attaching additional meaning about the specific purpose and/or nature an element plays in an XHTML Content Document. In the context of EPUB Publications, the epub:type attribute is typically used to express domain-specific semantics, with the inflection(s) it carries complementing the underlying [HTML5] host vocabulary.' and 'Semantic metadata is not intended for human consumption; it instead provides a controlled way for Reading Systems and other User Agents to learn more about the structure and content of a document, providing them the opportunity to enhance the reading experience for Users. '

And of course, if EPUB 3.0 had allowed RDFa right from the start, and epubcheck had never reported errors about RDFa in books or even about RDF containers in both SVG:metadata and HTML5:head (which is OK according to HTML5 in the XML serialization: https://www.w3.org/TR/html5/dom.html#metadata-content-2 ), there would never have been a need to use OPS:type here or anywhere else in a document. But now the chance is gone until there is a new version indication for EPUB.

And a correction: because metadata from other namespaces is allowed within HTML5:head in the XML serialization, HTML5:head can of course have a lot of substructure.

mattgarrish commented 5 years ago

There is no data model associated with the attribute; you don't create triples from it. It doesn't make any statement about the value of the carrying element. You want things to be true that simply aren't.

Doktorchen commented 5 years ago

The meta element itself creates already the triple by design. It is about the current document (subject), the name attribute provides the predicate and the content attribute contains the object.

This applies as well to <meta name="viewport" content="width=600, height=12000"/>: the document has a viewport that is 'width=600, height=12000'. (I would have written the content value in another way, but this is what EPUB recommends ;o)

HTML5 does not define the meaning of the name value here. With RDFa one could add or refine the information that this is some kind of Apple viewport (I think the W3C has at least drafts to do it in another way, with more meaningful CSS units).
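The reading described in this comment can be sketched in Python as follows. This is only an illustration of the informal subject/predicate/object interpretation given here, not a data model defined by HTML or EPUB; the function name and the this-document placeholder subject are assumptions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def meta_triples(xhtml_source: str, document_id: str = "this-document"):
    """Read each head/meta as (document, name, content), per the comment above."""
    root = ET.fromstring(xhtml_source)
    triples = []
    # Only meta elements that are direct children of head are considered.
    for meta in root.iterfind(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}meta"):
        name = meta.get("name")
        content = meta.get("content")
        if name is not None and content is not None:
            triples.append((document_id, name, content))
    return triples


# The viewport meta element is the one quoted in the thread.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample</title>
    <meta name="viewport" content="width=600, height=12000"/>
  </head>
  <body/>
</html>"""

for subject, predicate, obj in meta_triples(SAMPLE):
    print(subject, predicate, obj, sep=" | ")
```

Only the name and content attributes are read in this sketch.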

mattgarrish commented 5 years ago

It is about the current document (subject), the name attribute provides the predicate and the content attribute contains the object.

And the epub:type attribute does absolutely nothing, as it has no bearing on this model. It wasn't designed for this use in EPUB, and is entirely unrecognized for it, just as it is by HTML user agents.

It was not added as an alternative to either RDFa or microdata. The only reason those weren't also in the initial 3.0 release is because, at that time, it wasn't clear what their futures were. It wasn't until 3.0.1 that we could reasonably reference them with some surety that we weren't going to be left with a dead technology (or two).

As I've already mentioned, epub:type was designed to be a more flexible version of ARIA role, given that ARIA roles weren't extensible at the time. Given that it failed in that regard, as a model for AT processing also wasn't defined, hoping against hope that anything will process the attribute as metadata is going in the wrong direction. We don't need more confusion about the application of this attribute.

Even where it is allowed for use, if you were to try and coax a triple out of it, the semantic would be the object and the element that carries it the subject. A section is a chapter. An aside is a footnote. It is to more finely describe the structural roles elements play. Metadata describes aspects of the document; it doesn't define the structure.

dauwhe commented 5 years ago

Can we restrict epub:type to flow elements? We might have to add body...

mattgarrish commented 5 years ago

As we decided to leave this be for now, how about we change this paragraph in the introduction:

Semantic metadata is not intended for direct human consumption; it instead provides a controlled way for Reading Systems to learn more about the structure and content of a document, providing them the opportunity to enhance the reading experience for users.

to be a little more reflective of the reality of the attribute. So maybe:

Semantic metadata is intended primarily to assist in internal publishing workflows. While it also allows Reading Systems to learn more about the structure and content of a document, no specific behaviors are defined for the semantics and Reading Systems generally do not provide enhancements based on the presence of such semantics.

laudrain commented 5 years ago

@mattgarrish I feel a bit hesitant about your proposal. It is not completely true, as some Reading Systems do provide interesting behavior based on epub:type, such as footnote pop-ups. I know there is nothing in the spec for that but I would not like to kill it at first glance. More generally, the HTML role attribute wasn't at the beginning completely intended for a11y only, and now that it is, we heard at the last TPAC in Lyon about another new attribute called 'purpose' to serve exactly that kind of semantisation of HTML content that may be needed not only for internal publishing workflows.

iherman commented 5 years ago

…we heard at the last TPAC in Lyon about another new attribute called 'purpose' to serve exactly that kind of semantisation of HTML content that may be needed not only for internal publishing workflows.

Indeed. But it was some sort of a brainstorming only, and nobody carried the torch since then (alas!, I would say).

laudrain commented 5 years ago

@iherman where is the torch? I will carry it!

iherman commented 5 years ago

@laudrain 😁

The torch still has to be lit... One way, maybe, is to raise this issue on WICG. It should be made clear that several communities need something like that (reaching out to the folks we were discussing this with at TPAC), not only a Web-compatible version of EPUB/WPUB; that there should be some structure on how values for the attribute can/should be defined (some sort of process is necessary to make this attribute fit the validator, for example); etc.

But this probably goes beyond EPUB3, i.e., it should be discussed elsewhere...

mattgarrish commented 5 years ago

I know there is nothing in the spec for that but I would not like to kill it at first glance.

What I wrote doesn't kill anything, only acknowledges that the attribute has little real value outside of publisher workflows.

We're almost ten years into EPUB 3 and all we have are a few undocumented implementations of pop-up footnotes. If it's not helping you internally, I think it's time to add some honesty that you're adding the semantics for no specific purpose.

laudrain commented 5 years ago

@mattgarrish not only does it serve better UX in some Reading Systems, but it also allows mapping to role attribute for accessibility. I'm far from regretting this semantic addition to our EPUB3 specifications at the end of 2015! It had the virtue of increasing suppliers' awareness of preserving content structure in EPUB3 at that time, and it served the accessibility move well by helping them add role attributes in 2018 to pass Ace! I must insist that this evolution has been progressive enough to be costless. Sorry, I know it's not the place to discuss that, but EPUB3 as a living standard...

mattgarrish commented 5 years ago

but it also allows mapping to role attribute for accessibility.

You're describing an internal workflow issue; it does nothing by itself.
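The kind of workflow step under discussion, copying a role onto elements that already carry a known epub:type value, could look roughly like the Python sketch below. The mapping table is a small illustrative subset using DPUB-ARIA role names, and the function name and sample markup are hypothetical; it is not an official or complete mapping.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

# Illustrative subset only (assumption): a real workflow would need a complete,
# vetted mapping and checks that each role suits the element carrying it.
TYPE_TO_ROLE = {
    "chapter": "doc-chapter",
    "footnote": "doc-footnote",
    "glossary": "doc-glossary",
}


def add_roles(xhtml_source: str) -> ET.Element:
    """Set a role on elements whose epub:type has an entry in TYPE_TO_ROLE."""
    root = ET.fromstring(xhtml_source)
    for element in root.iter():
        if element.get("role") is not None:
            continue  # never overwrite a role the author already supplied
        # epub:type holds a space-separated list; use the first mapped value.
        for semantic in element.get(EPUB_TYPE, "").split():
            role = TYPE_TO_ROLE.get(semantic)
            if role is not None:
                element.set("role", role)
                break
    return root


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section epub:type="chapter"><h1>One</h1>
      <aside epub:type="footnote">A note.</aside>
      <div epub:type="example-unmapped">No role is added here.</div>
    </section>
  </body>
</html>"""

tree = add_roles(SAMPLE)
print([el.get("role") for el in tree.iter() if el.get("role")])
```

The conversion happens entirely in the production pipeline; nothing here depends on a reading system or assistive technology interpreting epub:type itself.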

There were two primary reasons why we added epub:type in 2010: 1) to allow for publishing semantics so EPUB 3 could be used in internal workflows in place of XML grammars; and 2) the idea that it would allow for accessibility without publishers having to do anything more.

The latter has proved illusory, as none of the work was done to map the semantics to ARIA, or to ensure that they were applied logically, or to get every AT to support a brand new attribute. That only leaves the publisher workflow purpose.

But what about the modified paragraph suggests there isn't any value for internal workflows, or to having the attribute? My point is that that is exactly where it is the most useful, not to the average author.

We also know that reading systems are not stampeding to add behaviours based on the attribute, so saying that they

generally do not provide enhancements based on the presence of such semantics.

is just a reflection of reality.

laudrain commented 5 years ago

My feeling is that your second paragraph is too restrictive. I definitely prefer the former version, which is more factual ('learn more about the structure and content of a document') and reflects reality better ('opportunity to enhance the reading experience for users').

Doktorchen commented 5 years ago

The main applications I can see: a) bibliographic research in the future and semantic text analysis in science (this would have been interesting for SVG 1.1 content documents as well, because SVG has few semantics concerning text but good options for accessibility).

If used inside the body element: b) CSS styling based on semantics (for example, HTML5 has no elements for poetry/lyrics and nothing for specific book content). Obviously it matters even for visual presentation to expose structure in relation to content (section, article, and aside have the same default styling as div). Values of the class attribute have no defined meaning for this, so they offer no way to expose structure related to semantics. In EPUB 3 books all my CSS is aligned to semantics (structure); HTML5 already has more elements for this, but not much for specific genres and nothing for books at all.

At the beginning of this attribute it might have helped to provide a default stylesheet for books including styling for these attribute values to get a better adoption, because if it becomes visible for the average author, some really use it and developers of user-agents only have to copy the default stylesheet to get it implemented ;o)

dauwhe commented 5 years ago

We're not going to add restrictions to epub:type. But as Matt mentioned, we have one single instance where it changes reading system behavior in the entire history of EPUB3. I see so many people putting so much thought and effort into epub:type, when that effort would be better spent on ensuring that HTML elements are being used correctly.

I fully support adding Matt's proposed language to the spec.

laudrain commented 5 years ago

@dauwhe thanks. Nonetheless, publishers like Hachette Livre did not wait for epub:type to structure content upstream! With XML publishing workflows in use since 2001, it takes no effort to produce epub:type from upstream structured content. Publishers also never expected HTML elements to manage that structure.

But why should EPUBs necessarily lose all those semantics? It's not all about footnotes. As @doktorchen explains above, there are other use cases, and I could add page breaks, media overlays, etc...

mattgarrish commented 5 years ago

But why should EPUBs necessarily lose all those semantics?

I don't follow this assertion. Nobody has said we're getting rid of the attribute.

The revised wording doesn't exclude a reading system from using the attribute or from providing behaviours. All it does is point out: a) that any behaviours are not standardized; and b) most semantics don't actually do anything.

Leaving aside the specialized uses, which is not what the epub:type definition in content documents is about, all we're really doing is warning people not to expect any magic.

The reality is that unless you have some purpose in mind that you can implement, layering your document with semantics is not adding much value in reading systems.

That's not the same thing as saying the attribute has no value to anyone.

But what if we tweak the first sentence slightly to make this clearer and then maybe we don't have to get into the poor support issue:

Semantic metadata is intended primarily to enrich content for use in publishing workflows (data modelling, semantic styling, etc.) and for author-defined purposes (content scripting, etc.). While it also allows Reading Systems to learn more about the structure and content of a document, no specific behaviors are defined for the semantics by this specification. Any such behaviors are Reading System-dependent.

laudrain commented 5 years ago

@mattgarrish thanks for the tweaking. Probably no need to be specific. What about:

Semantic metadata is intended to enrich content for use in publishing workflows and for author-defined purposes. While it also allows Reading Systems to learn more about the structure and content of a document, no specific behaviors are defined for the semantics by this specification. Any such behaviors are Reading System-dependent.

dauwhe commented 5 years ago

Closed via #1252