w3c / publ-cg

EPUB 3 Community Group Repository

need for publisher-specific or publication-specific semantics modifying elements #58

Open RachelComerford opened 6 years ago

RachelComerford commented 6 years ago

from BISG survey

dauwhe commented 6 years ago

How would such semantics be used/exposed by reading systems? I would like to see use cases and examples.

mattgarrish commented 6 years ago

Isn't this achieved by epub:type without strictly enforcing the structural semantics vocabulary?

It's not a wise thing to promote given its limited lifespan in epub 3, though.

TzviyaSiegman commented 6 years ago

I would really love to be able to speak to the people who gave this feedback. What semantics are needed? I agree with @mattgarrish on this point. epub:type is there for now, but I don't encourage relying too heavily on it because it won't be there for long.

RachelComerford commented 6 years ago

@BillKasdorf - this was from your survey response. Can you provide more context for us based on @mattgarrish and @TzviyaSiegman contributions?

JayPanoz commented 6 years ago

Pinging @acabal since I know they’ve been using their own vocab (se) for Standard Ebooks, so they can probably give use cases, details, etc.

acabal commented 6 years ago

Hi folks, I'm not quite sure what this GitHub issue is referring to specifically since I'm not part of this group, but since Jay asked I'll give a brief rundown of how we're using epub:type at Standard Ebooks.

epub:type gives us a few ways to semantically enrich the ebooks we produce. We use a mixture of vocabularies: first, the epub3 structural semantics vocabulary; if what we want to mark up isn't in there, we pick from the z3998 structural semantics vocabulary; and if that too is missing what we need, we have our own small home-grown vocabulary.

The reason behind semantically enriching these ebooks is threefold.

  1. By adding semantics to markup that is already required to present the ebook, we open the door to doing interesting machine parsing of big ebook corpuses.

    For example, in our typography manual we specify that the names of naval vessels must appear in italics. This generally requires an <i> tag. Since we're already using <i>, we give that tag the attribute of epub:type="se:name.vessel.ship" to include more information on what it is we're marking up here.

    That markup is not useful in the sense of changing the presentation, since the text is italicized either way. However, it gives us the ability to go back and do machine parsing of ebooks with ship names in them.

    For example, what if in the future, we want to change our entire corpus so that ship names are quoted, and not italicized? Including this semantic information allows us to script that kind of change, instead of having to do cover-to-cover rereads of thousands of ebooks.

    Or what if we wanted to query all the references of the ship "Queen Mary", but without getting lots of false positives for historical or fictional queens of the same name? We can do that with these kinds of semantics.

  2. Smart and accurate semantics allow reading systems to better present book basics. For example, marking up each epub:type="chapter" allows the reading system to put a page break at the end of each one. Or, marking up epub:type="endnote" can tell the reading system to show a popup endnote taken from the section with epub:type="endnotes".

    It also allows us to do really cool stuff for accessibility and night mode. For example, we often have illustrations that are simple black-only drawings on a transparent background. (Think of a map in a fantasy book, or a layout of a murder scene in a Christie novel.) By marking them up with epub:type="se:color-depth.black-and-transparent", we tell the reading system that this image is suitable to be inverted to white-on-transparent when night mode is turned on. Now, no reading system actually does this in practice, but the hook is there for them! Currently we find that semantic in our build process and insert some custom CSS to make inversion happen on certain ereaders.

    An example that touches on accessibility is including epub:type="z3998:roman", giving text-to-speech systems a hook for reading out Roman numerals as numbers instead of as letters. Again, nobody actually does this yet, but the hook is there for the ambitious TTS system to take advantage of.

  3. Semantics allow us to style ebooks based on a standardized vocabulary, instead of arbitrary CSS classes. For example, if we wanted to style a poem, we could write the HTML like so: <blockquote epub:type="z3998:poem"> and then target it in CSS with [epub|type~="z3998:poem"]. Not only did we semantically enrich the text, but we were able to style it without polluting the HTML with meaningless style hooks like class="poem". Since we get a CSS style hook for free when we include semantics, we get to practice DRY and keep the ebook source code clean and simple.
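To make the "Queen Mary" query from point 1 concrete, here is a minimal sketch of how such a lookup could be scripted. It is an illustration only, not Standard Ebooks' actual tooling: the helper name find_by_epub_type and the SAMPLE document are hypothetical, and it assumes the content is well-formed XHTML that declares the usual epub namespace (http://www.idpf.org/2007/ops).

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace that the "epub" prefix is conventionally bound to in EPUB 3 content.
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{EPUB_NS}}}type"


def find_by_epub_type(xhtml: str, token: str) -> List[str]:
    """Return the text of every element whose epub:type contains `token`.

    Hypothetical helper, written for illustration only.
    """
    root = ET.fromstring(xhtml)
    matches = []
    for element in root.iter():
        # epub:type holds a space-separated list of tokens, much like class.
        tokens = element.get(EPUB_TYPE, "").split()
        if token in tokens:
            matches.append("".join(element.itertext()))
    return matches


# Made-up sample content: one ship, one monarch with the same name, one other ship.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<p>They sailed on the <i epub:type="se:name.vessel.ship">Queen Mary</i>.</p>
<p>Queen Mary reigned long before the ship was built.</p>
<p>The <i epub:type="se:name.vessel.ship">Nautilus</i> surfaced.</p>
</body>
</html>"""

ships = find_by_epub_type(SAMPLE, "se:name.vessel.ship")
queen_mary_ships = [name for name in ships if name == "Queen Mary"]
print(ships)
print(queen_mary_ships)
```

Only the marked-up ship is matched; the unmarked mention of the monarch in the second paragraph is ignored, which is the false-positive filtering described above. The same token lookup could drive a scripted restyling pass, for example swapping italics for quotation marks.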

Our own vocabulary is small and it was always meant to be a temporary solution until we switched to a more robust and standard one like schema.org. This hasn't happened yet mainly due to lack of time.

On a side note, I'm very saddened to hear that epub:type is being dropped in future revisions of epub. It provides an elegant way to mark up text that has lots of benefits, as I outlined above. Removing it makes ebooks dumber, less information-dense, more difficult to parse, and more difficult to render. The fact that reading systems didn't make consistent use of it doesn't mean it was a bad idea, or that they couldn't in the future.

I'm available to answer any questions anyone has. Hopefully some of this is useful to you!

danielweck commented 6 years ago

Useful info, @acabal :) For reference: https://idpf.github.io/epub-guides/aria-mapping/

RachelComerford commented 6 years ago

Thank you @acabal! This group is working on establishing "Best Practices" documentation for EPUB3 use. (To answer your question about context.)