readium / readium-js-viewer

👁 ReadiumJS viewer: default web app for Readium.js library
BSD 3-Clause "New" or "Revised" License

[QUESTION] Status of user-selectable Media Overlays granularity #510

Open pettarin opened 8 years ago

pettarin commented 8 years ago

If I read the source code correctly, Readium JS Viewer supports user-selectable MO granularity by looking at epub:type on <seq> elements in the SMIL files.

Specifically, it looks for paragraph, sentence, and word values.

Now, if you try to validate with the latest EpubCheck a SMIL file with such attributes, e.g.:

<smil xmlns:epub="http://www.idpf.org/2007/ops" xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq id="seq000001" epub:textref="p001.xhtml">
      <seq epub:type="paragraph" epub:textref="p001.xhtml#p000001">
        <seq epub:type="sentence" epub:textref="p001.xhtml#p000001s000001">
          <seq epub:type="word" epub:textref="p001.xhtml#p000001s000001w000001">
            <par>
              <text src="p001.xhtml#p000001s000001w000001"/>
              <audio src="p001.mp3" clipBegin="00:00:00.000" clipEnd="00:00:02.640"/>
            </par>
          </seq>
        </seq>
      </seq>
      <seq epub:type="paragraph" epub:textref="p001.xhtml#p000002">
        <seq epub:type="sentence" epub:textref="p001.xhtml#p000002s000001">
          <seq epub:type="word" epub:textref="p001.xhtml#p000002s000001w000001">
            <par>
              <text src="p001.xhtml#p000002s000001w000001"/>
              <audio src="p001.mp3" clipBegin="00:00:02.640" clipEnd="00:00:02.895"/>
            </par>
          </seq>
...

you will get:

ERROR(OPF-027): output/sonnet.smil/output/sonnet.smil(4,68): Undefined property: 'paragraph'.
ERROR(OPF-027): output/sonnet.smil/output/sonnet.smil(5,76): Undefined property: 'sentence'.
ERROR(OPF-027): output/sonnet.smil/output/sonnet.smil(6,81): Undefined property: 'word'.

since paragraph, sentence, and word are not in the EPUB 3 Structural Semantics Vocabulary.
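
To make the nesting above concrete, here is a minimal sketch in Python of how a reading system could pick out the three levels from such a file. It is not Readium's actual code: the function name `textrefs_by_level` and the trimmed `SAMPLE` document are assumptions made for illustration. It only relies on the convention described above, i.e. `epub:type` values of `paragraph`, `sentence`, or `word` on `<seq>` elements.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_NS = "http://www.idpf.org/2007/ops"
LEVELS = ("paragraph", "sentence", "word")

# Trimmed, well-formed version of the SMIL excerpt above (first paragraph only).
SAMPLE = """<smil xmlns:epub="http://www.idpf.org/2007/ops" xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq id="seq000001" epub:textref="p001.xhtml">
      <seq epub:type="paragraph" epub:textref="p001.xhtml#p000001">
        <seq epub:type="sentence" epub:textref="p001.xhtml#p000001s000001">
          <seq epub:type="word" epub:textref="p001.xhtml#p000001s000001w000001">
            <par>
              <text src="p001.xhtml#p000001s000001w000001"/>
              <audio src="p001.mp3" clipBegin="00:00:00.000" clipEnd="00:00:02.640"/>
            </par>
          </seq>
        </seq>
      </seq>
    </seq>
  </body>
</smil>"""


def textrefs_by_level(smil_source):
    """Map each granularity level to the epub:textref values of matching <seq> elements."""
    root = ET.fromstring(smil_source)
    found = {level: [] for level in LEVELS}
    for seq in root.iter(f"{{{SMIL_NS}}}seq"):
        # epub:type may carry several space-separated values.
        types = seq.get(f"{{{EPUB_NS}}}type", "").split()
        for level in LEVELS:
            if level in types:
                found[level].append(seq.get(f"{{{EPUB_NS}}}textref"))
    return found


levels = textrefs_by_level(SAMPLE)
for level in LEVELS:
    print(level, levels[level])
```

A `<seq>` without `epub:type` (the outer one here) is simply ignored, so a file that uses only one or two of the levels would still be handled.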

So, my questions are:

  1. Does EpubCheck make a mistake reporting the above errors?
  2. Is the user-selectable MO granularity an "experimental" feature in Readium, and hence should it be considered unstable/not available yet? In particular, is the paragraph/sentence/word detection by epub:type a (Readium-)internal convention?
  3. Is there a "blessed" way to produce/support multi-granularity SMIL files? Are there any "official" examples available? (See also: https://github.com/IDPF/epub-revision/issues/650 )

danielweck commented 8 years ago

This feature is experimental, i.e. not part of the EPUB specification. EpubCheck seems to correctly report the errors.

Based on my tests with prototype publications (including all 3 levels of granularity: word, sentence, paragraph), the implementation in Readium is robust. I saw inventive use of CSS styling to visually highlight word-level audio playback, while the granularity was at sentence or paragraph level (with the advantage of the Media Overlays engine turning pages mid-paragraph in cases where a block of text spans across a page boundary). I cannot share this EPUB content, and to be honest I am not even sure that this feature is actually used anywhere other than in a research lab :) One thing is for sure: there was not enough interest in multiple audio-DOM synchronization granularity to drive an update of the EPUB Media Overlays specification.

Side note: Readium also supports CFI character ranges (which negate the need for heavyweight DOM spans + fragment identifiers), but this is even more experimental. The implementation relies on complex DOMRange highlights, which is sufficiently performant but not as reliable as plain fragment id CSS selectors. I hand-crafted a demo EPUB to showcase CFI + multiple granularities, but the two features are independent.

pettarin commented 8 years ago

Thank you for the clarification. No example needed, do not worry.

Just two quick notes:

  1. the next version of aeneas (v1.5.0) will be able to output multi-level sync maps;
  2. right now I am consulting pro bono for an association working with visually impaired and dyslexic kids, and they are very interested in user-selectable multi-level granularity, to the extent that they are lobbying me to support it in the Menestrello app (but I cannot confirm I will code that soon).

Feel free to close the issue, unless you deem it useful.

pettarin commented 8 years ago

Just one more note: with, say, sentence and word timings, one can also render the MO highlighting karaoke-style: instead of flashing one word at a time, "grow" the highlighted text by adding one word at a time, and then reset when transitioning to the next sentence. The latter approach works better when the audio is not too slow. It is no coincidence that Kindle Immersion Reading adopts it.
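
A minimal sketch of that karaoke-style logic, in Python, assuming the sync map is available as a list of sentences, each a list of `(word, clip_begin_seconds)` pairs. The data layout, the function name `highlighted_words`, and the timings in `SENTENCES` are all made up for illustration; this is not how Readium, aeneas, or Kindle implement it.

```python
from bisect import bisect_right

# Hypothetical sync map: one inner list per sentence, one (word, clipBegin) pair per word.
SENTENCES = [
    [("one", 0.0), ("two", 0.5), ("three", 1.0)],
    [("four", 2.0), ("five", 2.5)],
]


def highlighted_words(sentences, t):
    """Return the words to highlight at playback time t (seconds), karaoke style."""
    # A sentence starts when its first word starts.
    sentence_starts = [sentence[0][1] for sentence in sentences]
    i = bisect_right(sentence_starts, t) - 1
    if i < 0:
        return []  # playback has not reached the first word yet
    # Within the current sentence, highlight every word that has already started.
    word_starts = [begin for _, begin in sentences[i]]
    n = bisect_right(word_starts, t)
    return [word for word, _ in sentences[i][:n]]


for t in (0.6, 1.5, 2.0):
    print(t, highlighted_words(SENTENCES, t))
```

The highlight grows word by word inside a sentence and resets as soon as the first word of the next sentence begins. Only clip start times are used here, so the highlight persists through any silence between sentences; a player that wants to clear it during pauses would also need the clip end times.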