w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Duration of an audiobook #307

Closed HadrienGardeur closed 6 years ago

HadrienGardeur commented 6 years ago

For an audiobook, we ideally need to express:

Schema.org uses https://schema.org/duration for this purpose but there are a few points worth listing:

For the duration of individual resources listed in the reading order, the same issue raised in https://github.com/w3c/wpub/issues/229#issuecomment-413520551 applies as well.

danielweck commented 6 years ago

Thanks for pointing this out.

Indeed, I am discovering the "duration" ISO_8601 syntax: https://en.wikipedia.org/wiki/ISO_8601#Durations

I am more familiar with Media Fragments: https://www.w3.org/TR/media-frags/#naming-time

...and of course SMIL (well, the EPUB3 Media Overlays simplified model): http://www.idpf.org/epub/31/spec/epub-mediaoverlays.html#app-clock-examples https://www.w3.org/TR/SMIL3/smil-timing.html#Timing-ClockValueSyntax

llemeurfr commented 6 years ago

You're right @HadrienGardeur, the ISO 8601 syntax for a duration is unintuitive (but easy when you look closely) and I haven't seen it used in the real world (e.g. real-life XML schemas).

We must consider that this duration property is essentially used for display purposes (in the UA); therefore one-second precision is sufficient, and an audiobook duration will never be much more than 24 hours. A duration in minutes or seconds would therefore be OK.

I'm slightly in favor of reusing the SMIL timecount syntax (e.g. 45min, or 30s). Easy to write, to read and to parse. I understand this is a fork from schema.org. NB: using the ISO 8601 syntax we would have values like PT45M or PT30S.
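To make the comparison concrete, here is a minimal sketch of how the two syntaxes map onto each other. The function names are hypothetical, the parser only covers the SMIL timecount form (a number with an optional `h`, `min`, `s` or `ms` metric), and the output is rounded to whole seconds on the assumption that second precision is enough.

```python
import re

# SMIL timecount metrics expressed in seconds.
_METRIC_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
_TIMECOUNT = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)?$")


def timecount_to_seconds(value: str) -> float:
    """Convert a SMIL timecount such as '45min' or '30s' to seconds."""
    match = _TIMECOUNT.match(value.strip())
    if match is None:
        raise ValueError(f"not a SMIL timecount: {value!r}")
    number, metric = match.groups()
    # A timecount without a metric is interpreted as seconds.
    return float(number) * _METRIC_SECONDS[metric or "s"]


def seconds_to_iso8601(total_seconds: float) -> str:
    """Format a duration as ISO 8601 using only the H, M and S designators."""
    remaining = int(round(total_seconds))
    hours, remaining = divmod(remaining, 3600)
    minutes, seconds = divmod(remaining, 60)
    parts = ""
    if hours:
        parts += f"{hours}H"
    if minutes:
        parts += f"{minutes}M"
    if seconds or not parts:
        parts += f"{seconds}S"
    return "PT" + parts


print(seconds_to_iso8601(timecount_to_seconds("45min")))  # PT45M
print(seconds_to_iso8601(timecount_to_seconds("30s")))    # PT30S
```

Either form carries the same information; the difference is mainly in what authors find readable and what existing schema.org consumers expect.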

Apart from the global duration of the audiobook (which will appear along with the description of the work), my feeling is that the duration of individual "chapters" should be expressed at the level of the ToC rather than the level of individual (physical) resources.

iherman commented 6 years ago

Is ISO 8601 very different from the xsd datatype for duration? The advantage of xsd is that it is integral part of the environment...

llemeurfr commented 6 years ago

@iherman no it's the same. Have you seen xsd:duration used in well-known XML schemas?

iherman commented 6 years ago

I have not really, but simply because I never really looked for it, ie, my opinion is not really relevant...:-(

iherman commented 6 years ago

All that being said, I am a little bit wary of using something different from schema.org. That would mean that we would have to use a different property than https://schema.org/duration, i.e., that type of data in our metadata will not necessarily be recognized...

(But I do not have experience in this, ie, I do not have a strong opinion on this.)

HadrienGardeur commented 6 years ago

Apart from the global duration of the audiobook (which will appear along with the description of the work), my feeling is that the duration of individual "chapters" should be expressed at the level of the ToC rather than the level of individual (physical) resources.

I think these are two different things.

At a ToC level, we can point to a range using media fragments, but IMO that's mostly useful for a "timeline" feature or displaying the relevant title when listening to an audiobook.

llemeurfr commented 6 years ago

At a ToC level, we can point to a range using media fragments, but IMO that's mostly useful for a "timeline" feature or displaying the relevant title when listening to an audiobook.

If we stay at a functional level, we should indicate at the ToC level a duration that the user can easily understand (after some processing for proper display), so that they can decide whether they have enough time to listen to the chapter. It's an indication of "volume". I agree that such information should also be usable in a timeline. If the ToC is expressed in HTML as it is today, this can be inserted using microdata (again). Does a media fragment do the trick in this case? I doubt it.

llemeurfr commented 6 years ago

I think these are two different things.

Maybe. What is the point of a duration at the (physical) resource level? There is no benefit for the user IMO. There may be a benefit for the UA: which one?

What is certain is that the duration of the whole audiobook (good for the user) and the duration of individual resources are not on the same level.

HadrienGardeur commented 6 years ago

Maybe. What is the point of a duration at the (physical) resource level? There is no benefit for the user IMO. There may be a benefit for the UA: which one?

That second point is something that we really need to explore by the way. Without indications such as:

... it'll be very difficult for a UA to know when it should or shouldn't cache something for offline reading or download it in a package.

The size of the cache will always be limited by:

HadrienGardeur commented 6 years ago

If we stay at a functional level, we should indicate at the ToC level a duration that the user can easily understand (after some processing for proper display), so that they can decide whether they have enough time to listen to the chapter. It's an indication of "volume". I agree that such information should also be usable in a timeline. If the ToC is expressed in HTML as it is today, this can be inserted using microdata (again). Does a media fragment do the trick in this case? I doubt it.

That's not what I'm suggesting at all, I don't want to see microdata in the ToC (or anywhere frankly).

Here's an example of an audiobook ToC:

llemeurfr commented 6 years ago

About media characteristics

without ... it'll be very difficult for a UA to know when it should or shouldn't cache something for offline reading or download it in a package.

Apart from the size of the resource, why are the characteristics you are listing useful for caching? IMO media characteristics are only useful when one needs 1/ to check that the UA can read the content (this is the case for the audio codec value) 2/ to select one piece of content among several alternatives.

resolution (images and audio) The common meaning of resolution is "the recommended printing resolution for an image in dots per inch" -> raster images only, e.g. 72 dpi.

Let's discuss that in #308.

About the duration of a section in an audiobook ToC

With a simple time, the UA cannot easily infer the duration of the section. With a time interval it is possible with some processing.

Question is:

HadrienGardeur commented 6 years ago

Apart from the size of the resource, why are the characteristics you are listing useful for caching? IMO media characteristics are only useful when one needs 1/ to check that the UA can read the content (this is the case for the audio codec value) 2/ to select one piece of content among several alternatives.

This is different from what I've pointed out in https://github.com/w3c/wpub/issues/307#issuecomment-413821054.

Even if you support the format, you might not want to attempt caching a 400 MB video when your overall available cache is 50 MB (I think that's the total cache available in Safari for example).

With a simple time, the UA cannot easily infer the duration of the section. With a time interval it is possible with some processing.

That's not entirely true. Take a look at the Flatland example and you'll see that it's very easy to extract a timeline from it.
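As a rough sketch of that extraction: if each ToC entry carries only a start time within a track, and each track in the reading order has a known duration, section durations fall out of the differences between consecutive start points. The data shapes, file names and the `toc_durations` helper below are hypothetical and are not taken from the Flatland manifest.

```python
from itertools import accumulate


def toc_durations(track_durations, toc_entries):
    """Return (title, seconds) pairs, one per ToC entry.

    track_durations: dict mapping each reading-order href to its length in
        seconds, inserted in playback order (hypothetical input shape).
    toc_entries: list of (title, href, start_offset_seconds) tuples in
        playback order, i.e. ToC links that carry a single start time.
    """
    hrefs = list(track_durations)
    lengths = [track_durations[href] for href in hrefs]
    # Position on the whole-book timeline at which each track begins.
    track_starts = dict(zip(hrefs, accumulate([0.0] + lengths[:-1])))
    book_length = sum(lengths)

    # Absolute start of every ToC entry on the whole-book timeline.
    starts = [track_starts[href] + offset for _, href, offset in toc_entries]
    # An entry lasts until the next entry starts, or until the book ends.
    ends = starts[1:] + [book_length]
    return [(entry[0], end - start)
            for entry, start, end in zip(toc_entries, starts, ends)]


tracks = {"track1.mp3": 600.0, "track2.mp3": 900.0}
toc = [
    ("Part 1", "track1.mp3", 0.0),
    ("Section 2", "track1.mp3", 240.0),
    ("Part 2", "track2.mp3", 0.0),
]
print(toc_durations(tracks, toc))
# [('Part 1', 240.0), ('Section 2', 360.0), ('Part 2', 900.0)]
```

Note that this only works when the per-track durations are available, which is the argument for carrying a duration in the reading order.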

do we want to impose a time interval here?

IMO that's not necessary.

danielweck commented 6 years ago

I think promoting the use of time intervals instead of single time pointers for the "table of contents" links is a bad idea, because ranges clearly indicate when playback ends. TOC links conventionally reference discrete locations to navigate to, which provides a consistent user expectation whereby whenever such a link is activated, playback (i.e. the reading / listening experience) continues until the end of the audio book, unless otherwise interrupted.

This applies to "continuous" media (as per the SMIL definition), which is not just "audio books", but also text-only publications that are narrated via synthetic speech (thereby providing a flowing aural rendition), as well as synchronized text + pre-recorded audio (EPUB3 Media Overlays).

PS: by "table of contents" I really mean any navigational structure that references locations within the publication resources, such as hierarchical list of headings, flat list(s) of landmarks, page list, etc.

HadrienGardeur commented 6 years ago

Going back to the initial issue, these are the two questions that we want to address in our next call:

Once we've answered both questions, we can decide how (keeping in mind that https://schema.org/duration is the most likely candidate).

ghost commented 6 years ago

As an RS, we probably need to express the duration of the whole publication (for example, on a bookshelf) and the duration of individual resources in the "toc".

HadrienGardeur commented 6 years ago

As an RS, we probably need to express the duration of the whole publication (for example, on a bookshelf) and the duration of individual resources in the "toc".

In the TOC rather than in readingOrder? Why?

ghost commented 6 years ago

That is because we show chapter durations at the toc level of the RS affordance (of course, the duration data could ultimately live in readingOrder), and at the moment my understanding is that we have not decided whether an RS should treat readingOrder as the toc of the whole book.

HadrienGardeur commented 6 years ago

They're two completely different things. To illustrate with the Flatland example:

ghost commented 6 years ago

OK, I was confused because readingOrder includes some information ("name": "Part 1, Sections 1 - 3") that seems to be needed for the toc. Thanks for the clarification. Then we need this in the toc rather than in readingOrder, since it would be better for the UI to display the duration in the table of contents (a duration for each item in the toc), because items in the toc could be sub-items of readingOrder entries. I don't think it's practical to calculate the duration of each toc item from readingOrder.

HadrienGardeur commented 6 years ago

Hmm I disagree.

All audiobooks produced today have the equivalent of a readingOrder and it would be very easy to provide the duration of such tracks.

It's not clear how many audiobooks have the equivalent of a TOC right now. I also don't think that a TOC will ever be a requirement in WP.

By requiring a duration in readingOrder and recommending media fragments in the TOC:

ghost commented 6 years ago

and it would be very easy to provide the duration of such tracks.

I think this is based on the presumption that the TOC is machine-readable, but we are not sure about that yet in the wider WP scope.

All audiobooks produced today have the equivalent of a readingOrder

What if some audiobooks could be voice-controlled by Google Home in the near future?

I also don't think that a TOC will ever be a requirement in WP.

Me neither. I can't say that duration is a requirement either. I would rather the spec provide a (standard) way to express duration than make it a requirement, just like the toc. Based on this, maybe, as you said, readingOrder is an alternative way (and eventually a better way?), but thinking about it plainly, an RS would need the duration to be displayed alongside the corresponding toc item. Is there any reason we need to put it into readingOrder and calculate it for the toc rather than just put it in the toc?

Maybe we can extend this a bit by

  1. do we need to express the duration of the whole publication?
  2. do we need to express the duration of individual resources?
  3. do we need to express the duration of individual toc item?

I agree with 1, and I think it would be nice to have 3, but I wonder what 2 can provide that the toc cannot.

wareid commented 6 years ago

Summary from the taskforce meeting (for the benefit of the rest of the group): There are two options to express duration (and many places where duration can be expressed), one is to use the schema.org schema:duration (Duration - schema.org) which requires including the “@type” of Audiobook and the time format from ISO 8601. The other option is to include duration in the resources, which would require an extension of the publicationLink we are already using. The importance of how we provide information comes up in a few ways for UserAgents:

iherman commented 6 years ago

Thanks @wareid for the summary. Just a few very general remarks.

HadrienGardeur commented 6 years ago

My absolutely and completely layperson's comment is whether the structures we are discussing here are valid for audiobooks only. I could imagine, for example, a 'publication' consisting of a series of videos [...]

That's entirely correct and ideally we want to address this by potentially introducing the ability to specify the bitrate (audio & video) and the height/width (video and images) as well.

Since these terms are already available at schema.org (https://schema.org/bitrate, https://schema.org/height and https://schema.org/width), this would simply require extending the publication link model with them as well.
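To illustrate why a duration and a bitrate on each link could help a UA with the cache-budget concern raised earlier in the thread, here is a minimal sketch that estimates download size from the two values. The dictionary keys, the kilobits-per-second unit and the helper names are assumptions made for the example; the thread does not define units for the bitrate.

```python
def estimated_size_bytes(duration_seconds: float, bitrate_kbps: float) -> int:
    """Approximate size of a constant-bitrate media resource in bytes."""
    # kilobits per second -> bits per second -> bytes per second.
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def fits_in_cache(resources, cache_budget_bytes: int) -> bool:
    """Check whether every resource could be cached within the budget.

    resources: iterable of dicts with hypothetical 'duration' (seconds)
        and 'bitrate' (kbps) keys, standing in for extended link objects.
    """
    total = sum(
        estimated_size_bytes(item["duration"], item["bitrate"])
        for item in resources
    )
    return total <= cache_budget_bytes


one_hour_track = [{"duration": 3600, "bitrate": 128}]
print(estimated_size_bytes(3600, 128))                # 57600000
print(fits_in_cache(one_hour_track, 50 * 1000 * 1000))  # False
```

The estimate ignores container overhead and variable bitrates, so an explicit size in bytes would still be more reliable where it is available.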

iherman commented 6 years ago

@HadrienGardeur

That's entirely correct and ideally we want to address this by potentially introducing the ability to specify the bitrate (audio & video) and the height/width (video and images) as well.

Thanks for the confirmation. Two (not necessarily high priority) comments.

  1. Is there an accepted term in the community for a publication including, e.g., videos (I note that we may touch some of the animated manga/BD type publications with this)? audiobook is probably the most important category and type for now in practice, but we may want to be prepared for having another publication type for the more general category. This may be an issue to take up with @danbri at TPAC: extensions of schema.org to include other forms of publications.
  2. While extending the PublicationLink is the obvious way (and I am fine with it) to add these additional attributes, we may want to look at the clean semantics. Indeed, whilst PublicationLink started as some sort of a copy of a <link> element by adding media type and the like, adding, e.g., duration means that we are characterizing the targeted resource itself, and not the link (knowing very well that the borderline between these two is fuzzy). This may lead to two possible changes:
    1. Rename PublicationLink to something like a PublicationResource (I know, this is only a change of name, but those are important).
    2. If we say PublicationResource we might want to consider the term id instead of url (or have both), because we want to have the subject clearly stated for the statements we are adding. (Yeah, my Semantic Web past is resurfacing :-( )

Again, these are not central issues for now, just jotting them down for a possible later discussion. And if there is consensus that this is worth discussing, I am happy to spawn these into separate issue(s).

Cc @TzviyaSiegman @GarthConboy @wareid

iherman commented 6 years ago

This issue was discussed in a meeting.

llemeurfr commented 6 years ago

A note about the transcript of @BigBlueHat's comment: the HTML time element supports either a date (using the W3C datetime format, which is a subset of the ISO 8601 format) or a duration. If it represents a duration, it can be formatted either using the ISO 8601 duration format or using a W3C custom format ("more readable" says the spec).

See https://www.w3.org/TR/2014/REC-html5-20141028/infrastructure.html#valid-duration-string, with more examples in e.g. https://developer.mozilla.org/fr/docs/Web/HTML/Element/time. This means that "2 hours 15 minutes" can be encoded either as PT2H15M or 2h 15m.

Embracing the web would mean adopting this W3C syntax for durations, not the ISO syntax strictly speaking. This also means that schema.org is more restrictive than the W3C on the syntax of a duration.
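A minimal sketch of a parser that accepts both spellings and normalizes them to seconds is shown below. It is deliberately a simplified subset (hours, minutes and seconds only, no days or weeks), and `duration_to_seconds` is a hypothetical helper, not an API from any of the specifications mentioned.

```python
import re

# ISO 8601 style, restricted to the time part: PT[nH][nM][nS].
_ISO = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")
# HTML "duration time component": a number followed by a unit letter.
_COMPONENT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([hms])\s*", re.IGNORECASE)
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text: str) -> float:
    """Parse 'PT2H15M' or '2h 15m' style durations into seconds."""
    iso = _ISO.match(text)
    if iso and any(iso.groups()):
        hours, minutes, seconds = iso.groups()
        return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)

    total, position = 0.0, 0
    while position < len(text):
        component = _COMPONENT.match(text, position)
        if component is None:
            raise ValueError(f"unsupported duration: {text!r}")
        number, unit = component.groups()
        total += float(number) * _UNIT_SECONDS[unit.lower()]
        position = component.end()
    if position == 0:
        raise ValueError("empty duration")
    return total


print(duration_to_seconds("PT2H15M"))  # 8100.0
print(duration_to_seconds("2h 15m"))   # 8100.0
```

Accepting both forms on input while storing a single numeric value keeps the choice of authoring syntax separate from what a UA works with internally.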

iherman commented 6 years ago

@llemeurfr I have added this to https://github.com/w3c/wpub/wiki/Schema.org-issues. Hopefully we can discuss these at TPAC.

@wareid @TzviyaSiegman @GarthConboy

HadrienGardeur commented 6 years ago

Also FYI, in Readium we've decided to convert everything to seconds instead since that's what we're already using for:

It feels easier to always work with seconds rather than juggle constantly between ISO 8601 for the primary reading order/metadata and seconds for everything else.

iherman commented 6 years ago

This issue was discussed in a meeting.

danielweck commented 6 years ago

Media Fragments URI ( https://www.w3.org/TR/media-frags/#naming-time ) normatively references RFC 2326 ( http://www.ietf.org/rfc/rfc2326.txt ), but note that RFC 7826 ( https://tools.ietf.org/html/rfc7826 ) obsoletes it. So we need to figure out how to align here.
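For reference, a minimal sketch of reading the temporal dimension of a Media Fragments URI in its Normal Play Time form follows. It handles `t=start`, `t=start,end` and `t=,end`, with seconds, `mm:ss` or `hh:mm:ss` values and an optional `npt:` prefix. The function names are hypothetical, other time schemes (SMPTE, wall-clock) are not handled, and when several `t=` parameters are present the sketch simply keeps the last one.

```python
def npt_to_seconds(value: str) -> float:
    """Convert an NPT value ('90', '1:30' or '0:01:30.5') to seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not an NPT time: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each colon-separated field is worth 60 times the next one.
        seconds = seconds * 60 + float(part)
    return seconds


def temporal_fragment(url: str):
    """Return (start, end) in seconds, or None if there is no t= dimension.

    'end' is None when the fragment names only a start point.
    """
    fragment = url.partition("#")[2]
    spec = None
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "t":
            spec = value
    if not spec:
        return None
    if spec.startswith("npt:"):
        spec = spec[len("npt:"):]
    start_text, _, end_text = spec.partition(",")
    start = npt_to_seconds(start_text) if start_text else 0.0
    end = npt_to_seconds(end_text) if end_text else None
    return (start, end)


print(temporal_fragment("track1.mp3#t=240"))                  # (240.0, None)
print(temporal_fragment("track1.mp3#t=npt:0:04:00,0:10:00"))  # (240.0, 600.0)
```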

iherman commented 6 years ago

This issue was discussed in a meeting.

iherman commented 5 years ago

This issue was discussed in a meeting.