Packaging for audiobooks

HadrienGardeur commented 6 years ago

As a follow-up to our discussions at TPAC, I'd like to submit a first proposal for what could become the packaging format for audiobooks:

all resources (including the manifest) are packaged together in a ZIP (a lighter take on OCF)
the audio resources should not be further compressed in the ZIP
the manifest has a well-known location at the root of our package: manifest.jsonld
the entry page has a well-known location as well: index.html
we drop the requirement for an entry page and its reference in the manifest
all resources contained in the package that are not listed under readingOrder in our manifest are considered part of the resource list
we define a dedicated media type (TBD) and file extension (TBD as well) to identify such packages, both of them would be specific to audiobooks only
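
A minimal sketch of how such a package could be assembled, assuming Python's standard `zipfile` module. The function name `build_audiobook_package` and the list of audio extensions are illustrative assumptions, not part of the proposal; only the `manifest.jsonld` location and the "audio is not further compressed" rule come from the list above.

```python
import io
import json
import zipfile

# Assumption: an illustrative list of audio extensions, not defined by the proposal.
AUDIO_EXTENSIONS = (".mp3", ".m4a", ".ogg", ".wav")


def build_audiobook_package(manifest: dict, resources: dict) -> bytes:
    """Return the bytes of a ZIP package with manifest.jsonld at its root.

    `resources` maps a path inside the package to the raw bytes of that resource.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The manifest sits at the well-known location and compresses well.
        package.writestr(
            "manifest.jsonld",
            json.dumps(manifest, indent=2),
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for name, data in resources.items():
            # Audio is already compressed, so it is stored as-is.
            is_audio = name.lower().endswith(AUDIO_EXTENSIONS)
            method = zipfile.ZIP_STORED if is_audio else zipfile.ZIP_DEFLATED
            package.writestr(name, data, compress_type=method)
    return buffer.getvalue()
```

Storing audio without compression should also let a reading system seek inside a track without inflating it first, which is presumably the motivation for that rule.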

GarthConboy commented 6 years ago

With quick review, this looks like a very good starting point to me.

llemeurfr commented 6 years ago

An intriguing part of the proposal is that the entry page is not required for audiobooks. But the ToC and other navigation lists are currently only defined in this entry page.

Where will they be defined then? In the manifest, as a machine-readable ToC?

HadrienGardeur commented 6 years ago

But the ToC and other navigation lists are currently only defined in this entry page.

That's not the case. They're both identified in the manifest and can be included in other resources.

llemeurfr commented 6 years ago

They're both identified in the manifest and can be included in other resources.

True. We can therefore add a feature of the packaging format for audiobooks:

the ToC and other landmarks are included in one or more HTML resources and referenced as resources from the manifest.

HadrienGardeur commented 6 years ago

@llemeurfr but that's not specific to audiobooks or packaging them, that's why I don't think it's worth listing.

HadrienGardeur commented 6 years ago

I'd like to upload an example but unfortunately most audiobooks are too large for our repo. Any suggestions how we should deal with that issue?

cc @iherman @GarthConboy @wareid

GarthConboy commented 6 years ago

Back briefly to the TOC question. Yes, the TOC can be in a non-reading-order resource and referenced from the manifest -- this should work fine. However, it seems we may want some way to identify said resource as ONLY the TOC (or allow the TOC to be encoded in the manifest), such that the UA/RS knows that said TOC-resource is not really an ancillary resource to be side-presented with the audio, it's just the machine processable TOC.

In practice, said side-presented resources (supplemental content) will likely be PDFs, but I'm not sure that type should be the key to identification.

lrosenthol commented 6 years ago

I strongly recommend against raw ZIP. There are a number of well-known problems with it, which is why all standard ZIP-based packages start by addressing them. Since we don't want to reinvent the wheel, I suggest two possible starting points.

Update OCF to remove the known issues (eg. the required mime-type file)
Use another common standard for packaging. In this case, I recommend OPC (https://en.wikipedia.org/wiki/Open_Packaging_Conventions).

GarthConboy commented 6 years ago

Yea, OCF without mimetype (if we're really mad at it) -- to get the charset and file path "fixes" -- would be fine.

dauwhe commented 6 years ago

How much benefit do we get from using one packaging mechanism for EPUB3, a second packaging mechanism for Audiobooks, and a third packaging mechanism for the packaged version of WP? Can we just use OCF until we figure this out for everything?

GarthConboy commented 6 years ago

Touché -- I have to say I'm less mad at mimetype than others. :-)

HadrienGardeur commented 6 years ago

@GarthConboy

Yes, the TOC can be in a non-reading-order resource and referenced from the manifest -- this should work fine. However, it seems we may want some way to identify said resource as ONLY the TOC [...]

Well, we already have a rel value to indicate that the resource contains the TOC. If we go down the dual-approach for the TOC that I've suggested in #350, it will be even more clear that this is a document primarily meant to be processed rather than rendered.

[...] (or allow the TOC to be encoded in the manifest)

That's a different story altogether. We could use JSON of course, but I would advise against doing that just for audiobooks.

If you'd like to illustrate the difference:

here's a Readium manifest, where the TOC is embedded in JSON
same example where the WP manifest is separate from the machine-readable TOC

HadrienGardeur commented 6 years ago

@lrosenthol @dauwhe aside from the restrictions on file names, could you list the other benefits of using OCF?

We clearly don't need the mimetype file or META-INF/container.xml, yet they're both required in OCF.

llemeurfr commented 6 years ago

You'll find here an ISO standard which specifies a ZIP profile that could certainly do what we need. Or it could be the base for a profile we can define (re. file name constraints) as compatible with OCF, without the XML part.

It explicitly references EPUB OCF in a section about file names and interoperability (annex B).

llemeurfr commented 6 years ago

Note that an alternative is to define an "OCF light", keeping only the OCF Zip Container (section 4) but removing the mimetype file section (4.3), and also keeping the File Names section (3.4).

I like the Signature feature, but it may belong to another specification. Or we may decide to keep it also in such "OCF light".

lrosenthol commented 6 years ago

I was not recommending OCF - I recommended OPC.

However, creating an "OCF light" (or simply updating OCF!) would also be fine with me. As you note, the main issue is removing (or more specifically deprecating and/or making optional) the mimetype file. You need all the stuff about file naming - UTF-8, restricted chars, etc.

Signatures are good and we should keep them.

This way, such a package (to @dauwhe's concerns) is compatible with EPUB 3.
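
The file naming constraints mentioned above (UTF-8, restricted characters) could be checked along these lines. This is a rough sketch only: the forbidden character set, the 255-byte limit and the case-insensitive uniqueness check are loosely modeled on the OCF File Names section and are deliberately incomplete, and the function names are hypothetical.

```python
# Assumption: a partial subset of OCF-style restrictions, for illustration only.
FORBIDDEN_CHARS = set('/"*:<>?\\')
MAX_NAME_BYTES = 255


def file_name_problems(name: str) -> list:
    """Return a list of human-readable problems for one file name (not a path)."""
    problems = []
    if not name:
        problems.append("empty name")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append("longer than 255 bytes in UTF-8")
    if name.endswith("."):
        problems.append("ends with a full stop")
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append("contains restricted characters: " + " ".join(bad))
    # C0 controls, DEL and C1 controls.
    if any(ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F for ch in name):
        problems.append("contains a control character")
    return problems


def case_collisions(names: list) -> list:
    """Return pairs of names in one directory that differ only by case."""
    seen = {}
    collisions = []
    for name in names:
        key = name.casefold()
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions
```

Names that differ only by case matter because a package unzipped on a case-insensitive filesystem would silently lose one of the two files.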

llemeurfr commented 6 years ago

Whatever the choice is for the editing of this spec, we should have a way to validate such packaging. I'm sure there are useful pieces in epubcheck. But are there other pieces of code that would be helpful?

iherman commented 6 years ago

I am uneasy about some aspects of the proposal. In our terminology a Web Audiobook is a special Web Publication, and a packaged version thereof is a special version of EPUB4 or PWP (whatever terminology we use; let us forget about that issue for the moment). Viewed this way, this proposal sets a precedent that may, in the long term, unduly influence what a future packaged version of a WP looks like. What I find questionable are:

the manifest has a well-known location at the root of our package: manifest.jsonld

we drop the requirement for an entry page and its reference in the manifest

We essentially throw away what I consider to be an essential element of flexibility we have in a Web Publication, creating a fairly strong bifurcation in our specs. After all, I could (maybe naïvely) imagine an audiobook consisting of an HTML file containing a TOC, whose entries are a series of HTML audio elements...

iherman commented 6 years ago

@HadrienGardeur

I'd like to upload an example but unfortunately most audiobooks are too large for our repo. Any suggestions how we should deal with that issue?

How big? Can't you put it somewhere on the cloud with a stable URL? If necessary, I can push it up on the W3C web site (but if it is big, I would have to do it while I am at the institute with a big enough bandwidth).

HadrienGardeur commented 6 years ago

We essentially throw away what I consider to be an essential element of flexibility we have in a Web Publication, creating a fairly strong bifurcation in our specs.

@iherman

Sorry Ivan, but I have to strongly disagree with you here. In a package, we always need to have at least one well-known location. How is that throwing away an element of flexibility?

There's a big difference between dropping the requirement for an entry page and saying that it's actually forbidden. If you still want an entry page in your packaged publication, you'd be allowed to do that.

The entry page is primarily meant to:

provide a fallback for non WP-aware UAs
discover the publication (through the presence of a manifest)

In the case of packaged publications, we don't need such things IMO.

After all, I could (maybe naïvely) imagine an audiobook consisting of an HTML file containing a TOC, whose entries are a series of HTML audio elements...

Is there anything in the proposal restricting you from doing that? I don't think so.

HadrienGardeur commented 6 years ago

After discussing briefly with @iherman, it seems that he's more comfortable with having a well-known location for both:

the manifest: manifest.jsonld
and the entry page: index.html

This would make it easier to create "single resource in the reading order" publications where the manifest is embedded in index.html.

This doesn't really change my mind about making the entry page optional rather than required but I think it's a good compromise overall.
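
To illustrate the two well-known locations, here is a small sketch of how a reading system might inspect a package, assuming Python's `zipfile`. The function name `describe_package` is hypothetical, and the case where the manifest is embedded in index.html is deliberately not handled here.

```python
import io
import json
import zipfile

MANIFEST_PATH = "manifest.jsonld"  # well-known location from the proposal
ENTRY_PAGE_PATH = "index.html"     # optional entry page


def describe_package(package_bytes: bytes) -> dict:
    """Load the manifest from its well-known location and note whether an entry page exists."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())
        manifest = None
        if MANIFEST_PATH in names:
            manifest = json.loads(package.read(MANIFEST_PATH).decode("utf-8"))
        return {
            "manifest": manifest,
            "has_entry_page": ENTRY_PAGE_PATH in names,
        }
```

A package with neither file would come back with `manifest` set to `None` and `has_entry_page` set to `False`, which a validator could treat as an error.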

llemeurfr commented 6 years ago

Re. an optional entry page, as index.html: I support such a compromise.

iherman commented 6 years ago

It is not a compromise, it is a consensus:-)

GarthConboy commented 6 years ago

I just thumbs up-ed the above... with the view that the entry page would be optional at least for audiobooks... just checking, is that the consensus?

HadrienGardeur commented 6 years ago

I've tweaked the first post and added the well-known location for the entry page as well, this way we have a full list for the proposal which could be discussed in a future WG call.

iherman commented 6 years ago

@HadrienGardeur just to be clearer:

we drop the requirement for an entry page and its reference in the manifest

the entry page, if present, must have the same structure as in the WP, i.e., it must have a reference to the manifest, or may embed it. What is proposed to be dropped is the requirement for the very existence of the entry page, not its structure.

HadrienGardeur commented 6 years ago

@iherman

I'm certainly not suggesting that the entry page should be structured differently.

What I'm saying is that:

we don't need to have the url term in the manifest for a packaged publication (this affects the JSON Schema for the manifest)
we don't need to have an entry page itself in the package either

HadrienGardeur commented 6 years ago

How big? Can't you put it somewhere on the cloud with a stable URL? If necessary, I can push it up on the W3C web site (but if it is big, I would have to do it while I am at the institute with a big enough bandwidth).

@iherman

I'd rather upload the example somewhere in the cloud that's not tied to any of my personal accounts, since someone other than me might need to update it.

The packaged version of Flatland should be roughly 240-250 MB.

iherman commented 6 years ago

@HadrienGardeur that is fine, but at least temporarily you will have to put it somewhere on the cloud, because I would expect email clients to have problems with such an attachment. Once I get hold of the file, I can push it up on W3C at some www.w3.org/2018/11/XXX URL, which can then be changed later (by me or some other team member) if necessary.

danielweck commented 6 years ago

Besides suitable storage / bandwidth, my minimal requirements for hosting sample Web Publications would be:

HTTP CORS: Access-Control-Allow-Origin = * any origin, and if possible Access-Control-Allow-Methods with HEAD (and GET obviously) so that a reading system can get basic info before issuing a GET request to fetch / incrementally stream the response payload, and also Access-Control-Allow-Headers + Access-Control-Expose-Headers with useful HTTP headers such as Content-Type, Content-Length, Accept-Ranges, Content-Range, Range, Link, Transfer-Encoding ...
HTTP 1.1 partial byte range requests (for "streaming" audio/video resources)
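
The two minimal requirements above can be checked against a set of response headers with a few lines of code. This is a sketch under assumptions: the function name `missing_hosting_headers` is hypothetical, the headers are passed in as a plain dictionary rather than fetched over the network, and the optional nice-to-have CORS headers are not checked.

```python
# Header names and expected values taken from the requirements listed above.
REQUIRED_HEADERS = {
    "access-control-allow-origin": "*",
    "accept-ranges": "bytes",
}


def missing_hosting_headers(headers: dict) -> list:
    """Return the required header names that are absent or have an unexpected value."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    missing = []
    for name, expected in REQUIRED_HEADERS.items():
        if normalized.get(name, "").lower() != expected:
            missing.append(name)
    return missing
```

An empty list means the host satisfies both requirements for the given response.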

iherman commented 6 years ago

@danielweck you are raising a more general issue. Do we want to establish a storage for sample Web Publications in general? If so, I would have to look for a dedicated URL rather than the catch-all /2018/11/ bin of the W3C web space.

I do have the possibility to set .htaccess files for CORS on w3.org, so that should be o.k., provided somebody provides me with the correct statements. I must admit I do not know whether our server does that partial request for HTTP 1.1; a question to our system guys...

(B.t.w., github would not give these possibilities, even if the limit was not 100MB as it is now.)

danielweck commented 6 years ago

Yes, GitHub's gh-pages (or any branch mapped as "publishing source") only offers basic static hosting, which is why people have been using CDN proxies like https://rawgit.com (now deprecated), https://www.staticaly.com/rawgit , https://raw.githack.com , https://gitcdn.link etc.

Could it be that only the large audio/video files need to be hosted some place else? It would be nice if other resource types in sample Web Publications (e.g. JSON manifest, HTML, CSS, Javascript, etc.) could be tracked in Git, just like regular source code.

PS - just out of interest, I checked the HTTP headers provided by the various aforementioned CDN proxies, when requesting an MP3 file from the IDPF EPUB3 samples:

https://github.com/IDPF/epub3-samples/blob/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3

=>

curl -I -X GET -L https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3 | grep -i ACC

curl -I -X GET -L https://cdn.staticaly.com/gh/IDPF/epub3-samples/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3 | grep -i ACC

curl -I -X GET -L https://raw.githack.com/IDPF/epub3-samples/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3 | grep -i ACC

curl -I -X GET -L https://gitcdn.link/repo/IDPF/epub3-samples/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3 | grep -i ACC

curl -I -X GET -L https://rawgit.com/IDPF/epub3-samples/master/30/cc-shared-culture/EPUB/audio/asharedculture_soundtrack.mp3 | grep -i ACC (deprecated)

...all of them provide accept-ranges: bytes and access-control-allow-origin: *, but no sign of the other nice-to-have CORS headers mentioned in my previous message. So yeah, being able to control this with .htaccess is a bonus :)

iherman commented 6 years ago

Could it be that only the large audio/video files need to be hosted some place else? It would be nice if other resource types in sample Web Publications (e.g. JSON manifest, HTML, CSS, Javascript, etc.) could be tracked in Git, just like regular source code.

That should certainly be the case for WP examples. But Hadrien's one is an example for a packaged audiobook…

iherman commented 6 years ago

@HadrienGardeur, your example file is publicly available at:

https://www.w3.org/2018/audiobook_examples/flatland.audiopub

HadrienGardeur commented 6 years ago

Thanks @iherman for the upload!

For the record, here's the content of that file:

unzip -v flatland.audiopub 
Archive:  flatland.audiopub
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    1650  Defl:N      522  68% 11-06-2018 11:43 4c434f37  manifest.jsonld
    5011  Defl:N      936  81% 09-20-2018 18:48 6a7ad6da  toc.html
   96193  Defl:N    79951  17% 11-05-2018 15:22 0a773389  cover.jpg
21948718  Stored 21948718   0% 11-05-2018 15:24 b29de8be  flatland_1_abbott.mp3
26706222  Stored 26706222   0% 11-05-2018 15:24 ed3ef3d7  flatland_2_abbott.mp3
24105262  Stored 24105262   0% 11-05-2018 15:24 a9bf2144  flatland_3_abbott.mp3
28776750  Stored 28776750   0% 11-05-2018 15:24 29d90755  flatland_4_abbott.mp3
19605806  Stored 19605806   0% 11-05-2018 15:25 77346375  flatland_5_abbott.mp3
26558766  Stored 26558766   0% 11-05-2018 15:25 fdd624a1  flatland_6_abbott.mp3
34345262  Stored 34345262   0% 11-05-2018 15:25 92c51d09  flatland_7_abbott.mp3
42600750  Stored 42600750   0% 11-05-2018 15:25 99a505ee  flatland_8_abbott.mp3
18837806  Stored 18837806   0% 11-05-2018 15:25 5a13a4d9  flatland_9_abbott.mp3
--------          -------  ---                            -------
243588196         243566751   0%                            12 files
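
Given a listing like the one above, the rule from the proposal that anything not listed under readingOrder belongs to the resource list can be sketched as follows. The function name `implicit_resource_list` is hypothetical, excluding manifest.jsonld itself is an assumption, and the handling of readingOrder entries (plain strings or objects with a `url` member) reflects my reading of the WP manifest rather than anything stated in this thread.

```python
MANIFEST_PATH = "manifest.jsonld"


def implicit_resource_list(package_names: list, manifest: dict) -> list:
    """Return the package entries that are not part of the reading order."""
    reading_order = set()
    for item in manifest.get("readingOrder", []):
        # Assumption: an entry is either a URL string or an object with a "url" member.
        url = item if isinstance(item, str) else item.get("url")
        if url:
            reading_order.add(url)
    return [
        name
        for name in package_names
        if name != MANIFEST_PATH and name not in reading_order
    ]
```

For the Flatland package, and assuming its reading order lists the nine MP3 files, this would leave toc.html and cover.jpg as the implicit resource list.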

danielweck commented 6 years ago

Typo in .htaccess? accept-language: bytes should be accept-ranges: bytes, I think.

Also, there is no access-control-allow-origin: * header.

curl -I -X GET https://www.w3.org/2018/audiobook_examples/flatland.audiopub

HTTP/2 200 
date: Tue, 06 Nov 2018 10:35:35 GMT
last-modified: Tue, 06 Nov 2018 09:19:33 GMT
etag: "e8490c3-579fb7fe4b340"
accept-language: bytes
content-length: 243568835
cache-control: max-age=21600
expires: Tue, 06 Nov 2018 16:35:35 GMT
strict-transport-security: max-age=15552000; includeSubdomains; preload
content-security-policy: upgrade-insecure-requests

iherman commented 6 years ago

@danielweck I did not set the .htaccess at all, so this is whatever the directory inherits from the default setup.

If you can give me a .htaccess content that we would like to have, I would appreciate it... (and use it:-)

HadrienGardeur commented 6 years ago

@danielweck I think you're raising an important point about WP that is getting lost in this discussion.

I don't know if the specifics about CORS and range requests should show up in our spec or in a best practice document, but we definitely need them somewhere.

Would you mind opening a new issue specifically about that?

HadrienGardeur commented 6 years ago

ISO has already standardized what we need for "OCF light" which means that we can simply leverage that: http://standards.iso.org/ittf/PubliclyAvailableStandards/c060101_ISO_IEC_21320-1_2015.zip

Credits to @llemeurfr for identifying that document.

GarthConboy commented 6 years ago

Hmmm... interesting re "OCF light" and ISO. We'd need to define a known name for the manifest file, then that may be all we need (well plus likely a file extension and MIME type).

HadrienGardeur commented 6 years ago

Hmmm... interesting re "OCF light" and ISO. We'd need to define a known name for the manifest file, then that may be all we need (well plus likely a file extension and MIME type).

Sounds like a two-page spec to me (the whole ISO thing being a ten-page document in the first place).

wareid commented 6 years ago

As discussed in Audio TF on Nov 16th, I've added this to the queue to be discussed in the main PWG call in the coming weeks RE: the implications of potentially introducing a new packaging format to WP.

lrosenthol commented 6 years ago

I would strongly recommend against using ISO 21320 for your package normative reference for three main reasons.

1 - It doesn't properly address various well known file naming situations (eg. proper Unicode and platform incompatibilities) which OCF/UCF do.
2 - It disallows encryption, which would not be good for those publishers requiring some form of DRM
3 - It disallows DigSig, which would prevent proper tamper detection.

Instead, I would recommend making the necessary changes to OCF - or Adobe would be happy to return our (licensed from IDPF) OCF-variant (called UCF) which already has the few changes you'd probably want to make to OCF anyway (eg. removing the mimetype file restrictions)

llemeurfr commented 6 years ago

Hi Léonard,

Your second point is void: the ISO standard only disallows the encryption mechanism embedded in the Zip format, it does NOT disallow other encryption mechanisms, therefore does not disallow DRM. Same for the third point IMO.

Re. the first point, this is interesting; can you detail the issue? Also, the Adobe OCF variant may be interesting for completing OCF lite. Where can we find the spec?
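
To make the distinction concrete, here is a sketch of a check for ZIP-native features only, assuming Python's `zipfile`. It relies on my understanding that the ISO profile limits entries to the "stored" and "deflate" methods and forbids the encryption built into ZIP; treat both as assumptions to verify against the standard. The function names are hypothetical. Encryption or signatures applied to individual resources by other means would not be flagged, which is the point made above.

```python
import io
import zipfile

# Assumption: the profile allows only methods 0 (stored) and 8 (deflate).
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general purpose bit flag


def entry_violations(info: zipfile.ZipInfo) -> list:
    """Return profile violations for a single ZIP entry."""
    violations = []
    if info.compress_type not in ALLOWED_METHODS:
        violations.append(f"{info.filename}: compression method {info.compress_type}")
    if info.flag_bits & ENCRYPTED_FLAG:
        violations.append(f"{info.filename}: uses ZIP-native encryption")
    return violations


def package_violations(package_bytes: bytes) -> list:
    """Return profile violations for every entry in a package."""
    violations = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for info in package.infolist():
            violations.extend(entry_violations(info))
    return violations
```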

lrosenthol commented 6 years ago

You are correct - you could certainly encrypt and/or sign using alternative mechanisms inside the ZIP that don't use the native mechanism. However, doing so would introduce security holes in both (but that's another thread).

Right now, the UCF spec is internal to Adobe - but I'll get clearance to distribute it to this WG.

iherman commented 6 years ago

@danielweck I have added the .htaccess file to the audiobook example directory:

<Files ~ "\.audiobook$">
Header set Access-Control-Allow-Origin "*"
</Files>

But I am not sure about the Accept-Ranges thing. My understanding of the relevant HTTP section is that it expresses a specific capability of the server, but how do I know whether the server running at W3C has it? Or is it a default behaviour for all Apache servers?

danielweck commented 6 years ago

Thanks Ivan.

If I understand correctly, the primary / expected use-case for packaged (i.e. zipped) audio books is for a "reading system app" to fetch the HTTP URL (i.e. download the entire payload), and to store the file locally in some app-managed space (at which point the publication can be unzipped on a filesystem, or accessed directly in its zipped form). Unless the intention is also to allow this scenario "on the web" / in vanilla web browsers (for example, an offline-caching Service Worker stores the entire, potentially large, *.audiopub asset, or a JavaScript program unzips publication resources on the fly directly from the URL that references the packaged / zipped audio book), the "CORS" and "range" HTTP headers are not necessary.

However, if the intention is to serve "exploded" audio book web publications from the https://www.w3.org/2018/audiobook_examples/ URL, then both "CORS" and "range" HTTP headers are required.

Let's check:

curl -I -X GET https://www.w3.org/2018/audiobook_examples/flatland.audiopub ==>

HTTP/2 200 
date: Mon, 19 Nov 2018 09:08:37 GMT
last-modified: Tue, 06 Nov 2018 10:55:18 GMT
etag: "e84906f-579fcd6527180"
accept-language: bytes
content-length: 243568751
cache-control: max-age=21600
expires: Mon, 19 Nov 2018 15:08:37 GMT
strict-transport-security: max-age=15552000; includeSubdomains; preload
content-security-policy: upgrade-insecure-requests

...no sign of Access-Control-Allow-Origin, and accept-language: bytes still doesn't make sense to me :) Note that this seems to be an HTTP/2 server.

Conversely, see this other media.w3.org video URL which seems to respond from an HTTP/1.1 server (also note the appropriate Accept-Ranges: bytes header):

curl -I -X GET https://media.w3.org/2010/05/sintel/trailer.mp4 ==>

HTTP/1.1 200 OK
Date: Mon, 19 Nov 2018 09:29:20 GMT
Server: Apache/2.4.25 (Debian)
Last-Modified: Thu, 13 May 2010 17:49:03 GMT
ETag: "42b795-4867d5fcac1c0"
Accept-Ranges: bytes
Content-Length: 4372373
Cache-Control: max-age=21600
Expires: Mon, 19 Nov 2018 15:29:20 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Content-Type: video/mp4

iherman commented 6 years ago

@danielweck the first curl result was my mistake; the extension I used in the .htaccess was wrong. It should be o.k. now for the audiobook with respect to CORS.

As I said, I have no idea what this accept-language: bytes is; I suspect it is a central Apache setup problem. I have not touched that one.

That being said, I believe that the current directory was set up for packaged audiobook examples only, at least for now. So I would say let us leave it as is for now, and we can come back to this if we get to other types of examples.

HadrienGardeur commented 6 years ago

If I understand correctly, the primary / expected use-case for packaged (i.e. zipped) audio books is for a "reading system app" to fetch the HTTP URL [...]

@danielweck I don't think that's necessarily the "primary" use case. It seems that for some members of this WG (including @GarthConboy), the primary use case is to standardize an ingestion format rather than an end-user format.

IMO a packaged audiobook should handle both use cases.

Unless the intention is also to allow this scenario "on the web" / in vanilla web browsers [...]

That's not a use case, WP serves that purpose, not the packaged version.

However, if the intention is to serve "exploded" audio book web publications from the https://www.w3.org/2018/audiobook_examples/ URL, then both "CORS" and "range" HTTP headers are required.

While the manifest itself would require specific headers for CORS, that's not the case for the audio resources as long as you use <audio>. But support for range request is indeed a must have for audio resources.
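
For readers unfamiliar with range requests, here is a sketch of how a server could resolve a single `Range: bytes=...` request header into the inclusive byte span it would return with a 206 response. The function names are hypothetical and multi-range requests are simply rejected; this illustrates the mechanism and is not a complete implementation of the HTTP rules.

```python
from typing import Optional, Tuple


def resolve_byte_range(range_header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return (start, end), both inclusive, or None if the range cannot be served."""
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # unknown unit, or multiple ranges (not handled in this sketch)
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the resource.
        if not last.isdigit() or int(last) == 0:
            return None
        start, end = max(size - int(last), 0), size - 1
    else:
        if not first.isdigit():
            return None
        start = int(first)
        if last == "":
            end = size - 1  # open-ended form "bytes=N-"
        elif last.isdigit():
            end = min(int(last), size - 1)
        else:
            return None
    if start > end:
        return None  # starts beyond the end of the resource
    return start, end


def content_range_header(start: int, end: int, size: int) -> str:
    """Build the Content-Range value that accompanies a 206 response."""
    return f"bytes {start}-{end}/{size}"
```

An audio player seeking to the middle of a chapter relies on exactly this kind of request, which is why range support matters more for the audio resources than for the manifest.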

danielweck commented 6 years ago

Just a note about "ingestion format" vs. "end-user format": on multiple occasions I heard the term "distribution format" used to describe what I personally interpret as a B2B "interchange format". This notion of "distribution" really depends on "who distributes to whom", it's a question of perspective :) (same with "delivery format")

So, this kind of terminology can easily be misconstrued if we don't define the context carefully, and some of us might get lost in translation during our conversations. There are quite a few intermediaries along the digital supply chain (content creation / authoring, publishers, libraries, accessibility remediation, reading systems, etc.). I'm no expert, but I imagine that audio book production + distribution (that word again!) involves a very different workflow than, say, trade e-books, comic books, scholarly publications, etc. (which is why we're discussing TOC and packaging issues, notably)

So, as we aim to clarify use-cases specifically for packaged audio books (e.g. "ingestion" / "interchange") vs. generic packaged web publications (e.g. "delivery" / "distribution"), let's also try to disambiguate the terminology :)

w3c / wpub

Packaging for audiobooks #352