Proposed changes to 2.2

iherman commented 5 years ago

I tried to reformulate section 2.2 to make the conformance part more explicit (for my taste). Also, I believe the current formulation is a bit too restrictive: if the manifest file is obtained from the PEP but is stored as a separate file, is there any reason to require a fixed name and position in the file system?

Here is what I would propose as a (slight) reformulation. (Slight, because the main idea remains the same.)

(Note that, in this text, I also propose to keep to index.html as the predefined name. index.html is the natural setup on Web Servers for the entry page in a directory, and probably most of the users/developers would instinctively keep to that pattern. As an LPF might result in a simple zipping of a directory in a file system, it would be more natural to keep to this pattern.)

The Package MUST include at least one of the following files in its Root Directory:

A file named index.html, which MUST follow the requirements of a Primary Entry Page of a Web Publication.

A file named manifest.jsonld, which MUST be in the format defined for Web Publication Manifests.

The contents of both files are specified in [[wpub]].

The User Agent MUST obtain the Web Publication Manifest for the publication included in the Package through the following steps:

If the Package contains an index.html file, the Web Publication Manifest is obtained following the rules described in the relevant section of the Web Publication specification [[wpub]].

Otherwise, the manifest.jsonld file is used as the Web Publication Manifest.

If both index.html and manifest.jsonld are present in the Package, then the former SHOULD contain a reference to the latter, following the rules described by the definition of the PEP.

All other files within the Package MAY be in any location descendant from the Root Directory.

Note that the index.html page may contain an embedded manifest, i.e., the Web Publication Manifest may not be explicitly present in the Package.
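For illustration only (this is not part of the proposed text), the selection rule above can be sketched in Python. The sketch assumes the Package is a ZIP archive whose Root Directory corresponds to the top level of the archive; the function name `manifest_source` is hypothetical, and extracting the manifest from the PEP itself is left to the WPUB rules.

```python
import zipfile

PEP_NAME = "index.html"            # predefined name proposed in this issue
MANIFEST_NAME = "manifest.jsonld"  # predefined name proposed in this issue


def manifest_source(package: zipfile.ZipFile) -> str:
    """Return the name of the root-level file the manifest is obtained from.

    Hypothetical helper: index.html takes precedence; otherwise
    manifest.jsonld is used; a Package with neither is rejected.
    """
    # namelist() returns full archive paths, so an exact match on the bare
    # file name only succeeds for entries at the top level of the archive.
    names = set(package.namelist())
    if PEP_NAME in names:
        return PEP_NAME
    if MANIFEST_NAME in names:
        return MANIFEST_NAME
    raise ValueError(
        "Package contains neither index.html nor manifest.jsonld "
        "in its Root Directory"
    )
```

When `manifest_source` returns `index.html`, the manifest may still be embedded in the page or linked from it, so a further step is needed to get hold of it.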

iherman commented 5 years ago

This issue was discussed in a meeting.

No actions or resolutions
1.2. proposed changes to 2.2
Laurent Le Meur: https://github.com/w3c/pwpub/issues/38
Laurent Le Meur: The issue opened by Ivan - 38 - This one is an easy one. It’s an extension of the wording in section 2. I made it very light originally (in 2.2) but Ivan would like it to have a bit more meat.
Ivan Herman: Let me explain. It’s mostly terminology. It’s an editorial question. The only difference — which is a technical difference — in the current document, the JSON-LD document has a fixed name. From the WPUB point of view, the entry page can point to any manifest.
… so if someone creates an audiobook with an entry page, they can name the manifest whatever they want. Technically speaking, this is the only thing that is a little bit different. Otherwise, I think it’s mostly editorial to make the text more firm.
Laurent Le Meur: It’s also about renaming the PEP to index.html, and I totally agree, so I can put it in the draft.
Benjamin Young: I was going to ask for clarification on naming requirements.
Tzviya Siegman: Right now entry.html and manifest.jsonld are required.
Laurent Le Meur: If you want the user agent to find them easily, then yes, but we can try to relax them as ivan proposed.
Ivan Herman: I think there is a 3rd one, Laurent, that may be worth discussing. Is it really necessary to have a separate section on relative URIs when there is already a section written in the WPUB document? There’s only one place it should be.
Laurent Le Meur: I’ll review the WP document and see if we can remove the additional one.

llemeurfr commented 5 years ago

The wording proposed by Ivan opens an issue: the rules defined for the PEP contain the WP section on Obtaining a manifest, which relies on the definition of a WP "web origin". In the Package case, finding the Manifest from the PEP is a matter of using a relative URL and considering that the 'Origin' is the Root Directory of the Package. In this use case there is no fetch, no cross-origin, no network. The processing described in Obtaining a manifest is therefore not really adequate.

We can make it straight by wording the first step proposed by Ivan as

If the Package contains an index.html file, the Web Publication Manifest is obtained following the syntax described in the relevant section of the Web Publication specification.

and specifying a file-oriented processing rule for obtaining the manifest from the PEP.

llemeurfr commented 5 years ago

I updated the LPF document to include a specific processing for obtaining the Manifest from the PEP. The processing steps in the WP spec are too much tied to Web operations and would be problematic here. The LPF processing steps are a re-wording of the WP processing steps and are detailed in https://raw.githack.com/w3c/pwpub/master/spec/ocf-lite.html#sec-obtaining-manifest.

iherman commented 5 years ago

I am a bit wary of copy-pasting and then modifying algorithms; it is a source of possible errors.

What is wrong with saying

- (conceptually) unpack the content of the package to a temporary place
- perform the WPUB algorithm to get hold of the manifest, with the exception that
  - if the link in the PEP is not a fragment ID, then the URL MUST be relative (i.e., the manifest MUST be in the expanded package content)
  - if there is no PEP, use publication.json

and let the details be handled by the WPUB spec...
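To make the first exception concrete, here is a minimal Python sketch of the check on the link found in the PEP. It is an illustration under assumptions, not spec text: the helper name `resolve_in_package` is hypothetical, the PEP is assumed to sit in the Root Directory, and rejecting hrefs that start with "/" is a choice made by this sketch only.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit


def resolve_in_package(href: str, base_dir: str = "") -> Optional[str]:
    """Resolve a manifest link from the PEP against the package content.

    Returns None for a fragment-only link (embedded manifest), the
    package-relative path for a relative URL, and raises ValueError for
    anything that would point outside the expanded package.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        raise ValueError("manifest URL must be relative: %r" % href)
    if not parts.path:
        return None  # fragment ID only: the manifest is embedded in the PEP
    path = unquote(parts.path)
    if path.startswith("/"):
        # Assumption of this sketch: root-absolute paths are rejected.
        raise ValueError("manifest URL must be relative: %r" % href)
    resolved = posixpath.normpath(posixpath.join(base_dir, path))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError("manifest must be inside the package: %r" % href)
    return resolved
```

With this helper, `resolve_in_package("#manifest")` signals an embedded manifest, while a link such as `https://example.org/m.json` or `../m.json` is rejected because the manifest would not be in the expanded package content.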

On the other hand, I think it would be important to document what should happen if an LPF processor wants to produce a bona fide WPUB using the package content.

llemeurfr commented 5 years ago

@iherman, if you compare https://w3c.github.io/wpub/#wp-obtaining-manifest and https://raw.githack.com/w3c/pwpub/master/spec/ocf-lite.html#sec-obtaining-manifest I'm sure you'll see what I mean. The former is full of references to Web processing that are not applicable to ZIP files.

Using something like "(conceptually) unpack" will raise many questions from some WP members, and could end up as a can of worms: what is the origin of the conceptually unpacked publication, and what do fetch, cross-origin, request, response, UTF-8 decoding, or even URL mean in the case of offline processing?

I took the bull by the horns by tailoring a processing model for obtaining a Manifest from an embedded PEP. I believe that's cleaner (no interpretation needed).

Conversely, I don't think it is good to document in the LPF spec what should happen if an LPF processor wants to produce a bona fide WPUB. Would it be normative or not? Detailed or not? Would we do such a thing for EPUB 3.2 / OCF (yes, EPUB files can become bona fide WPs; this is what we're doing in Readium soft)? LPF is a file format, and we're writing a file format specification here. WP is a Web format. Developers should link one to the other using a developers guide, not a normative spec IMHO.

iherman commented 5 years ago

Well... I understand what you say, but if you replace "(conceptually) unpack" with something like "(conceptually) unpack to a local server, served by, e.g., localhost", then all those Web references, though perhaps overcomplicated for that case, fall into place. (And I know that is exactly what you do in Readium, isn't it?)

This approach also raises questions, namely the lingering issue of what the relationship between LPF and WPUB is. Any step that is duplicated or modified that way gives the impression of a greater and greater distance. Hence my issue.

LPF is a file format, we're writing a file format specification here. WP is a Web format.

True. But it is a file format whose content can be turned into a WP. If it is not described (b.t.w., for the time being, informatively) in this document, where else? We have seen that there is a confusion already, and adding some text about the exact relationship would go a long way to dissipate those confusions...

Anyway. I will not lie down in the road over this, and will let the WG express its opinion.
