w3c / pwpub

W3C packaged Web Publications
https://w3c.github.io/pwpub/

Proposed changes to 2.2 #38

Open iherman opened 5 years ago

iherman commented 5 years ago

I am trying to reformulate section 2.2 to make the conformance part more explicit (for my taste). Also, I believe the current formulation is a bit too restrictive: if the manifest file is obtained from the PEP but is stored as a separate file, is there any reason to require a fixed name and position in the file system?

Here is what I would propose as a (slight) reformulation. (Slight, because the main idea remains the same.)

(Note that, in this text, I also propose to keep to index.html as the predefined name. index.html is the natural setup on Web Servers for the entry page in a directory, and probably most of the users/developers would instinctively keep to that pattern. As an LPF might result in a simple zipping of a directory in a file system, it would be more natural to keep to this pattern.)

The Package MUST include at least one of the following files in its Root Directory:

  * index.html
  * manifest.jsonld

The contents of both files are specified in [[wpub]].

The User Agent MUST obtain the Web Publication Manifest for the publication included in the Package through the following steps:

  1. If the Package contains an index.html file, the Web Publication Manifest is obtained following the rules described in the relevant section of the Web Publication specification [[wpub]].
  2. Otherwise, the manifest.jsonld file is used as the Web Publication Manifest.

If both index.html and manifest.jsonld are present in the Package, then the former SHOULD contain a reference to the latter, following the rules described by the definition of the PEP.

All other files within the Package MAY be in any location descendant from the Root Directory.

Note that the index.html page may contain an embedded manifest, i.e., the Web Publication Manifest may not be explicitly present in the Package.
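The lookup order in the two steps above can be sketched as follows. This is only an illustration of the proposed wording, not text from any spec: the function names `manifest_source` and `build_package` are hypothetical, and a ZIP archive built in memory stands in for the Package. Only entries at the root of the archive are considered.

```python
import io
import zipfile
from typing import Dict

PEP_NAME = "index.html"            # Primary Entry Page name from the proposal
MANIFEST_NAME = "manifest.jsonld"  # standalone manifest name from the proposal


def manifest_source(package: zipfile.ZipFile) -> str:
    """Return the root-level file from which the manifest lookup starts."""
    names = set(package.namelist())
    if PEP_NAME in names:
        # Step 1: the manifest is obtained through the PEP (linked or embedded).
        return PEP_NAME
    if MANIFEST_NAME in names:
        # Step 2: no PEP, so the standalone file is used as the manifest.
        return MANIFEST_NAME
    raise ValueError("neither index.html nor manifest.jsonld in the Root Directory")


def build_package(files: Dict[str, str]) -> zipfile.ZipFile:
    """Build an in-memory ZIP standing in for a Package (demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return zipfile.ZipFile(buffer)


if __name__ == "__main__":
    both = build_package({"index.html": "<html></html>", "manifest.jsonld": "{}"})
    print(manifest_source(both))
```

When both files are present, the sketch starts from index.html, which matches the priority given to step 1.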

iherman commented 5 years ago

This issue was discussed in a meeting.

llemeurfr commented 5 years ago

The wording proposed by Ivan opens an issue: the rules defined for the PEP include the WP section on Obtaining a manifest, which relies on the definition of a WP "web origin". In the Package case, finding the Manifest from the PEP is a matter of using a relative URL and considering that the 'Origin' is the Root Directory of the Package. In this use case there is no fetch, no cross-origin, no network. The processing described in Obtaining a manifest is therefore not really adequate.

We can make it straight by wording the first step proposed by Ivan as

and specifying a file oriented processing rule for obtaining the manifest from the PEP.
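To make the idea of a file-oriented rule concrete, here is a minimal sketch of what such processing might look like. It is not the algorithm from the LPF or WP documents. It assumes that the PEP points to the manifest with a `<link rel="publication">` element, and that an embedded manifest lives in a `<script type="application/ld+json">` element referenced by a fragment; both are assumptions about the WP draft conventions. The names `_PepScanner` and `manifest_from_pep` are hypothetical.

```python
import io
import json
import posixpath
import zipfile
from html.parser import HTMLParser


class _PepScanner(HTMLParser):
    """Collects the publication link and JSON-LD script blocks found in the PEP."""

    def __init__(self):
        super().__init__()
        self.manifest_href = None  # href of the first <link rel="publication">
        self.scripts = {}          # id -> text of each JSON-LD <script>
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "publication" in (a.get("rel") or "").split():
            if self.manifest_href is None:
                self.manifest_href = a.get("href")
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._current_id = a.get("id") or ""
            self.scripts[self._current_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def manifest_from_pep(package, pep_name="index.html"):
    """Obtain the manifest from the PEP using only ZIP entry names (no fetch)."""
    scanner = _PepScanner()
    scanner.feed(package.read(pep_name).decode("utf-8"))
    href = scanner.manifest_href
    if not href:
        raise ValueError("the PEP carries no publication link")
    if href.startswith("#"):
        # Embedded manifest: the link points at a script element inside the PEP.
        return json.loads(scanner.scripts[href[1:]])
    # Linked manifest: resolve the relative reference against the PEP's
    # directory; the Root Directory plays the role of the origin.
    entry = posixpath.normpath(posixpath.join(posixpath.dirname(pep_name), href))
    if entry == ".." or entry.startswith("../") or posixpath.isabs(entry):
        # Absolute and escaping references are simply rejected in this sketch.
        raise ValueError("manifest reference leaves the Root Directory")
    return json.loads(package.read(entry))


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("index.html", '<link rel="publication" href="manifest.jsonld">')
        zf.writestr("manifest.jsonld", '{"name": "Example"}')
    print(manifest_from_pep(zipfile.ZipFile(buffer)))
```

The whole lookup stays inside the archive: the relative reference is mapped to a ZIP entry name, so origin, request and response never come into play.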

llemeurfr commented 5 years ago

I updated the LPF document to include specific processing steps for obtaining the Manifest from the PEP. The processing steps in the WP spec are too closely tied to Web operations and would be problematic here. The LPF processing steps are a rewording of the WP processing steps and are detailed in https://raw.githack.com/w3c/pwpub/master/spec/ocf-lite.html#sec-obtaining-manifest.

iherman commented 5 years ago

I am a bit wary of copy-pasting and then modifying algorithms; it is a source of possible errors.

What is wrong in saying

and let the details be handled by the WPUB spec...

On the other hand, I think it would be important to document what should happen if an LPF processor wants to produce a bona fide WPUB using the package content.

llemeurfr commented 5 years ago

@iherman, if you compare https://w3c.github.io/wpub/#wp-obtaining-manifest and https://raw.githack.com/w3c/pwpub/master/spec/ocf-lite.html#sec-obtaining-manifest I'm sure you'll see what I mean. The former is full of references to Web processing that are not applicable to ZIP files.

Using something like "(conceptually) unpack" will raise many questions from some WP members, and could end up as a can of worms: what is the origin of the conceptually unpacked publication, and what do fetch, cross-origin, request, response, UTF-8 decoding, or even URL mean in the case of offline processing?

I took "the bull by the horns" by tailoring a processing model for obtaining a Manifest from an embedded PEP. I believe that's cleaner (no interpretation needed).

Conversely, I don't think it is good to document in the LPF spec what should happen if an LPF processor wants to produce a bona fide WPUB. Would it be normative or not? Detailed or not? Would we do such a thing for EPUB 3.2 / OCF (yes, EPUB files can become bona fide WPs, this is what we're doing in the Readium software)? LPF is a file format, we're writing a file format specification here. WP is a Web format. Developers should link one to the other using a developers guide, not a normative spec IMHO.

iherman commented 5 years ago

Well... I understand what you say, but if you replace "(conceptually) unpack" with something like "(conceptually) unpack to a local server, served by, e.g., localhost", then all those Web references, though maybe overcomplicated for that case, fall into place. (And I know that is exactly what you do in Readium, isn't it?)
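As a rough illustration of what "unpack to a local server" could mean in practice, here is a small sketch using only the Python standard library. It is not how Readium does it, and the names `serve_package` and `fetch` are hypothetical. The Package is extracted to a temporary directory and served on 127.0.0.1, so that origin, request and response all have their usual Web meaning.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
import zipfile


def serve_package(package_path):
    """Unpack a Package into a temporary directory and serve it on localhost.

    Returns (server, base_url, tmpdir). The caller is expected to call
    server.shutdown(), server.server_close() and tmpdir.cleanup() when done.
    """
    tmpdir = tempfile.TemporaryDirectory()
    with zipfile.ZipFile(package_path) as zf:
        zf.extractall(tmpdir.name)
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=tmpdir.name
    )
    # Port 0 lets the operating system pick a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = "http://127.0.0.1:%d/" % server.server_address[1]
    return server, base_url, tmpdir


def fetch(url):
    """Fetch a resource from the local server, as a Web client would."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

With this setup the base URL acts as the publication's origin, and a request for the directory itself returns index.html, which is the default behaviour of `SimpleHTTPRequestHandler`.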

This approach also raises questions, namely the lingering issue of what the relationship between LPF and WPUB is. Any step that is duplicated/modified that way gives the impression of a greater and greater distance. Hence my issue.

"LPF is a file format, we're writing a file format specification here. WP is a Web format."

True. But it is a file format whose content can be turned into a WP. If it is not described (b.t.w., for the time being, informatively) in this document, where else? We have seen that there is confusion already, and adding some text about the exact relationship would go a long way to dissipate that confusion...

Anyway. I will not lie down in the road over this, and will let the WG express its opinion.