w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Web publications / Publication manifest / other forks difference and status confusion #465

Closed mrjj closed 4 years ago

mrjj commented 4 years ago

Hello!

It took me some time to realize that there are two web/digital publication manifests being maintained in parallel by the same group:

Web Publications https://github.com/w3c/wpub/ https://w3c.github.io/wpub/

Publication Manifest https://github.com/w3c/pub-manifest https://w3c.github.io/pub-manifest/

That is a fork of https://github.com/readium/webpub-manifest, which is also still active in parallel, even though Readium joined the W3C activity.

Within the Readium ecosystem, first internally and then in the OPDS2 group, it was forked again into the OPDS2 manifest (actually incompatible) at https://drafts.opds.io/opds-2.0, which refers back to the Readium schema as its source of truth but is also moving forward on its own.

And all of this is possibly a spin-off of the previous DPub activity https://github.com/w3c/dpub-pwp-ucr/

That in turn is a spin-off of EPUB3 https://www.w3.org/publishing/epub32/ (including the legally complex heritage of the IDPF, which recently joined W3C, including https://www.w3.org/Submission/epub-ocf/).

All of them differ in the details: where links are located, whether readingOrder is present, and so on. In essence, and excluding client-targeted details, they all describe the same kind of metadata, with the same core set of fields (DC+).

My final question: which of all these forks is stable enough to use for, say, the next 3 years? EPUB3 is unavoidable due to the level of third-party support, but it seems sub-optimal due to the heritage mess and legal status.

My end goal is to deliver a digital publication authoring system as part of the ecosystem for a national digital publishing platform. It was OK to align EPUB with the Readium publication manifest and OPDS2. W3C Web Publications looked like good ground for a central interchange format until I discovered a second, parallel fork driven by the same group and the same committee. Currently I have no idea what the metadata pillar should be, and the time when wrong choices are still cheap for the program will end soon. Help, please.

wareid commented 4 years ago

Hi!

Sorry for all of the confusion. The most stable of the documents is Publication Manifest, which is currently on the W3C recommendation track with an expected confirmation by the summer. The current status of the document is stable as we are in the implementation phase right now (if you're interested in tests they're available here), and we won't be making any fundamental changes to the document any time soon.

Publication Manifest is very similar to RWPM and OPDS because EDRLab participated in its development alongside the other members of the Publishing Working Group. @llemeurfr @HadrienGardeur perhaps you can address this as well, since I want to clarify this as best as possible.

Please let me know if you have any other questions, we're here to help :).

mattgarrish commented 4 years ago

And, FYI, the status section of web publications addresses the split:

Due to the lack of practical business cases for Web Publications, and the consequent lack of commitment to implement the technology, the Publishing Working Group has chosen to publish this document as a Note and focus on other areas of interest, including developing the manifest format as a separate specification.

If work on web publications were to resume in the future, it would become a profile of the Publication Manifest specification (same as Audiobooks is now).

mrjj commented 4 years ago

Oh, it's as complex as it seems. I found this explanation just by googling the difference between "Publication Manifest" and "Web Publication Manifest": https://github.com/HadrienGardeur/webpub-manifest/wiki/Web-App-Manifest-vs-Web-Publication-Manifest

My concern is the following: we have different user agents, but these are just user agents, so core metadata (information about the record) can't depend on clients in an incompatible way; it should be about the record itself. User-agent and human/machine accessibility directives can be mutually exclusive by design: when you are using user agent A (a tablet reader), you are not using user agent B (a desktop browser). But that should not affect something like the DCMI core.

Representation-level and medium-level properties (attributes) may differ and conflict on something like "extent description". That's fine. But when I diffed them, the differences between the manifests were not only client-specific.

For now I will try to support a superset of fields, favouring Publication Manifest. But this will get more complex with every new third-party client or validator.
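A minimal sketch of what such superset support could look like, assuming the W3C Publication Manifest keeps descriptive properties such as `name` and `author` at the top level with `url` on link objects, while the Readium manifest nests them under `metadata` (with `title`) and uses `href` on link objects. The function name `normalize_manifest`, the normalized record shape, and the sample data are hypothetical and not taken from either specification:

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Wrap a single value in a list; treat a missing value as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _link_url(item: Any) -> str:
    # Assumption: W3C entries are bare URL strings or objects with "url",
    # while Readium link objects use "href".
    if isinstance(item, str):
        return item
    return item.get("url") or item.get("href") or ""


def normalize_manifest(manifest: dict) -> dict:
    """Map either manifest flavour onto one hypothetical internal record."""
    if "metadata" in manifest:  # assumed marker of the Readium layout
        meta = manifest["metadata"]
        flavour, title = "rwpm", meta.get("title")
    else:  # assumed W3C layout: descriptive properties at the top level
        meta = manifest
        flavour, title = "w3c", manifest.get("name")
    reading_order = _as_list(manifest.get("readingOrder"))
    return {
        "flavour": flavour,
        "title": title,
        "authors": _as_list(meta.get("author")),
        "reading_order": [_link_url(item) for item in reading_order],
    }


# Illustrative inputs only; real manifests carry many more properties.
w3c_sample = {
    "type": "Book",
    "name": "Moby Dick",
    "author": "Herman Melville",
    "readingOrder": ["c1.html", {"url": "c2.html"}],
}
rwpm_sample = {
    "metadata": {"title": "Moby Dick", "author": "Herman Melville"},
    "readingOrder": [{"href": "c1.html"}, {"href": "c2.html"}],
}

w3c_record = normalize_manifest(w3c_sample)
rwpm_record = normalize_manifest(rwpm_sample)
```

Keeping the flavour detection in one place means a new third-party variant only needs another branch there, though localized or object-valued titles and authors would need extra handling that this sketch leaves out.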

mrjj commented 4 years ago

> And, FYI, the status section of web publications addresses the split:
>
> Due to the lack of practical business cases for Web Publications, and the consequent lack of commitment to implement the technology, the Publishing Working Group has chosen to publish this document as a Note and focus on other areas of interest, including developing the manifest format as a separate specification.
>
> If work on web publications were to resume in the future, it would become a profile of the Publication Manifest specification (same as Audiobooks is now).

Thank you, Matt. I saw this note in the repo while making the diff. The funny part is that I remembered that one of them is a Note, but after 5 minutes I had already forgotten which one.

TzviyaSiegman commented 4 years ago

@mrjj perhaps EDRLab's explanation of their use of W3C Publication Manifest will help clarify https://www.edrlab.org/open-standards/web-publications/. @llemeurfr can clarify that Readium Web Pub Manifest is an interchange format. The document you linked to, written by @HadrienGardeur, is from 2016 and quite out of date.

mrjj commented 4 years ago

> @mrjj perhaps EDRLab's explanation of their use of W3C Publication Manifest will help clarify https://www.edrlab.org/open-standards/web-publications/. @llemeurfr can clarify that Readium Web Pub Manifest is an interchange format. The document you linked to, written by @HadrienGardeur, is from 2016 and quite out of date.

Yes, they were the initiators, and I see them relying on W3C and the acquired IDPF as well, and so do I. Currently we (RSL and the Russian National Electronic Library) are working on a really massive Readium-based platform for digital publication distribution and licensing. For example, almost all of the national public domain is planned to be delivered this way, so it targets tens of millions of publications and about a hundred million users. Readium seems really well thought out in its architecture, digital licensing model and metadata details. And it's actually good for us not to deal with the EPUB trademark, because things like this notice in the EDUPUB documentation are a full blocker for us:

All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the International Digital Publishing Forum (IDPF).

EPUB for Education Structural Semantics

So refreshing the standard, together with renaming it, separating it from containers, removing the trade association's legacy agreements and regulations, and other aspects of the W3C work, is great news for us. Traditionally we are early adopters of worldwide standards (railways, units of measurement, flags, a lot of identifiers and so on, as well as keeping all core ISO standards mirrored and aligned under the GOST umbrella), and it's good to have the opportunity to stay on this line in the digital publishing domain.

As for audiobooks, we currently consider this form an accessibility feature first, and only after that a separate representation of a work.

The reason for this bias is that we are using Yandex SpeechKit for automatic vocalization. The current solution is a state-of-the-art system that is already widely used as part of geo-navigation apps and as the personalized voice of the conversational AI platform Alice (the term AI is quite fair: her core logic is not based on predefined Q&A and form filling, and there are a lot of disclaimers about zero responsibility for what she says or for her being offensive or provocative in any form). And SpeechKit is simply winning the competition with most live readers. On classical texts the voice is emotionally deep and very expressive, with the ability to control synthesis details like tempo, gender and, in theory, the full frequency spectrum (e.g. to address hearing disability cases). For live readers, it may actually turn out to be better for everyone just to purchase a license for a set of voice samples.

I expect that most voice synthesis market players will sooner or later achieve comparable results thanks to WaveNet-based tech (Yandex provides a fully in-house stack for synthesis), and I expect that I am describing a non-local trend, so I see this detail of our program as worth mentioning.

So I tend to see the next generation of audiobooks as massive relative to the current one, and as something that can be treated as a client-specific directive or representation media payload. In most cases the audio runs parallel to the linear structure of the text body, and the table of contents stays the same. It may add something to the content document markup, e.g. <img speech:alt="picure-sem-description-1.ogg" />, plus some shortcuts and an alternate order in details. So I don't see any general reason for audiobook markup not to be an extension.

What is currently defined as Web Publications seems to me to be just a common HTML5 page markup extension. And given the current Linked Data initiative and the status of the JSON-LD standard, there may be no need for an extension at all, so the goal of this standard may turn out to be already achieved by neighbouring working groups.

HadrienGardeur commented 4 years ago

Everything started in 2015, when the IDPF published a charter for EPUB 3.1.

As part of this charter, a sub-group was created to explore what "EPUB on the Web" could look like (also known as the very cute EPUB BFF, as in Browser Friendly Format). Within this group, @dauwhe and I quickly became the most active members, as we both created full proposals, examples, and prototypes of how such User Agents could work throughout 2016 (caching a publication using a Service Worker, navigating in the reading order and handling the table of contents being the favorite topics that we tackled).

At the same time in Q4 2016, some Readium Foundation members started an ad-hoc group to discuss how we could build a next generation SDK for mobile platforms. We quickly decided that we needed to define a better architecture for Readium projects and the work on EPUB BFF was identified as a good starting point for a manifest format that we could leverage internally.

In the end, IDPF decided to drop a lot of forward-thinking ideas for EPUB 3.1: HTML as a serialization was rejected and work on Web Publication Manifest (our new name for EPUB BFF) was brought to an end.

For a period of roughly 6 months (Q1 and Q2 2017), Readium became the only place where these discussions continued and as we iterated on the Web Publication Manifest, it became the center piece of our architecture and was eventually renamed to Readium Web Publication Manifest.

In June 2017, W3C adopted the Publishing Working Group Charter with a mission focused on Web Publications. Among other documents, the Readium Web Publication Manifest was listed as an input document for this charter. For at least another 6 months, there were a lot of discussions as part of this WG, with no clear direction:

It took a while to agree that having a JSON manifest was a good idea and then it took a few more months to discuss embedded/linked manifest. By the end of Q2 2018, the WG finally had a good idea of what the W3C Publication Manifest would look like.

Unlike the W3C Publishing WG, members within the Readium Foundation never had such disagreements. The decision to move forward with the Readium Web Publication Manifest for the new architecture was never contested, and members started building new open source projects based on this manifest. What was initially meant strictly for mobile apps got extended to Web apps and desktop apps as well, and the new Readium architecture based on the Readium Web Publication Manifest got adopted by other organizations and open source projects (see https://github.com/readium/awesome-readium#examples for some good examples).

Eventually, W3C put its work on Web Publications on hold and the focus switched to audiobooks.

HadrienGardeur commented 4 years ago

> My final question: which of all these forks is stable enough to use for, say, the next 3 years? EPUB3 is unavoidable due to the level of third-party support, but it seems sub-optimal due to the heritage mess and legal status.

Of all these works, https://w3c.github.io/audiobooks/ is the only spec designed as an authoring format.

Readium Web Publication Manifest is meant to be an internal format shared by multiple open source projects to connect components together.

If you're looking for an authoring format:

If you're looking for an internal format, well... you're free to use whatever you want. RWPM was designed with EPUB support in mind and this is well documented. It's also fairly stable now since it has been used for the last 3 years to build various projects.

But you can really use anything you want as an internal format and there's no "right" answer to that question.

OPDS is something quite different as the goal is to provide catalogs that can be browsed, searched and from which you can also download/borrow/buy publications. Many libraries in North America and Europe use OPDS for this purpose.

wareid commented 4 years ago

@HadrienGardeur Thank you for the clarification on OPDS.

I am going to close this issue for now as I believe we've clarified the confusion about which spec is which and where the most stable version is. Thank you for raising this @mrjj and please feel free to continue asking questions in the pub-manifest repo or directly to any of the working group chairs/editors if you need anything else!