w3c / epub-specs

Shared workspace for EPUB 3 specifications.

Remove the NCX #633

Closed mattgarrish closed 8 years ago

mattgarrish commented 8 years ago

The NCX is currently superseded (allowed in an EPUB 3 but ignored). The working group is considering whether it is time to remove the NCX, which might begin with a stronger deprecation statement.

tofi86 commented 8 years ago

How do you plan to support EPUB 3.1 on older eReading systems then? We're using NCX in EPUB 3 a lot for backward compatibility...

mattgarrish commented 8 years ago

That's part of what's up for discussion: whether there's value in continuing to suggest compatibility through grandfathered features like this. Audio, video, scripting, fixed layouts, css3 support, (potentially) html serialization, etc. make many epub 3 publications low quality, if renderable at all, on epub 2 reading systems. Having an ncx isn't going to make the content work.

Currently, no decision has been made, and one may not be made before the first draft is published in January. A survey was sent out to gauge use, but it closed after the last working group meeting before the holidays. If not resolved in early January, an issue will be added to the specification with a link to the tracker for comments.

More information will be made available as it becomes available in the "deprecatables" doc at: https://docs.google.com/document/d/1fHRT3jTXAHQc2Rtz5tfIef8u90CWtROZca1a2_XQGq4/edit#heading=h.o0kezrval575

mattgarrish commented 8 years ago

The working group came to the consensus to remove the NCX from EPUB 3.1 on the 2016-01-08 conference call: https://docs.google.com/document/d/1Ari3a4urma6js0L-GLkzwDEqsI2dKFPs7X1J9LXqNEk/edit#heading=h.zc0soncmn95

This issue will remain open past publication of the first draft for comment.

tofi86 commented 8 years ago

The working group came to the consensus to remove the NCX from EPUB 3.1 ...

IMHO, this is a bad decision. Even though there seems to be a slight chance to keep the NCX structure for backwards compatibility...

MG: We want to be aggressive in the January draft. We can find out if EPUB 2 compat is a big issue when the January draft is reviewed.

... I want to note once again, that breaking EPUB 2 compatibility (e.g. removing NCX and other required structures) will have fatal consequences on the adoption rate of EPUB 3.1.

From my experience with some of the biggest German publishers, even EPUB 3 is a hard decision. While it may by now be well supported by some minor apps (on tablets and smartphones) or web-based reading systems, almost none of the existing eInk hardware can render native EPUB3 without an EPUB2 fallback (NCX and such).

Dropping EPUB 2 backward compatibility means dropping support for millions of eInk reading systems. You may argue that it's up to the reading system vendors to provide firmware upgrades, but let's be honest – an estimated 80% of the eInk hardware will never get a firmware upgrade from the vendors (mostly Adobe's fault due to their RMSDK policy, but that's another sad story...) and customers with such a reading system can throw it in the trash.

If publishers get to hear that, I fear they will never decide in favor of EPUB 3.1 for a long time but stick to EPUB 3.0 or even EPUB 2 for longer because they can offer better/broader reading system support.

LA: In production, suppliers had to create NCX for EPUB2. In EPUB3, they didn’t drop the old code, just added NAV. Production is automated. When we do corrections that’s when it’s a mess.

Luc Audrain is right with that. But I'd rather do and vote for corrections to the existing code than completely dropping backwards compatibility just to "clean the specs" (which is my personal impression of EPUB 3.1).

iherman commented 8 years ago

(Caveat: I was not on the call last Friday, so I may be repeating things that were said there.)

I think that the issue is not what is in the standard but more what conformance would require and what validators can do. The fact that EPUB3.1 does not include a reference to NCX is one thing; another issue is whether it is forbidden (and rejected by the validator) to include in the package various types of files that address the needs of particular RSes. That can be an NCX file, or any other type of file that a particular RS may rely on for some specific purpose.

We have some analogy with browsers. I presume (have not checked) that browsers carried on implementing the element for a long time although the element was removed from the official HTML standard. As the world evolved, and sites moved away from its usage, browsers eventually removed this.

tofi86 commented 8 years ago

I get your point, Ivan. And most of what I've written above isn't only about the NCX being removed, to be honest. But I want to make clear that backwards compatibility IS an issue for publishers and that it needs to be in the spec in some way. I don't want to rely on vendors' gusto in supporting NCX or not. I would just like to see new EPUB 3.1 files still being compatible with (older) eReading hardware. Or at least being compatible with current soft- and hardware (which they widely wouldn't be when you take a look at the current eInk hardware).

jstallent commented 8 years ago

While I agree with the idea of removing the NCX, I think the point is well made that this could stymie adoption of 3.1 if the inclusion of an NCX (and the linking to that NCX in the spine element) are marked as errors or warnings in EPUBCheck. While most of the major US retailers have moved on from requiring an NCX not all of them have (looking at you, B&N), and publishers live in the real world of file delivery and functionality.

I'm all for the change as long as it does not make publishers have to choose between meeting the new standard and selling their books. They will choose selling their books every time.

mattgarrish commented 8 years ago

I understand the concern, but if we bring it back I'd hope that we could find another way than forcing it into the 3.1 specification. My thought since removing came up is that perhaps there should be a compatibility extension that allows these features to pass epubcheck. For example, add a dc:type value like "epub2compat" and then you get ncx, opf meta 2 and whatever else you want for epub 2 support. It would let the specification move forward for publishers and tool and rs developers who don't want to support epub 2 features, but provide the option for backwards compatibility on the authoring side.

The odd thing is that this move is being driven more by the publishers in the group.
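
For illustration, the compatibility-extension idea above could be sketched as a small Python check. This is only a sketch under stated assumptions: "epub2compat" is the value floated in the comment and is not a registered dc:type value, and the function name `check_legacy_features` and the accept/reject outcome are hypothetical, not epubcheck behavior.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"

# Hypothetical package document using the proposed opt-in marker.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.1" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:type>epub2compat</dc:type>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="c1" href="c1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="c1"/>
  </spine>
</package>
"""


def check_legacy_features(opf_xml: str) -> str:
    """Return "accept" or "reject" for EPUB 2 leftovers in a package document.

    Assumed rule: an NCX manifest item or a spine toc attribute is only
    allowed when the publication opts in with <dc:type>epub2compat</dc:type>.
    """
    root = ET.fromstring(opf_xml)
    types = {el.text.strip() for el in root.iter(f"{{{DC_NS}}}type") if el.text}
    opted_in = "epub2compat" in types

    has_ncx = any(
        item.get("media-type") == NCX_MEDIA_TYPE
        for item in root.iter(f"{{{OPF_NS}}}item")
    )
    spine = root.find(f"{{{OPF_NS}}}spine")
    has_toc_attr = spine is not None and spine.get("toc") is not None

    if (has_ncx or has_toc_attr) and not opted_in:
        return "reject"
    return "accept"
```

With the marker present the sample passes; stripping the `dc:type` line from the same document makes the check reject it, which is the trade-off the proposal describes.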

elmimmo commented 8 years ago

Do take note that Kindlegen’s latest version only recognizes NCX's pageList as a means to declare page numbers and ignores the Nav Doc's page-list altogether irrespective of the presence of the former or not.

Being able to generate one single EPUB that works for all vendors and that can be validated, even when the purpose is converting it to Mobi too, is convenient.

I do acknowledge it is not EPUB's role to particularly mind what Amazon's preprocessor requires and that it is Amazon instead who should update Kindlegen to support EPUB 3's syntax for page numbers. I thought this should be mentioned, nevertheless.

dgatwood commented 8 years ago

I absolutely agree with tofi86. This issue and #644 are utterly baffling to me.

I’m assuming that when you say publishers are driving this, you mean textbook publishers—interactive book publishers. For those of us doing fiction, removing backwards compatibility would qualify as absolute brain damage, and that's not just my opinion; it seems to be the general consensus among a pretty good sized pool of independent publishers and content formatters.

As a publisher, I want to create content that will be readable on as wide a range of devices as possible. Web standards are deliberately built in such a way that they gracefully degrade on older devices that don’t support them. For this reason, I can take a modern HTML document and read it in Netscape Mosaic 0.9b3 or lynx 1.0 (early 1990s) as long as it doesn't require JavaScript. It might look like crap, but I can read it.

The reason web standards work the way that they do is to encourage adoption. I know that my content will gracefully degrade on older browsers, which means I can comfortably adopt bleeding edge CSS 3 features NOW. I'm doing this in the content for my EPUB books as well, introducing CSS 3 features like shape-outside on drop caps to produce spectacular typography on the tiny minority of reading platforms that support it, knowing that it won't break other readers, and that eventually users on other reading platforms will gain those capabilities as well.

Without that backwards compatibility, I wouldn’t be adopting those features for another five years, and probably more like ten. So in ten years, I’ll be able to adopt the EPUB 3.1 metadata format, when all of the early EPUB readers (many of which have been abandoned by their manufacturers) have all died from capacitor plague. In twelve years, I’ll be able to adopt EPUB 3.2. And so on. By the time features become usable, they'll be a decade behind the state of the art.

More importantly, Amazon et al won’t have this problem. They’ll be constantly pushing the limits of what the technology can do. So if our EPUB-formatted content has to lag behind by a decade to maintain backwards compatibility with existing readers, our Kindle-formatted content will be light years ahead at any given point in time. This will result in people saying things like “Get the Kindle version. The EPUB sucks.” Eventually, the entire EPUB format will wither on the vine.

If I had my way, these features wouldn't even be deprecated. They would be optional. Features shouldn't be deprecated until there is no longer any good reason for publishers to include them, which is certainly not the case yet by any stretch of the imagination. They should be deprecated in five or ten years, and removed several years after that.

BluefireMicah commented 8 years ago

It is important to be aware of the ecosystem realities around any kind of impediment to the use of NCX. A large portion (the majority, I believe) of the current commercial and library ebook ecosystems worldwide (outside of the mega retail platforms such as Amazon and Apple) rely on RMSDK for rendering. While Adobe has done some work to include a modified version of Readium SDK in RMSDK, there are a variety of technical, architectural, and business impediments to the adoption of this new version of RMSDK in general, and particularly in terms of including EPUB3 support in such apps. In fact, I'm not aware of any retail or library platform apps that do so in the market today (including the 60 or so Bluefire powered apps out in the market today that do not).

And, even if an org were successful in deploying new RMSDK apps that integrated EPUB3 support (if that were possible..) there will still be legacy users on older devices running older operating systems that could not upgrade to the new app version due to the mobile platform company's aggressive deprecation of compilation tools for older OS versions. And, there are many eink devices actively used by consumers that cannot, or at the very least will not, be updated for a variety of reasons (e.g. they don't have the processing power to run modern browser engines).

Currently, many of the largest publishers have, or are moving to, releasing all of their new titles in EPUB3 format. Quite often these titles are front-list, best seller, consumer trade titles that don't really take advantage of the unique features introduced in EPUB3 (e.g. video, media overlays, fxl, etc). In fact the vast majority of titles do not. Currently that is feasible for the ecosystem as these publishers include an NCX for backwards compatibility with reading systems built on RMSDK and other EPUB2 engines.

It is extremely important for the broader ecosystem to continue to be able to distribute these front list titles to these "legacy" devices and apps (though legacy is not exactly a spot-on term for brand new apps being released by our customers and others on a regular basis). So, deprecating NCX is one thing (e.g. not requiring it for ecosystems that don't need it). But anything that "hurts" the existing ecosystem outside of the mega-stores would be extremely counterproductive to the broader adoption of EPUB3.1.

BluefireMicah commented 8 years ago

This is a follow up comment on my last one above about the realities of the existing ebook ecosystem, as an attempt to share a slightly fuller picture about the real-world challenges of supporting new EPUB spec versions, from the perspective of an RS dev type, a type which does not necessarily comment a whole lot in this WG. It is not specifically relevant to the NCX conversation, so feel free to skip it if the larger Reading System picture is not of interest to you.

With EPUB2, most of the popular reading platforms implemented their own rendering engines for a variety of technical, business, and UX reasons. I'll use RMSDK as an example as I'm most familiar with it. There, the rendering engine was created largely "from scratch" rather than leveraging one of the primary browser engines. Features like the ability to select text, highlight text, navigate to bookmarked locations etc are unique to this engine. e.g. in RMSDK, there is a thing called "RMLocation" for identifying a specific insertion point in the text flow. With EPUB3, and its relationship to HTML5, attempting to extend such proprietary engines to support EPUB3 would be a very, very bad choice. Thus supporting the spec requires a quantum leap in technology, not just incremental changes. Everything has to be done differently; all features need to be redeveloped from scratch. This is of course expensive and difficult, and as an example of that my small bootstrap company has to date spent 3 years and well over 1.5 million dollars trying to create EPUB3 enabled apps. We are getting close, but not there yet.

But even harder is migrating existing users of existing public library and retail platforms from EPUB2 to EPUB3 apps. For example, if a user has a library of ebooks with highlights and annotations that leverage RMLocation to identify associated text ranges, you have to translate all of that into CFIs for EPUB3.

Beyond duplication and migration of all features to a completely different technology stack, there is the question of user experience. e.g. if a user has been reading EPUB2 files with an app with a highly optimized and refined-over-years engine, and one day they get an update and the reading experience is markedly different, but not necessarily better (or in fact worse), this is not a happy user. It is very near impossible to solve this challenge. Perhaps with enough money and time it might be, but the ecosystems trying to compete with Amazon and others don't have much of either. Combine that with the embedded systems realities of reading devices, and the rapidly evolving mobile app platforms, and you will see that EPUB2 reading systems will be with us for many years to come.

Just imagine if Microsoft shipped Windows 11, then updated Visual Studio such that you can only compile updates to web browsers for Windows 10 and above, and also decided only the recently shipped PCs can be updated to W10. Imagine how many people with PCs two or more years old would be stuck without the ability to update their browser. That is in fact the reality we have on iOS and mobile apps today.

Now, a person in the US might say, well, not that many people are actually using these non-mega reading systems. And in the US you might be right (e.g. only a few million ebook consumers). But in other countries the picture is a bit different. You have consortia like Tolino in Germany (most of the large retailers) with a sizable portion of the German consumer market. Or you have the very popular library systems of most of the Northern European countries. Or the leading regional retailers in much of Europe in general. I'm not trying to be a downer here, and I'm a very enthusiastic fan of EPUB3 (e.g. we are a major contributor to the Readium projects). My intent here is just to make clear just how important backwards compatibility is going to be for years to come. And why. Unfortunately.

BluefireMicah commented 8 years ago

I do recognize that there could still well be a strong rationale that the right choice is to "break" with the past for EPUB3.1. If so, it seems to me that this would require a very dedicated and sophisticated global communications program to make sure that all of the ecosystem players clearly understand the break, and the ramifications of choosing to deploy content in 3.1. To me the important thing for IDPF would be going into that with eyes wide open about the ramifications, and the appropriate dedicated resources to conduct such a global communications program. My sense is that the ecosystem realities are not well understood today in the industry at large.

BluefireMicah commented 8 years ago

Final RS related comment of the day. This "having to rebuild everything from scratch vs incremental evolution" to fully embrace EPUB3 is not just an issue for the "alt" ecosystem. I see it playing out in the Kindle and Nook platforms as well. Many of the oddities of these platforms, and the uneven rendering and feature support across devices/platforms, clearly (to me) relate to this. I only mention this as I think sometimes the oddities of these platforms seem random and nonsensical to folks on the content production side of things.

dgatwood commented 8 years ago

To be fair, IIRC, Nook is based on RMSDK, so they're running into problems for the same reason that you are.

As far as I can tell, most of these uneven rendering problems have less to do with uneven feature support and more to do with abusive stylesheets that go overboard with the universal selector, and/or abusive renderers that override the stylesheet in highly nonstandard ways.

IMO, one thing that the EPUB standard really needs are standards about how to construct CSS that is guaranteed to be strong or weak with respect to the client's rules, i.e.

along with requirements that all clients conform to those standards when overriding book styles.

The EPUB standard also ought to be more forceful in future versions about readers complying with the parts of the CSS and HTML specs for gracefully handling content that they don't understand. If something is structurally valid, even if a parser doesn't understand a particular property or @ rule, the parser should ignore that bit, not blow up and ignore the entire stylesheet or the rest of the stylesheet from there down.

These forwards compatibility problems are far more problematic from a compatibility perspective than minor differences like XHTML versus HTML5, IMO. As long as readers follow the parsing rules correctly, an HTML5 engine should be able to handle HTML6, HTML7, HTML8, etc. whenever those standards come into existence. If they don't, you end up in a situation where publishers can't move forward because of broken support in older readers.

To that end, I'd like to see the spec specify a minimum set of supported HTML/XHTML versions rather than specifying a single version. There's no practical reason why an EPUB 3.x file shouldn't be allowed to contain XHTML for backwards compatibility, and in the opposite direction, there's no reason it shouldn't be allowed to contain HTML6 whenever that appears, because if readers actually follow the rules properly for handling unknown content, future content should gracefully degrade. Any engine that anybody uses for rendering content, with the possible exception of RMSDK, is likely to be able to parse all of those HTML-ish formats without trouble, including XHTML, and any EPUB 3 reader is likely to provide support for EPUB 2, so it isn't as though removing those options buys you much on the software development side.
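
The graceful-handling rule described in this comment (ignore the declaration you don't understand, keep the rest of the stylesheet) can be illustrated with a deliberately simplified Python sketch. It is not a real CSS parser: it splits naively on `;` and `:`, and the `SUPPORTED` set and the function name `parse_declarations` are assumptions made up for the example.

```python
# Properties this hypothetical renderer understands.
SUPPORTED = {"color", "margin", "font-size"}


def parse_declarations(block: str, supported=SUPPORTED) -> dict:
    """Parse the inside of a CSS declaration block, tolerating unknowns.

    Unknown or malformed declarations are skipped one at a time; they
    never cause the declarations around them to be discarded.
    """
    kept = {}
    for declaration in block.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or not name or not value:
            continue  # malformed: drop only this declaration
        if name not in supported:
            continue  # not understood (e.g. shape-outside): ignore, carry on
        kept[name] = value
    return kept


styles = parse_declarations(
    "color: red; shape-outside: circle(50%); font-size: 1.2em; bogus"
)
```

Here `styles` keeps `color` and `font-size` and silently drops `shape-outside` and the malformed `bogus` entry, rather than abandoning the whole block.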

elmimmo commented 8 years ago

I am intrigued by what going beyond not minding its presence or even validity brings in. What strikes me as radical is the stance of forbidding instead of just ignoring, which in terms of reducing bloat and backwards compatibility burden I consider pretty similar: unless I am missing something, a spec that only tolerated its presence but did not bother even regulating its validity, just like with any other unreferenced Foreign Resource, (as in "I don't know what you are but I don't care") only needs to mention

The spine element can have a toc attribute, for legacy purposes, that does nothing.

which is sort of what EPUB 3 did and which I wouldn't categorize as bloat. To merely tolerate the NCX without minding its presence, the spec need not care how EPUB 2 requires the NCX and the spine attribute to be formed (epubcheck should, though, since those who nevertheless add the NCX to an EPUB 3.1 book would want to know that it works in an EPUB 2-only RS).
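
As a rough illustration of "tolerate but ignore", the Python sketch below reads a package document, uses the manifest item flagged `properties="nav"` for navigation, and merely records whatever the spine's `toc` attribute points at without validating it. The function name `find_navigation` and the returned dictionary shape are assumptions for the example, not spec language.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Minimal illustrative package document (metadata omitted for brevity).
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.1">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="c1" href="c1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="c1"/>
  </spine>
</package>
"""


def find_navigation(opf_xml: str) -> dict:
    """Locate the Navigation Document and note any legacy NCX.

    The "legacy_ncx" entry is informational only: a reader following the
    tolerate-but-ignore approach would never open or validate that file.
    """
    root = ET.fromstring(opf_xml)
    hrefs_by_id = {}
    nav_href = None
    for item in root.iter(f"{{{OPF_NS}}}item"):
        hrefs_by_id[item.get("id")] = item.get("href")
        # properties is a space-separated list of values
        if "nav" in (item.get("properties") or "").split():
            nav_href = item.get("href")

    spine = root.find(f"{{{OPF_NS}}}spine")
    toc_id = spine.get("toc") if spine is not None else None
    legacy = hrefs_by_id.get(toc_id) if toc_id else None
    return {"nav": nav_href, "legacy_ncx": legacy}
```

Removing the `toc` attribute from the sample changes nothing for such a reader except that `legacy_ncx` comes back as `None`.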

elmimmo commented 8 years ago

@dgatwood wrote:

one thing that the EPUB standard really needs are standards about how to construct CSS that is guaranteed to be strong or weak with respect to the client's rules.

If providing a means for the user to be able to alter margins, font families, size, line height, background colors, etc. is deemed imperative for modern digital text reading environment, then that applies as much to EPUB as it does to the web, and therefore is not something that EPUB should be specifying on its own.

This is admittedly off-topic, though. If someone knows the appropriate forum to bring this conversation to, I would appreciate a reference.

iherman commented 8 years ago

@elmimmo wrote:

@dgatwood wrote:

one thing that the EPUB standard really needs are standards about how to construct CSS that is guaranteed to be strong or weak with respect to the client's rules.

If providing a means for the user to be able to alter margins, font families, size, line height, background colors, etc. is deemed imperative for modern digital text reading environment, then that applies as much to EPUB as it does to the web, and therefore is not something that EPUB should be specifying on its own.

I've sympathy for what you say; in fact, the issue of what we refer to as 'personalization' is on the agenda of discussions in the W3C Digital Publishing IG. However, pragmatically, the very general solution to this will take some time, and if the EPUB community comes up with a workable scheme, this may in fact lead to influencing the Web at large (rather than the Web at large influencing EPUB)

elmimmo commented 8 years ago

@iherman wrote:

if the EPUB community comes up with a workable scheme, this may in fact lead to influencing the Web at large (rather than the Web at large influencing EPUB)

EPUB has long sinned by believing that such was its role (or capability, even). I read the announcement of the first draft of EPUB 3.1 as a realization, in part at least, that attempting to unilaterally influence/dictate the direction of future web standards was the wrong path to take, for being both inappropriate and futile:

The primary thrust of this revision is to bring EPUB 3 more in line with the Open Web Platform (OWP).

elmimmo commented 8 years ago

@mattgarrish wrote:

perhaps there should be a compatibility extension that allows these features to pass epubcheck. For example, add a dc:type value like "epub2compat" and then you get ncx, opf meta 2 and whatever else you want for epub 2 support.

In a scenario where EPUB 3.1 allows the presence of the NCX without regulating its syntax, as it does with respect to any other mere unreferenced Foreign Resource, for the purpose of allowing, but not requiring, backwards compatibility, the decision of whether such compatibility is required or not in specific scenarios is dependent on the platform which the book is intended for, not the book itself. IOW, IMHO there is no point in a book requiring epubcheck to always perform an EPUB 2 retro compatibility check on itself and error out if it fails it, when the platform it might be being validated for does not need the NCX in the first place.

If the spec finally allows, but does not require, the NCX for backwards compatible purposes, checking against such compatibility or not should come from an optional argument of epubcheck, not an internal property of the book itself. Authors/Distributors/Retailers then could check if the book is backwards compatible if they care, or not if they don't, when validating it against EPUB 3.1.

This rationale has the benefit of freeing the spec from having to ponder about, and therefore giving space to, what a retro-compatible ebook is like and regulate those tags you proposed.
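
A minimal Python sketch of that idea follows: the backwards-compatibility check is switched on by the person running the validator, not by anything inside the book. The `--epub2-compat` flag, the program name `check-sketch`, and the function names are all hypothetical; this is not an existing epubcheck option, and the check covers only the spine `toc` reference.

```python
import argparse
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def epub2_compat_problems(opf_xml: str) -> list:
    """List reasons an EPUB 2-only reading system could not find the NCX."""
    root = ET.fromstring(opf_xml)
    items = {i.get("id"): i for i in root.iter(f"{{{OPF_NS}}}item")}
    spine = root.find(f"{{{OPF_NS}}}spine")
    toc_id = spine.get("toc") if spine is not None else None
    if toc_id is None:
        return ["spine has no toc attribute"]
    item = items.get(toc_id)
    if item is None:
        return [f"spine toc={toc_id!r} matches no manifest item"]
    if item.get("media-type") != NCX_MEDIA_TYPE:
        return [f"manifest item {toc_id!r} is not an NCX"]
    return []


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="check-sketch")
    parser.add_argument("package", help="path to the package document (.opf)")
    # Hypothetical opt-in switch: off unless the caller asks for it.
    parser.add_argument(
        "--epub2-compat",
        action="store_true",
        help="also verify that the book carries a usable NCX",
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    with open(args.package, encoding="utf-8") as handle:
        opf_xml = handle.read()
    problems = epub2_compat_problems(opf_xml) if args.epub2_compat else []
    for problem in problems:
        print("ERROR:", problem)
    return 1 if problems else 0
```

Calling `main(["book.opf"])` would skip the legacy check entirely, while `main(["book.opf", "--epub2-compat"])` would run it, so a retailer targeting EPUB 2 devices and one that is not can validate the same file differently.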

BluefireMicah commented 8 years ago

@dgatwood wrote: "To be fair, IIRC, Nook is based on RMSDK, so they're running into problems for the same reason that you are.

As far as I can tell, most of these uneven rendering problems have less to do with uneven feature support and more to do with abusive stylesheets that go overboard with the universal selector, and/or abusive renderers that override the stylesheet in highly nonstandard ways."

True about Nook, but then I've also seen this proprietary approach to ebook content rendering in most of the "early" crop of ebook reading system platforms across the board. And it remains a core challenge in migration to EPUB3 for RS devs IMO with many side effects.

I too have always felt that the collision of personalization and RS imposed formatting with authored markup is one of the largest challenges faced by the industry - and somewhat of a gaping hole in the spec. Granted, that is off topic for this NCX thread, so I will resist diving deep into it here.

BluefireMicah commented 8 years ago

Back on topic: My sense is that if someone were to ask a broad set of RS devs if NCX is "required" for EPUB3, they would of course say no, as it is not in fact "required" - in dev speak. If you instead asked them if NCX is very important in EPUB3 for backward compatibility in EPUB2-ish rendering engines, the vast majority would, I'm sure, say "absolutely".

BluefireMicah commented 8 years ago

One ecosystem factor worth being aware of in relation to version compatibility and validation is that while the largest retail platforms do have ingest systems with automated validation processes and essentially spec sub/super sets, there is a significant portion of the market that redistributes content sourced through large aggregators and via literally hundreds of smaller distribution channels. The "end" distributors (e.g. regional retailers, library systems, etc) often have no insight into such details across their catalog, and are not able to validate files that very often are not available to the end-point distributor in source form - meaning they don't have the unencrypted book bytes. e.g. they cannot "see" into the files. This could be considered an ecosystem problem rather than a spec problem, but there are times when pragmatic realities reasonably could be considered when crafting specs.

bitsgalore commented 8 years ago

Apologies if somewhat off-topic (but I couldn't think of a better place to post this), but I just wrote this blog post with some thoughts on the EPUB 3.1 draft, focusing mainly on backward-compatibility:

http://blog.kbresearch.nl/2016/03/10/the-future-of-epub-a-first-look-at-the-epub-3-1-editors-draft/

Most of this is addressed in the various threads in this repo (also quote from some responses), but might be useful as a summary ....

JayPanoz commented 8 years ago

Hi and sorry for being late to the party.

I know ncx is a pain but well, I’d like to provide some personal feedback. A lot of things have been already covered but if my input can help, let’s write it down.

From what I can understand, the question in the survey was:

Do you require the NCX in EPUB 3 publications? (source)

This may be where things can cause confusion since, in practice, there are still an awful lot of devices/apps shipping with ePub2 support only. And in my market, some distributors may even advise you to stay away from EPUB 3, except for advanced features like interactions or fixed-layout, because of this hard truth.

I know this is sad but that’s my daily life: I must sometimes fight with distributors or adapt my EPUB 3 files so that they can aggregate them properly.

To sum up, if EPUB 3.1 removed NCX entirely, some wouldn’t even bother to consider supporting it.

mattgarrish commented 8 years ago

It's been returned for the next draft, but as an obsolete feature.

We were looking to publish a new draft before the Bordeaux meetings this week, but there are continuing discussions on a number of issues that we'd like to publish all in one draft for review, and some parts that are likely to be split off as they need more time to develop (html and the web-friendly format).

I can't give a definite timeline for the next draft, but I'd guess no later than just after the Chicago meetings being held during BEA.

tofi86 commented 8 years ago

It's been returned for the next draft, but as an obsolete feature.

That's good news :thumbsup: Hopefully the render engine vendors will prepare for deprecation with the next iteration of the standard...

rkwright commented 8 years ago

Sorry, that's not how it works. We (the reading system developers) have to maintain support for all the old versions of the spec as long as there are any significant numbers of books in those formats. Since there are hundreds of thousands if not millions of EPUB2 books in circulation, we have to support NCX. It's not a terrible burden though as we simply map NCX onto the TOC support for EPUB3. But note that this is just the tip of the iceberg. There are many, many features we have added or modified going from EPUB 2 to 3 to 3.1. The RSes have to maintain (and test and retest) them all. As long as the features are ADDED it's not so bad. It's when the spec MODIFIES the behaviour of an existing feature that it becomes a mess because then we have no choice but to either not support the modified form of the feature, or break support for the old form.
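
To give a feel for what "map NCX onto the TOC support" might involve, here is a small Python sketch that turns an NCX `navMap` into the kind of nested title/href list a reading system could also build from an EPUB 3 Navigation Document. It is an illustration only, not how any particular reading system does it; the entry shape (`title`, `href`, `children`) and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Small illustrative NCX with one nested navPoint.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="p1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="c1.xhtml"/>
      <navPoint id="p1a" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="c1.xhtml#s1"/>
      </navPoint>
    </navPoint>
    <navPoint id="p2" playOrder="3">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="c2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""


def _q(tag: str) -> str:
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def nav_point_to_entry(nav_point) -> dict:
    """Convert one navPoint (and its nested navPoints) to a TOC entry."""
    label = nav_point.find(f"{_q('navLabel')}/{_q('text')}")
    content = nav_point.find(_q("content"))
    return {
        "title": (label.text or "").strip() if label is not None else "",
        "href": content.get("src") if content is not None else None,
        # findall() only matches direct children, so nesting is preserved
        "children": [nav_point_to_entry(c) for c in nav_point.findall(_q("navPoint"))],
    }


def ncx_to_toc(ncx_xml: str) -> list:
    """Return the navMap as a nested list of TOC entries."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find(_q("navMap"))
    if nav_map is None:
        return []
    return [nav_point_to_entry(p) for p in nav_map.findall(_q("navPoint"))]


toc = ncx_to_toc(SAMPLE_NCX)
```

Once both formats are reduced to the same entry shape, the rest of the TOC user interface does not need to know which one the book shipped with.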