Closed dauwhe closed 2 years ago
Made a third test with media-type="application/xml". Thorium displayed the XML tree.
An experience I wanted to share. I prepared my EPUB3 content with "cryptocurrency-blockchain" (JSON). The book is read when logged in with the "reading key" prepared for the reader. ReadiumDesktop and Thorium viewed it successfully. And in Thorium the "bookmarks" feature worked fine.
Reading system developers: does your RS use manifest fallbacks? Do you check mime types to decide when to use the fallback?
No.
/hides in shame
The issue was discussed in a meeting on 2021-05-27
The issue was discussed in a meeting on 2021-05-28
To pick up from the F2F, the specification change here is pretty minimal - we deprecate manifest fallbacks. Done.
In terms of impact, however, I see three areas of concern:
[...] img tag will get warnings. There's a strong case here that this is plain old bad practice given that picture now exists for this purpose. But it means legacy content would have to be updated. Do we know if anyone is using manifest fallbacks for this purpose, or is this a piece of EPUB 2 legacy that can happily be done away with?
[...] target attribute and not require core media type fallbacks so long as _blank is specified. (But whether it's viable to do at this point I can't say.)
That's as far as I've gotten trying to wrap my head around the impacts of a change. Doable, but it will entail a measure of pain for some publishers.
And I suppose one other (at least theoretical) use for manifest fallbacks was to provide options among content documents. I specifically remember this being introduced as a kind of "no script" option, so if a reading system that didn't support scripting found a content document marked as scripted it could look in the fallback chain for a non-scripted alternative. Similarly, it could look for a document without mathml markup, etc. if it didn't support the technology.
I haven't heard of manifest properties being used in this way (pretty cumbersome and duplicative), or of reading systems supporting them for fallback lookup, so I'd be surprised if deprecating would have any effect here.
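A rough sketch of the "no script" arrangement described above, as a package document fragment. The ids and file names are made up for illustration, and nothing here implies that any reading system actually performs this lookup.

```xml
<!-- Package document fragment; ids and file names are hypothetical -->
<manifest>
  <!-- Scripted version, referenced from the spine -->
  <item id="ch1" href="ch1-scripted.xhtml"
        media-type="application/xhtml+xml"
        properties="scripted" fallback="ch1-static"/>
  <!-- Non-scripted alternative a reading system could pick instead -->
  <item id="ch1-static" href="ch1-static.xhtml"
        media-type="application/xhtml+xml"/>
</manifest>
<spine>
  <itemref idref="ch1"/>
</spine>
```

The duplication is visible even in this tiny case: every scripted chapter needs a second, complete copy of the document.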
To pick up from the F2F, the specification change here is pretty minimal - we deprecate manifest fallbacks. Done.
:-)
[...] That's as far as I've gotten trying to wrap my head around the impacts of a change. Doable, but it will entail a measure of pain for some publishers.
At the moment, we have two ways of addressing unsupported features: deprecation and legacy. The former means a warning is generated, the latter does not. Labelling fallbacks as 'legacy' is not really the way to go because that label refers to EPUB 2.* features.
What about introducing a third category, say, "discouraged"? Discouraged features are very close to deprecated ones, except for the last sentence in A.1:
Validation tools SHOULD alert EPUB Creators to the presence of deprecated features when encountered in EPUB Publications.
Instead, it could say something like
Validation tools MAY alert EPUB Creators to the presence of discouraged features when encountered in EPUB Publications.
We can then discuss this with the epubcheck people: by default no warnings are issued for discouraged features, but there may be a separate flag that does result in warnings.
Would that be a way to go?
3. anything that opens by hyperlink has to be in the spine. If you want to open an image out of its html by clicking on it, for example, the only way to make that valid is by putting the image into the spine and having a manifest fallback.
This is what I'm finding problematic. The example in #1911 has a link to a JPEG image. Currently we require the content author to create an SVG version of that image that will never be used, even if the reading system supported manifest fallbacks. Requiring the fallback here serves no purpose.
At the very least, I don't think this makes much sense for core media types that are replaced elements.
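For reference, a sketch of what the current rules appear to ask for in a case like this. The file names and ids are hypothetical, and linear="no" is an assumption about how the image would be listed.

```xml
<!-- Package document fragment; ids and file names are hypothetical -->
<manifest>
  <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  <!-- Linked from chapter1.xhtml via <a href="photo.jpg">, so it has to be in the spine -->
  <item id="photo" href="photo.jpg" media-type="image/jpeg" fallback="photo-svg"/>
  <!-- SVG version that exists only to terminate the fallback chain -->
  <item id="photo-svg" href="photo.svg" media-type="image/svg+xml"/>
</manifest>
<spine>
  <itemref idref="ch1"/>
  <itemref idref="photo" linear="no"/>
</spine>
```

A reading system that can already render the JPEG never needs photo.svg, which is the point being made above.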
Allowing any core media type in the spine without fallbacks opens the door further to image-only EPUBs.
Quasi-restrictions like having to be replaced content are easy to manipulate. You could list all your img tags in a file you never use, or put them all in a document but rendered hidden.
That's already a failing of fallbacks, of course, as you can make every non-content document fall back to the same HTML page that says "sorry, you're out of luck" to satisfy the requirement. (Or just use fallback HTML/SVG wrappers without alt tags or descriptions.)
Plus it's not like fallbacks satisfy WCAG, either. If you use them and do make accessible alternatives, you can't claim your content is accessible because there's no way for users to choose whether they want the image or its alternative. And that assumes there is even widespread support for fallbacks, which there isn't, so that also makes their use an immediate failure.
Fallbacks are kind of useless however you look at them, so I have no issue with them disappearing. The question seems to be how far we want to go in the direction of allowing anything in the spine. We can keep pushing the door open a little bit at a time, or we can go all-in and allow any CMTs.
For accessibility, we ultimately have to rely on education, legislation and other means to push publishers to produce accessibly, anyway. Ebooks are mature enough now that novels published as a series of image-only pages aren't terribly realistic anymore. Publishers seem generally aware of the problems of image-based content for accessibility, reading on mobile, etc.
Allowing any core media type in the spine without fallbacks opens the door further to image-only EPUBs.
What if we loosen the requirement that hyperlinked resources must be in the spine?
For instance, we could say:
That would keep the requirement that Content Documents are in the spine if they're reachable from top-level documents, and yet allow things like <a href="my-image.jpg"> or <a href="audio.mp3"/> while not allowing images in spine.
Would that be reasonable?
What if we loosen the requirement that hyperlinked resources must be in the spine?
But that leads back to the problem of reading systems not being able to locate where the user is in the spine - what comes next, what do you go back to, how do you handle bookmark or annotation attempts, etc.
- hyperlinked local resources must be CMT
In a scholarly book it should be reasonable to have a hyperlink to a data file, javascript code, or a chemical formula described in an XML file (with these contents being part of the EPUB). What should we do about those? I think that the fallback mechanism is o.k. for them, although I am not sure it helps anything to have these resources also appear in the spine (albeit with linear=no).
FWIW, Play Books does look at the fallback chain and will ignore images directly in the spine unless they are SVG. We can of course loosen that, but then we have to start deciding how to style all these random images in the spine. Why are we considering this change in this version of the spec? Are we concerned that there just don't exist two implementations that actually support fallbacks?
How serious are we about deferring to the HTML spec to describe how reading systems handle various types of content? The spec covers:
So it makes sense that XML files, text files, or PDFs don't trigger fallbacks because a rendering engine knows what to do with them. But that's not true for DMGs, .exe files, etc.
@dauwhe I think the first two items and the third are different.
I think it would be possible to spec this better, but I also think that expanding on EPUB and RS in this direction would make a lot of sense.
(Thanks for pointing out that the HTML spec has sections for these; this makes our job much easier!)
Interesting: https://github.com/w3c/epubcheck/issues/1298
The issue was discussed in a meeting on 2022-03-11
I try to see some specific ways forward from the current situation, also based on our discussion on the call.
In line with reality (following the "paving the cow-path" approach of some specs), would it make sense to reduce the obligation of fallback chains from MUST to SHOULD? At first glance, the necessary spec changes would be:
We could be more stringent and leave (2) as a MUST, but I am not sure whether that will fly in terms of implementations.
As has been said, the HTML spec actually has a definite way to display text files and/or XML files. Our current approach of requiring fallbacks for those is, therefore, a further restriction on HTML. To alleviate this we could:
Note that, interestingly, JSON files do not fall under any of these, nor does HTML seem to say anything about them, because the media type for JSON is application/json...
We may want to spawn two separate issues here and close the current one. I think that @bduga has given an answer to the original question of the issue regarding the tests...
Another question to ponder: what does it mean to fall back to a supported media type?
Consider you embed some shiny new image format into an HTML document in an img tag and also want it to open as a content document so you add a hyperlink to it.
You can satisfy both cases using manifest fallbacks, but reading systems apparently are supposed to follow the order created by the author in the absence of a properties attribute with more information (and there are no properties for this situation).
So say you satisfy the spine requirement first and in the manifest make the image fall back to an XHTML content document. Then to satisfy the img element you have to make the XHTML content document fall back to a JPEG, like this:
Foreign image -- falls back to --> xhtml -- falls back to --> jpeg
Does this mean a reading system is supposed to use the xhtml in the img tag because that's the author's preferred order of fallbacks?
Are reading systems supposed to be aware of what formats make sense in what elements?
Are authors supposed to be aware that they have to craft the fallback chain to compensate for dumb reading system replacements and fall back to the image before the xhtml?
This obviously isn't a big problem in reality, but just another case of how manifest fallbacks are a flaky idea for HTML.
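Spelled out as a manifest fragment, the chain above might look like the following sketch. The ids, file names, and the image/x-new media type are all made up to stand in for the new format.

```xml
<!-- Package document fragment; ids, file names and "image/x-new" are hypothetical -->
<manifest>
  <!-- Foreign image, used in img and also hyperlinked -->
  <item id="fig" href="figure.new" media-type="image/x-new" fallback="fig-doc"/>
  <!-- First fallback: satisfies the spine requirement -->
  <item id="fig-doc" href="figure.xhtml" media-type="application/xhtml+xml" fallback="fig-jpeg"/>
  <!-- Second fallback: the only one that makes sense inside img -->
  <item id="fig-jpeg" href="figure.jpg" media-type="image/jpeg"/>
</manifest>
```

Nothing in the markup tells a reading system that fig-doc is meant for the spine and fig-jpeg for img; it would have to infer that from the media types.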
My understanding is that manifest fallbacks were introduced to work around the lack of fallbacks for img, but now that it has a srcset attribute and there is also the picture element, can we consider, if not outright deprecating the use, trying to put a stronger caution in against manifest fallbacks as a replacement mechanism within content documents?
Maybe even label that use a "legacy" feature only meant to support compatibility with EPUB 2?
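As a sketch of the intrinsic alternative mentioned above, using picture inside an XHTML content document. The file names, the alt text, and the image/x-new media type are hypothetical.

```xml
<!-- Inside an XHTML content document; file names and "image/x-new" are hypothetical -->
<picture>
  <!-- Tried first; skipped if the type is not supported -->
  <source srcset="figure.new" type="image/x-new"/>
  <!-- Core media type image used otherwise -->
  <img src="figure.jpg" alt="Description of the figure"/>
</picture>
```

Here the choice between formats is made by the rendering engine, with no manifest fallback attribute involved.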
Maybe even label that use a "legacy" feature only meant to support compatibility with EPUB 2?
I would definitely not have sleepless nights if we did that...
@mattgarrish do you want to propose a PR for https://github.com/w3c/epub-specs/issues/1464#issuecomment-1080714266 or should we leave it as is for now (the CR might force us to label fallbacks as under-implemented anyway...)?
Regardless, I think this issue may be now closed.
Cc @dauwhe
I wonder if it might help to split the uses and talk about them separately. Right now, you get a mix of requirements that apply when the fallback chain is for the spine and requirements that apply for elements lacking intrinsic fallbacks. It might give some room to caution about deeper problems like I mentioned in that comment, too.
I don't think it would hurt to caution against the HTML use, too, but we unfortunately can't actually call it legacy when I think about it more. That would mean epub 3 reading systems would have to stop supporting it and it would mean any content that has relied on it would become invalid (if it's a legacy feature, then it wouldn't count as providing a CMT anymore as far as EPUB 3 is concerned).
But closing this issue as it's not directly related.
One related question ... if manifest fallbacks are discouraged, and given the html object tag, could you not create the equivalent of the manifest fallback by using an xhtml file in the spine, and in that file use an object tag to load the foreign resource (in my test case an embedded pdf file) and then a second child object tag to load the fallback xhtml resource?
This approach seems to work properly in a number of e-readers, whether or not they support the foreign resource; it uses html object tag fallback following the latest whatwg spec and requires no epub3-specific support.
The only issue current epubcheck has with it is that the fallback object data url is ignored by epubcheck so the fallback html file can never be in the spine with linear = no since epubcheck thinks there is no link to it.
Maybe just remove manifest fallbacks completely from the spec (deprecate them) and tell users to use the pure html object tag fallback mechanism instead with the added benefit of the spine always being pure xhtml files.
Maybe just remove manifest fallbacks completely from the spec (deprecate them) and tell users to use the pure html object tag fallback mechanism instead with the added benefit of the spine always being pure xhtml files.
The option of removing fallbacks completely from the spec came up several times. However, per the charter of our group, we were forbidden to remove standard features (inherited from EPUB 3.2). More exactly, it was a requirement that any valid EPUB 3.2 publication should remain valid in terms of EPUB 3.3 (the goal, obviously, being that the new version should not "disrupt" any deployed EPUB publications). So, as a measure of caution, we kept fallbacks (which also got some implementations).
The deprecation, possibly adding some text referring to HTML object tags, etc, is something that could be on the plate of future work in the maintenance Working Group that we are planning.
Thank you for possibly considering it in the future.
The nested object tag approach will currently allow foreign resource fallback to xhtml even in browsers/e-readers that do not support manifest fallbacks for spine items. Useful given our testing shows very few e-readers support opf manifest fallbacks at all for items in the spine.
The only issue current epubcheck has with it is that the fallback object data url is ignored by epubcheck so the fallback html file can never be in the spine with linear = no since epubcheck thinks there is no link to it.
This may have to be looked at, as it may require some slight modification on the spec text (I have not checked).
At present, the spec is soon going to the final round of becoming a Recommendation (W3C jargon for standard). Maybe it is worthwhile to raise an erratum once the Rec is published to look at this (probably minor) issue in the spec and in epubcheck.
The current epubcheck has similar issues with the Nav being in the spine with linear = "no" though all epub3 ereaders properly process the Nav and provide their own interface for accessing it. This is an issue as ebooks try to hide the Nav when providing a pure html based TOC to the reader but the Nav itself can and does link to itself as a landmark meaning it must be in the spine (again according to epubcheck).
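A sketch of the circular Nav case described above, assuming a hypothetical nav.xhtml with a manifest id of nav; the first fragment is from the Nav document and the second from the package document.

```xml
<!-- nav.xhtml (hypothetical file name): the landmarks nav links back to the document itself -->
<nav epub:type="landmarks" hidden="hidden">
  <ol>
    <li><a epub:type="toc" href="nav.xhtml#toc">Table of Contents</a></li>
  </ol>
</nav>

<!-- Package document: that self-link is what pulls the Nav into the spine -->
<spine>
  <itemref idref="ch1"/>
  <itemref idref="nav" linear="no"/>
</spine>
```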
I am not sure what the 3.3 spec says about either of these recent changes to epubcheck. If you want I would be happy to open specific epubcheck issues for both cases with sample code if these are truly issues that you want or need to have tracked. Just let me know.
This goes back to a change made in EPUB 3.1 that requires that all non-linear content in the spine be linked to so that users can reach it (in case non-linear content is suppressed by the reading system).
I expressed reservations about this in https://github.com/w3c/epubcheck/issues/1451#issuecomment-1369795349 when epubcheck was updated, and it was raised again in https://github.com/w3c/epubcheck/issues/1488, because it requires using the landmarks nav to satisfy the linking requirement. It's workable for the cover and toc because we have semantics for them, but not for any generic non-linear documents.
I'm fine opening a new issue about this, but I don't think it's something we're going to solve before going to PR. I'd take this up in the maintenance group after we're done.
Interesting ... but an xhtml file with an object tag with a data url pointing to another xhtml file (one listed in the spine with linear = no since it is acting as a fallback) should certainly be classified as a "link" to that file for the purposes of epubcheck meeting the 3.3 spec, shouldn't it?
but an xhtml file with an object tag with a data url pointing to another xhtml file (one listed in the spine with linear = no since it is acting as a fallback) should certainly be classified as a "link" to that file for the purposes of epubcheck meeting the 3.3 spec, shouldn't it?
No, because once you put it in the spine it becomes part of the content and not strictly a fallback. This takes us out of fallbacks and into the thornier issue about whether non-linear content is meant to be rendered as part of the default reading order or not. Some reading systems will render all non-linear content where it is placed in the spine, some will not render it. The requirement to add links grew out of that impasse: if some reading systems aren't going to render the content, then there must be some way to reach it.
You don't normally want to reach a fallback, however; only if the reading system doesn't support the foreign resource you wanted to render. Otherwise, you end up with a situation where the reader can encounter that fallback twice: once because the reading system couldn't support the foreign resource and a second time because it's in the spine. That's why manifest fallbacks weren't listed in the spine but were chained together using an attribute.
To the case you're suggesting, though, I assume this is what you want to do:
<object data="some_foreign_resource.xyz" type="data/xyz">
<object data="fallback.html" type="application/xhtml+xml">
</object>
</object>
In that case, there's no need to put the fallback in the spine. It's just another embedded resource. The only time it's required to be a non-linear item in the spine is if you hyperlink to it:
<object data="some_foreign_resource.xyz" type="data/xyz">
Read <a href="fallback.html">the fallback</a>.
</object>
There are cases like cover pages, though, where there aren't easy workarounds like this. If these are in the spine, and nothing links to them, then unless you can find an out-of-band means of linking to that page, like the landmarks nav, then you're stuck with an epubcheck error.
The deprecation, possibly adding some text referring to HTML object tags, etc, is something that could be on the plate of future work in the maintenance Working Group that we are planning.
I'm not optimistic about this. The only use of manifest fallbacks I've heard of is to allow images in the spine exactly so they don't have to be wrapped inside of an html file -- going back to other long discussions about what is allowed in the spine without fallback. Deprecating them for a solution that requires wrapping the images in HTML might not go over well (unless we find out that use has died).
Agreed. But nothing in the manifest fallbacks in the current spec limits it to just images ... so it applies as well to any foreign resource used in the spine, doesn't it?
And given the general lack of e-reader manifest fallback support for spine items, why not encourage an html spec compliant fallback approach since it actually works and requires no special epub3 only structures or support.
That way both epub users and epub developers have a way to include any foreign content in an epub with real working fallback in current e-readers with no changes needed.
And fwiw, I do think the Nav document should be allowed to have linear=no as the last item of the spine without an additional link to the Nav being provided because access to the Nav is guaranteed by all e-readers, and many Nav documents local link to themselves via their own landmark section thereby requiring them to be in the spine in the first place, making it all a bit circular.
Thanks for listening and considering. And for pointing out my long held definition of linear = no is not universal by any means (or even well understood by me!)
But nothing in the manifest fallbacks in the current spec limits it to just images ... so it applies as well to any foreign resource used in the spine, doesn't it?
Right, but I don't think they've been used much (at all?) outside of images. I'm not even sure if images are used in the spine much. There was a push to enable manga/comics as EPUBs a number of years back that led to some long discussions about what is allowed in the spine. I only point that out because it might make deprecating manifest fallbacks complicated. If they have been used for images, then we'd probably get pushback to deprecating them. If they haven't, then using intrinsic html fallback methods, whatever they happen to be (object, picture, etc.), is always better.
And for pointing out my long held definition of linear = no is not universal by any means
You're not alone. It's probably the most confusing feature in all of epub... 😕
So, I wrote a test for manifest fallbacks. I made a JSON content document (media-type="application/json") with an XHTML fallback. Apple Books, Thorium, and Calibre display the JSON directly. ADE 4.5 crashes on opening the file.
I suspect this is because browsers/web views will try to open JSON and render it as text.
Then I made an EPUB with an XML content doc (media-type="application/dtc+xml"). Apple Books said the book was corrupt, although there were no EPUBCheck errors. When I opened it in Thorium, I was presented with a dialog box allowing me to download the XML file. Calibre rendered the XML as text.
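For anyone wanting to reproduce the JSON test, the relevant package document fragment would look something like the sketch below. These are not the actual test files; the ids and file names are hypothetical.

```xml
<!-- Package document fragment; ids and file names are hypothetical -->
<manifest>
  <!-- JSON "content document" with an XHTML fallback -->
  <item id="data" href="data.json" media-type="application/json" fallback="data-fallback"/>
  <item id="data-fallback" href="data.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine>
  <itemref idref="data"/>
</spine>
```

A reading system that honors the fallback would show data.xhtml; one that hands the spine item straight to its web view would show the raw JSON.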