w3c / epub-specs

Shared workspace for EPUB 3 specifications.

What does it mean to "support" a foreign resource? #1464

Closed dauwhe closed 2 years ago

dauwhe commented 3 years ago

So, I wrote a test for manifest fallbacks. I made a JSON content document (media-type="application/json") with an XHTML fallback. Apple Books, Thorium, and Calibre display the JSON directly. ADE 4.5 crashes on opening the file.

I suspect this is because browsers/web views will try to open JSON and render it as text.

Then I made an EPUB with an XML content doc (media-type="application/dtc+xml"). Apple Books said the book was corrupt, although there were no EPUBCheck errors. When I opened it in Thorium, I was presented with a dialog box allowing me to download the XML file. Calibre rendered the XML as text.

dauwhe commented 3 years ago

Made a third test with media-type="application/xml". Thorium displayed the XML tree.

[Image: Thorium-manifest-fallback-003]

teytag commented 3 years ago

An experience I wanted to share. I prepared my EPUB3 content with "cryptocurrency-blockchain" (JSON). The book is read when logged in with the "reading key" prepared for the reader. ReadiumDesktop and Thorium viewed it successfully. And in Thorium the "bookmarks" feature worked fine.

dauwhe commented 3 years ago

Reading system developers: does your RS use manifest fallbacks? Do you check mime types to decide when to use the fallback?
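
To make the question concrete, here is a minimal Python sketch of what "using the manifest fallback" could look like: walk the `fallback` chain and pick the first item whose media type the reading system claims to support. The sample package document, the `SUPPORTED_IN_SPINE` set, and the function names are all assumptions made for illustration, not taken from any real reading system.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document mirroring the JSON-with-XHTML-fallback test.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="data" href="data.json" media-type="application/json" fallback="data-html"/>
    <item id="data-html" href="data.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="data"/>
  </spine>
</package>"""

# Assumption: this reading system only renders EPUB content documents in the spine.
SUPPORTED_IN_SPINE = {"application/xhtml+xml", "image/svg+xml"}


def load_manifest(opf_text):
    """Map each manifest item id to its <item> element."""
    root = ET.fromstring(opf_text)
    return {item.get("id"): item for item in root.findall("opf:manifest/opf:item", OPF_NS)}


def resolve_fallback(manifest, item_id, supported):
    """Return the href of the first item in the fallback chain with a supported media type."""
    seen = set()  # guards against circular fallback chains
    while item_id is not None and item_id not in seen:
        seen.add(item_id)
        item = manifest.get(item_id)
        if item is None:
            return None
        if item.get("media-type") in supported:
            return item.get("href")
        item_id = item.get("fallback")
    return None
```

With the sample above, a reading system that checks media types this way would open `data.xhtml`, while one that hands everything to the web view would show the raw `data.json`, which matches the behavior observed in the tests.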

danielweck commented 3 years ago

No.

/hides in shame

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-05-27

### 2. What does it mean to "support" a foreign resource? (issue epub-specs#1464)

_See github issue [#1464](https://github.com/w3c/epub-specs/issues/1464)._

**Dave Cramer:** When writing some spec tests, one of the foundational aspects is a core media type … ie something that does not need a fallback … so manifest fallbacks are also foundational … Wrote some fanciful tests with docbook, binary files, etc … _**No**_ reading system implemented manifest fallbacks … Some of the behavior was not ideal for the end user … for instance a .dmg file was downloaded to the local system [Scribe note: !!!] … Looks like everyone just throws it at a webview and let the webview handle it … Even Readium doesn't handle them … What are the implications of a core RS feature not working in the real world?

**Shinya Takami (高見真也):** In Japan, in some cases manifest fallback is implemented, but may be domain specific … Would like to discuss with Voyager people in Japanese … [Japanese]

**Masakazu Kitahara:** [via shiestyle] Voyagers RS does not support this in RSes, but in some places we do use the feature in Japan

**Brady Duga:** we have two pipelines at Google, 1 for publishers, and 1 for people sideloading … the one for publishers is better for support for this type of thing

**Wendy Reid:** Don't think it is supported by Kobo

**Brady Duga:** dauwhe can you make your sample epubs available?

**Dave Cramer:** yes, i'll let you know where

**Dan Lazin:** I have some tests that are not checked in, because I don't know what the proper behavior is supposed to be … Do we need a graveyard for these sorts of things? Since I can't set the "does it pass" field

**Dave Cramer:** The tests are great, as it points to where we should be investigating … Seems like a case I would like more semi official information on what RSes support/claim to support with regard to manifest fallbacks … If no one implements it, we need to have hard conversations … If there aren't implementations we need to remove it … But need to keep concept of core media type

**Wendy Reid:** Since we now have several tests without clear passes … Does it make sense to make a list and send the tests out to the community pre-CR … Typically this is done during CR phase, but would be good to know where we are in trouble

**Dan Lazin:** To pursue, in my test sheet, I have some blank cells so we can add notes there … Can use that as a way to corral a list

**Brady Duga:** this sounds like a problem, but if nobody has implemented manifest fallbacks, then maybe we just say that you can only use core media types in the spine, period

**Dave Cramer:** Part of this is how epub has changed over time … used to have the idea of general epub container, but that is not what really happened … Have several next steps to gather info and look into hard to test tests … Have enough useful take aways

**Ben Schroeter:** Where did we leave the first conversation?

**Dave Cramer:** We have the goal of communicating the status, but where to do that is still an open question … Will check with Ivan and Matt to figure out the best path forward … Break now?

**Wendy Reid:** Yes!

**Dave Cramer:** Brady can eat dinner

**Wendy Reid:** Will reconvene at the upcoming whole hour

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-05-28

### 5. What does it mean to "support" a foreign resource?

_See github issue [#1464](https://github.com/w3c/epub-specs/issues/1464)._

_Continuation of the discussion [on the first vF2F meeting](https://www.w3.org/publishing/groups/epub-wg/Meetings/Minutes/2021-05-27-epub#section2)_

**Dave Cramer:** came about from testing. … json with HTML fallback, all normal EPUB stuff … when looking at it in an EPUB Reading system, but they just displayed the raw text. … weird behavior. Spec says if a RS does not know about or support something it should use the fallback, but I didn't see this. … JSON as raw text, or as a tree view, etc.. … This shocked me that the core media types if not supported should default to the fallback but that doesn't seem to be happening.

**Ivan Herman:** So what? 😀 … I can imagine that I write a paper which relies on a bunch of data files, and xml or csv data etc. It may not be nice but we just ack that and leave it.

**Dave Cramer:** write something in doc book but may not render in all systems so you put in the HTML as a fall back. but that didn't seem to happen.

**Ivan Herman:** is it a problem if a RS displays the data directly? having a fallback would be a nice idea but maybe have to accept it. Is it ok to put a docbook in the spine, if I link to it or put it in the spine are two separate issues.

**Dave Cramer:** RS offering to download some mimetype seems really bad.

**Matt Garrish:** Why do we care what renders in spine items? Offering guidance. … seems its not working as intended, but we don't want to reproduce PDF. … EPUB was trying to do better. we can look at these issues separately. Why are we concerned about Core media types. what do we say about the spine. we are an accessible text format.

**Wendy Reid:** DMG file is a massive security issue, that's an issue that RS would handle on their side. If a RS detects an executable within an EPUB it should do something. But we don't want to spec that executables be included in the spine that may open up a can of worms.

**Ivan Herman:** It must be part of the security section in the RS document if you are a RS be careful about downloading a binary file. EPUB check could react to this. But the manifest fallback is ignored anyways… so it may not make sense there. EPUBCheck may not complain.

**Brady Duga:** Yes core media types are important and maybe the RS are already supporting them and not worry about fallbacks. … here is this fundamental thing in the spec and core principle and Dan tested it and it seems it isn't.

**Matt Garrish:** I have no issue getting rid of fallbacks. Spine XHTML / SVG, but we could make it a formal allowed spine item. But there could be neat stuff and our fallbacks don't work. We are just calling it a failed feature of EPUB. We don't really want to have them all in the spine just the two of them.

**Wendy Reid:** agreed.

**Brady Duga:** like ChemML and you may want to use those or have a fallback with a static page, but everyone just uses HTML & CSS so I think we weren't originally sure but we didn't want to limit people.

**Matt Garrish:** Manifest fallbacks with replacements for HTML, there are some really bad manifest fallbacks and HTML has improved to solve that, we have the picture element etc. So sounds like its all irrelevant.

**Ivan Herman:** spine restricted to XHTML & SVG, and epubCheck would shout at me so thats ok. Not having a fallback there epubCheck will shout at me. So for files which refer to from html, maybe something musicML and have this ML and they may have their own special renderer, but they should be able to use that without, … so what would be the fallback for MusicML so I link to HTML and for valid reasons and I render it via some extra scripts; I cannot rely on fallbacks since I cant rely on something else doing this for me. … HTML gives me this, but EPUB version have a fallback why would I do that?

**Brady Duga:** I was going to do some more research on playbook side.

**Wendy Reid:** ingested versions vs. side loaded.

> **Proposed resolution: Close issue 1464, change language in spec around manifest fallbacks to restrict to xhtml and svg in the spine and rely on HTML conventions** *(Wendy Reid)*

> **Proposed resolution: Close issue 1464, change language in spec around manifest fallbacks to restrict to xhtml and svg in the spine and rely on HTML/SVG conventions** *(Wendy Reid)*

**Matt Garrish:** wonders, restricting to XHTML/SVG are we deprecating? Can't remove for manifest fallbacks.

> **Proposed resolution: Close issue 1464, change language in spec to deprecate manifest fallbacks and recommend xhtml and svg in the spine and rely on HTML/SVG conventions** *(Wendy Reid)*

> *Avneesh Singh:* +1 Brady, it is a big change

**Brady Duga:** feels like a really big change, I am hesitant to do it. To understand the nuances.

**Dan Lazin:** we talked about things to talk about at risk, add this?

> *Avneesh Singh:* +1 Tzviya, there is a long tail

**Tzviya Siegman:** trying to think about 1000's of backlist titles, we don't update those. If I need to do an update it may be a more significant update, and what a retailer but this may be a breaking change.

**Wendy Reid:** this doesn't break, just deprecate it so moving fwd. Legacy content using this is not really working anyways

**Ivan Herman:** If we say deprecated what does EPUBCheck do today?

**Matt Garrish:** Issue a warning

**Ivan Herman:** that could scare off a publisher right?

**Matt Garrish:** Yes

**Ivan Herman:** what can we use?

**Matt Garrish:** "Strongly Encourage"

**Ivan Herman:** I understand what Brady says, in the meantime, Matt & I can come up with a PR what that means in the spec. … not merge the PR and Brady can do his research, and not trigger adverse reaction from EPUBCheck.

**Wendy Reid:** warning is not a bad thing… if a publisher is scared of that, what are publishers doing I will look at the warnings or errors it may be inconvenient but they are looking at them. If they are seeing warning, its not a big deal, at least for me.

**Tzviya Siegman:** warnings and errors for some reading systems won't accept EPUBs even if they have warnings. unfortunately they are at the same level. … some organizations regard warnings as errors … I can't publish with warnings.

> *Avneesh Singh:* +1 Matt and Ivan, start working on the language

**Charles LaPierre:** we had this discussion before, we really need to go to those RS and shame them or inform them to allow warnings, because otherwise we shouldn't use warnings since its the same thing as an error.

**Matt Garrish:** we shouldn't resolve anything now, I will work with Ivan to come up with new language on this.

**Brady Duga:** Dave has some sample books.

**Wendy Reid:** we can do a little more testing.

mattgarrish commented 3 years ago

To pick up from the F2F, the specification change here is pretty minimal - we deprecate manifest fallbacks. Done.

In terms of impact, however, I see three areas of concern:

  1. Anyone who has tried to put images in spine is going to get warnings now, which is likely to impact some producers of comics and manga. From past experience with deprecation, this may not be as much of an issue in Asia as it is in North America/Europe, but I believe there has been some work in Europe on representing comics in epub. We'd probably need to know more about whether this is done using manifest fallbacks or if the publishers are following existing epub guidelines to embed in xhtml/svg.
  2. Anyone who has used (is using) manifest fallbacks to include a non-core media type image file in an img tag will get warnings. There's a strong case here that this is plain old bad practice given that picture now exists for this purpose. But it means legacy content would have to be updated. Do we know if anyone is using manifest fallbacks for this purpose, or is this a piece of epub 2 legacy that can happily be done away with?
  3. Probably the most complicated piece is what this means for being able to open a non-content document format, since anything that opens by hyperlink has to be in the spine. If you want to open an image out of its html by clicking on it, for example, the only way to make that valid is by putting the image into the spine and having a manifest fallback. That wouldn't be possible anymore unless you can tolerate the warning. My only thought here is maybe we should look at the target attribute and not require core media type fallbacks so long as _blank is specified. (But whether it's viable to do at this point I can't say.)

That's as far as I've gotten trying to wrap my head around the impacts of a change. Doable, but it will entail a measure of pain for some publishers.

mattgarrish commented 3 years ago

And I suppose one other (at least theoretical) use for manifest fallbacks was to provide options among content documents. I specifically remember this being introduced as a kind of "no script" option, so if a reading system that didn't support scripting found a content document marked as scripted it could look in the fallback chain for a non-scripted alternative. Similarly, it could look for a document without mathml markup, etc. if it didn't support the technology.

I haven't heard of manifest properties being used in this way (pretty cumbersome and duplicative), or of reading systems supporting them for fallback lookup, so I'd be surprised if deprecating would have any effect here.

iherman commented 3 years ago

To pick up from the F2F, the specification change here is pretty minimal - we deprecate manifest fallbacks. Done.

:-)

[...] That's as far as I've gotten trying to wrap my head around the impacts of a change. Doable, but it will entail a measure of pain for some publishers.

At the moment, we have two ways of addressing unsupported features: deprecation and legacy. The former means a warning is generated, the latter does not. Labelling fallbacks as 'legacy' is not really a way to go because that label refers to EPUB 2.* features.

What about introducing a third category, say, "discouraged"? Discouraged features are very close to deprecated ones, except for the last sentence in A.1:

Validation tools SHOULD alert EPUB Creators to the presence of deprecated features when encountered in EPUB Publications.

Instead, it could say something like

Validation tools MAY alert EPUB Creators to the presence of discouraged features when encountered in EPUB Publications.

We can then discuss this with the epubcheck people: by default no warnings are issued for discouraged features, but there may be a separate flag that does result in warnings.

Would that be a way to go?

dauwhe commented 3 years ago

3. anything that opens by hyperlink has to be in the spine. If you want to open an image out of its html by clicking on it, for example, the only way to make that valid is by putting the image into the spine and having a manifest fallback.

This is what I'm finding problematic. The example in #1911 has a link to a JPEG image. Currently we require the content author to create an SVG version of that image that will never be used, even if the reading system supported manifest fallbacks. Requiring the fallback here serves no purpose.

At the very least, I don't think this makes much sense for core media types that are replaced elements.

mattgarrish commented 3 years ago

Allowing any core media type in the spine without fallbacks opens the door further to image-only EPUBs.

Quasi-restrictions like having to be replaced content are easy to manipulate. You could list all your img tags in a file you never use, or put them all in a document but rendered hidden.

That's already a failing of fallbacks, of course, as you can make every non-content document fall back to the same HTML page that says "sorry, you're out of luck" to satisfy the requirement. (Or just use fallback HTML/SVG wrappers without alt tags or descriptions.)

Plus it's not like fallbacks satisfy WCAG, either. If you use them and do make accessible alternatives, you can't claim your content is accessible because there's no way for users to choose whether they want the image or its alternative. And that assumes there is even widespread support for fallbacks, which there isn't, so that also makes their use an immediate failure.

Fallbacks are kind of useless however you look at them, so I have no issue with them disappearing. The question seems to be how far we want to go in the direction of allowing anything in the spine. We can keep pushing the door open a little bit at a time, or we can go all-in and allow any CMTs.

For accessibility, we ultimately have to rely on education, legislation and other means to push publishers to produce accessibly, anyway. Ebooks are mature enough now that publishing novels as a series of image-only pages isn't terribly realistic anymore. Publishers seem generally aware of the problems of image-based content for accessibility, reading on mobile, etc.

rdeltour commented 3 years ago

Allowing any core media type in the spine without fallbacks opens the door further to image-only EPUBs.

What if we loosen the requirement that hyperlinked resources must be in the spine?

For instance, we could say:

That would keep the requirement that Content Documents are in the spine if they're reachable from top-level documents, and yet allow things like <a href="my-image.jpg"> or <a href="audio.mp3"/> while not allowing images in spine.

Would that be reasonable?

mattgarrish commented 3 years ago

What if we loosen the requirement that hyperlinked resources must be in the spine?

But that leads back to the problem of reading systems not being able to locate where the user is in the spine - what comes next, what do you go back to, how do you handle bookmark or annotation attempts, etc.

iherman commented 3 years ago
  • hyperlinked local resources must be CMT

In a scholarly book it should be reasonable to have a hyperlink to a data file, javascript code, or a chemical formula described in an XML file (with these contents being part of the EPUB). What should we do about those? I think that the fallback mechanism is o.k. for them, although I am not sure it helps anything to have these resources also appear in the spine (albeit with linear=no)

bduga commented 3 years ago

FWIW, Play Books does look at the fallback chain and will ignore images directly in the spine unless they are SVG. We can of course loosen that, but then we have to start deciding how to style all these random images in the spine. Why are we considering this change in this version of the spec? Are we concerned that there just don't exist two implementations that actually support fallbacks?

dauwhe commented 2 years ago

How serious are we about deferring to the HTML spec to describe how reading systems handle various types of content? The spec covers:

So it makes sense that XML files, text files, or PDFs don't trigger fallbacks because a rendering engine knows what to do with them. But that's not true for DMGs, .exe files, etc.

iherman commented 2 years ago

@dauwhe I think the first two items and the third are different.

I think it would be possible to spec this better, but I also think that expanding on EPUB and RS in this direction would make a lot of sense.

(Thanks for pointing out that the HTML spec has sections for these; this makes our job much easier!)

dauwhe commented 2 years ago

Interesting: https://github.com/w3c/epubcheck/issues/1298

iherman commented 2 years ago

The issue was discussed in a meeting on 2022-03-11

### 1. What does it mean to "support" a foreign resource? (issue epub-specs#1464)

_See github issue [epub-specs#1464](https://github.com/w3c/epub-specs/issues/1464)._

**Dave Cramer:** given that CR is approaching we want to work through our remaining issues. … this is one we've discussed many times before, re. foreign resources and manifest fallbacks. … as you remember I made a bunch of test books, putting weird MIME types in spine with HTML fallback to see what would happen in real RS. … much to my surprise I have not yet found one where fallback is displayed to end user. … question is how do we interpret these tests? What are RS expected to do in these circumstances? … one of the most straightforward examples is if I have JSON in spine with HTML fallback, pretty much every RS will display JSON in plain text the way a browser would. … in some sense that's in keeping with HTML spec since that spec has instructions for how browsers should display all sorts of different MIME types.

> *Ivan Herman:* See [Dave's reference to HTML spec on xml and text](https://github.com/w3c/epub-specs/issues/1464#issuecomment-1018083509).

**Dave Cramer:** similar thing might happen if you try to put XML in spine, where some RS will give you the browser style tree view.

**Dave Cramer:** we have something of a conflict between spec and reality, so what do we do? (if anything). … the idea of fallback is pretty woven through the spec. … at least one RS will read the fallback chain, though unclear what it does with that data. … so, is the test with JSON in spine (as described above) failing?

**Brady Duga:** fallbacks are at the resource level, but you still have to use the resource correctly. … a RS may well support JSON if read into a script, the test kind of misuses JSON. … what happens if you reference an image from an audio resource? … you're still going to get weird rendering even if that image is cmt. … we still require author to use resource in sensible way.

**Dave Cramer:** say you are doing book about javascript and you want to hyperlink to JSON file. … if you hyperlink to something it has to be in the spine, so it still has to work.

**Brady Duga:** little weird to hyperlink to JSON in spine when author could style JSON text in their style to make it look correct visually. … so i'm not sure what it means to support it. … but i think it's valid to say that you've supported JSON if you just display to user.

**Dave Cramer:** we had once envisioned special purpose RS that could display docbook format, for example, but where same epub could still be displayed reasonably in non-speciality RS.

> *Rick Johnson:* For reference as a reading system, while we support scripting, we do not support manifest fallbacks. We only support xhtml in the spine. For example, if you want to show just an image at the content doc level in Bookshelf, you must wrap it in xhtml for us to be able to present it.

**Ivan Herman:** re. question about whether fallback should be used - strictly speaking, your test follows what the spec allows. … but if nobody implements fallbacks, then it becomes an underimplemented feature of epub, and a problem. … that said, I was intrigued by [your comment back in January](https://github.com/w3c/epub-specs/issues/1464#issuecomment-1018083509), because we rely on the HTML spec, and the HTML spec has statement of what to do with text files. … Reading systems rely on webview which per html display text files properly (per that spec). … we should liberalize the spec and not force removing a usable feature.

**Matt Garrish:** Not sure if opening up the spine is a good idea now. … We can just restrict to epubs for now, to avoid controversy. … Don't really like fallbacks, we have kind of made a mess of the html model. … would love to get rid of it, but what happens to old content?

**Ivan Herman:** We can make deprecated, or say it is under-implemented. … But we can find a backwards compatible way of specifying it.

**Dave Cramer:** Can we avoid throwing away the concept, but make it clear RSes will never show the fallbacks?

**Rick Johnson:** Is this a case for some standard language to say there are not two implementations and it is dangerous to use?

**Ivan Herman:** Yes, under-implemented is our current term.

**Zheng Xu (徐征):** From implementor side, how to support foreign resources is passed off to web engine? … But fallback is used in some case depending on region. … So some resource will work in RS, but not others and it can't be wrapped in xhtml. How should we specify this? … A normal browser just downloads the resource, but a RS still needs to be able to turn pages.

**Matt Garrish:** If we put a big scary box in the manifest section it will help, but how do we tell people?

**Brady Duga:** we walk the fallback chain for something that should be displayed like html or svg. … but we haven't actually tested it. … we did add this intentionally. … we might use it in a few other places. … we have a list of what resources we expect at a particular point. … and find one on the list. … responding to Matt. … if no one supports this, their epubs are going to be stupid :).

**Dave Cramer:** It has been common to wrap things in `<a>` tags to make them bigger, but that isn't legal in epub.

**Ivan Herman:** What does html spec say about images like that? … that is, what to do when the link is an image? … Whatever the [html spec says is displayable](https://html.spec.whatwg.org/multipage/browsing-the-web.html#read-media), we should follow or at least allow.

**Tzviya Siegman:** There is also linking with iframes that has been done in epub. It is clumsy, though, agree with Ivan, we should allow what html allows.

**Dave Cramer:** [Reading processing model for non-html content].

**GeorgeK:** This was presented as an a11y issue, but don't see any real a11y benefits here. … So I am ok with getting rid of it.

**Ivan Herman:** The big issue here is a .dmg or .exe file being linked. … Maybe we need to make this much more explicit in the security section. … This whole procedure seem obsolete.

**Brady Duga:** I don't think that security statement makes sense. … all that security stuff depends on manifest mime types, which would be false. … you shouldn't trust the package here. … we download all the resources. … my android reader downloads everything in the manifest. … but then if you tell me I shouldn't download certain parts of the epub. … I don't think that restriction makes a lot of sense. … we shouldn't execute stray code from an epub. … I'm hoping that's clear already. … I don't see how we turn this into a security issue.

**Dave Cramer:** There is a distinction between downloading as a RS, vs as an end user clicking on a navigation link that allows me to download.

**Matt Garrish:** Are we saying we should allow anything in the spine?

**Dave Cramer:** I don't think we are talking about opening the spine.

**Ivan Herman:** I agree.

**Dave Cramer:** But we still have the existing epubs that have fallbacks.

**Ivan Herman:** We may have to separate two situations. Let's put the spine aside for now. … the other use case is an `<a>` element linked to a json file, what should the RS do? … I think we should just refer to the html spec in that case. … The only reason we keep the fallback is for core media in the spine. … Since there is existing content. … We may end up with it being under-implemented when we exit CR.

**Matt Garrish:** These aren't different cases - if it is linked to it must be in the spine.

**Ivan Herman:** What if it is outside the epub?

**Matt Garrish:** In that case it opens in a new browser context.

**Dave Cramer:** Say you click on that image that isn't in the spine, then we have all the nav issues (how do you go forward, bookmark, etc).

**Matt Garrish:** We have looked at pop out content, but have never gotten that far with it.

**Ivan Herman:** Why does it have to be in the spine?

**Matt Garrish:** To avoid the nav issues. … Once it is loaded in the viewport, there has to be a position in the spine so the RS can handle it.

**Brady Duga:** this is related to `linear=no`.

**Ivan Herman:** it's in the spine but `linear` is `no`.

**Brady Duga:** it has to be in the spine to link to it.

**Tzviya Siegman:** We have had this conversation after about every revision of the spec. … and we end up saying we can't rewrite browser contexts, and this conversation has to be about RSes and not content. … So it is more about RS paging and the concept of a page, and not content.

**Charles LaPierre:** One thing a browser has is a back button. … Also solves multiple references to a single place. … All problems are solved by the back button.

> *Ivan Herman:* +1 to CharlesL.

**Zheng Xu (徐征):** We have a back button, but then there is still the issue with bookmarks.

**Dave Cramer:** History and html (ie back buttons) is really complex.

**Zheng Xu (徐征):** for this issue is the question how we can write a test?

**Matt Garrish:** Maybe there is some discussion to resurrect about the target of an `<a>`. … So you could do this without having it in the spine.

**Dave Cramer:** I don't really know what epub without fallbacks looks like. … might need something that says a fallback is not likely to be presented.

> *Tzviya Siegman:* +1 to dauwhe.

**Ivan Herman:** we must solve this before cr... … We need to say what happens to text.

**Dave Cramer:** RSes understand JSON.

**Brady Duga:** from a CR perspective a test for this is JSON in the spine, saying that if you see this means the RS supports JSON.

**Dave Cramer:** And that is basically the test I made.

**Brady Duga:** we are trying to define support, when the reading system should define it.

**Dave Cramer:** Agree.

**Ivan Herman:** We don't want to say RSes should support types html says they should.

**Brady Duga:** it's up to the RS to decide if they're gonna use a fallback. … if you put JSON in you're supposed to put in a fallback. The RS chooses.

**Dave Cramer:** And no RSes would display the fallback.

**Ivan Herman:** Isn't it correct that all RSes that are newer and rely on browser cores would display json correctly. … We are saying for epub content to be correct you must put a fallback even though RSes don't need that.

**Brady Duga:** fallbacks are not just because RSs cant display something. … we did this because we need a11y info. … we could display an image in the spine, no problem. But there's no a11y and no styling. … it may look great somewhere and looks terrible somewhere else. … let's have authors style things, rather than chasing reading systems.

**Dave Cramer:** Want time to discuss CR.

iherman commented 2 years ago

I am trying to identify some specific ways forward from the current situation, based in part on our discussion on the call.

Fallback optional?

In line with reality (following the "paving the cow-path" approach of some specs), would it make sense to reduce the obligation of fallback chains from MUST to SHOULD? At first glance, the necessary spec changes would be:

  1. In §2.2.1.3 Foreign Resources of the core spec, it currently says "EPUB Creators MUST provide fallbacks"; this could be changed to SHOULD
  2. In §2.1 Foreign Resources of the RS spec it says "...MUST process fallbacks for unsupported Foreign Resources"; this could be changed to SHOULD

We could be more stringent and leave (2) as a MUST, but I am not sure whether that will fly in terms of implementations.
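
As a rough illustration of what change (1) could mean for a validator, the Python sketch below reports spine items whose fallback chain never reaches an XHTML or SVG content document, and lets the caller choose whether that counts as an error (MUST) or a warning (SHOULD). The function name, the `requirement` parameter, and the severity mapping are hypothetical; this is not how EPUBCheck is implemented.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}
CONTENT_DOCUMENT_TYPES = {"application/xhtml+xml", "image/svg+xml"}


def check_spine_fallbacks(opf_text, requirement="MUST"):
    """Return (severity, idref) pairs for spine items with no content-document fallback.

    Hypothetical mapping: "MUST" is reported as "error", anything else as "warning".
    """
    severity = "error" if requirement == "MUST" else "warning"
    root = ET.fromstring(opf_text)
    items = {i.get("id"): i for i in root.findall("opf:manifest/opf:item", NS)}
    findings = []
    for itemref in root.findall("opf:spine/opf:itemref", NS):
        item_id, seen = itemref.get("idref"), set()
        # Walk the fallback chain; `seen` stops circular chains.
        while item_id in items and item_id not in seen:
            seen.add(item_id)
            if items[item_id].get("media-type") in CONTENT_DOCUMENT_TYPES:
                break  # chain reaches a content document, nothing to report
            item_id = items[item_id].get("fallback")
        else:
            # Loop ended without finding a content document.
            findings.append((severity, itemref.get("idref")))
    return findings
```

Calling `check_spine_fallbacks(opf_text, "SHOULD")` on the same package document returns the same items with `"warning"` instead of `"error"`, which is the whole practical difference between the two wordings for content authors.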

Align with the HTML spec

As it has been said, the HTML spec has actually a definite way to display text files and/or XML files. Our current approach of requiring fallbacks for those means, therefore, a further restriction on HTML. To alleviate this we could:

Note that, interestingly, JSON files do not fall under any of these, nor does HTML seem to say anything about them, because the media type for JSON is application/json...


We may want to spawn two separate issues here and close the current one. I think that @bduga has given an answer to the original question of the issue regarding the tests...

mattgarrish commented 2 years ago

Another question to ponder: what does it mean to fall back to a supported media type?

Consider you embed some shiny new image format into an HTML document in an img tag and also want it to open as a content document so you add a hyperlink to it.

You can satisfy both cases using manifest fallbacks, but reading systems apparently are supposed to follow the order created by the author in the absence of a properties attribute with more information (and there are no properties for this situation).

So say you satisfy the spine requirement first and in the manifest make the image fall back to an XHTML content document. Then to satisfy the img element you have to make the XHTML content document fall back to a JPEG, like this:

Foreign image -- falls back to --> xhtml -- falls back to --> jpeg

Does this mean a reading system is supposed to use the xhtml in the img tag because that's the author's preferred order of fallbacks?

Are reading systems supposed to be aware of what formats make sense in what elements?

Are authors supposed to be aware that they have to craft the fallback chain to compensate for dumb reading system replacements and fall back to the image before the xhtml?

This obviously isn't a big problem in reality, but just another case of how manifest fallbacks are a flaky idea for HTML.
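The ambiguity can be sketched in a few lines of Python. The manifest below models the chain described above; the foreign media type `image/x-shiny`, the item ids, the function names, and the per-context tables are all made up for illustration, and neither strategy is claimed to be what the spec or any reading system prescribes.

```python
# Manifest modeled as: item id -> (media type, fallback id or None).
MANIFEST = {
    "img-new": ("image/x-shiny", "img-xhtml"),  # hypothetical foreign image format
    "img-xhtml": ("application/xhtml+xml", "img-jpeg"),
    "img-jpeg": ("image/jpeg", None),
}

# Assumption: media types this imaginary reading system can render at all.
SUPPORTED = {"application/xhtml+xml", "image/jpeg"}

# Assumption: illustrative table of which types make sense in which context.
USABLE_IN = {
    "spine": {"application/xhtml+xml", "image/svg+xml"},
    "img": {"image/gif", "image/jpeg", "image/png", "image/svg+xml", "image/webp"},
}


def fallback_chain(manifest, start_id):
    """Yield item ids along the fallback chain, stopping on cycles or dangling ids."""
    seen = set()
    current = start_id
    while current is not None and current in manifest and current not in seen:
        seen.add(current)
        yield current
        current = manifest[current][1]


def pick_naive(manifest, start_id, supported):
    """Follow the author's order and take the first supported type, ignoring context."""
    for item_id in fallback_chain(manifest, start_id):
        if manifest[item_id][0] in supported:
            return item_id
    return None


def pick_for_context(manifest, start_id, supported, context):
    """Take the first item that is both supported and usable in the given context."""
    return pick_naive(manifest, start_id, supported & USABLE_IN[context])


print(pick_naive(MANIFEST, "img-new", SUPPORTED))
print(pick_for_context(MANIFEST, "img-new", SUPPORTED, "img"))
print(pick_for_context(MANIFEST, "img-new", SUPPORTED, "spine"))
```

The naive strategy hands the XHTML document to the img element, while the context-aware one skips ahead to the JPEG; the open question is which of the two a reading system is expected to implement.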

My understanding is that manifest fallbacks were introduced to work around the lack of fallbacks for img, but now that it has a srcset attribute and there is also the picture element, can we consider, if not outright deprecating the use, trying to put a stronger caution in against manifest fallbacks as a replacement mechanism within content documents?

Maybe even label that use a "legacy" feature only meant to support compatibility with EPUB 2?

iherman commented 2 years ago

> Maybe even label that use a "legacy" feature only meant to support compatibility with EPUB 2?

I would definitely not have sleepless nights if we did that...

iherman commented 2 years ago

@mattgarrish do you want to propose a PR for https://github.com/w3c/epub-specs/issues/1464#issuecomment-1080714266 or should we leave it as is for now (the CR might force us to label fallbacks as under-implemented anyway...)?

Regardless, I think this issue may now be closed.

Cc @dauwhe

mattgarrish commented 2 years ago

I wonder if it might help to split the uses and talk about them separately. Right now, you get a mix of requirements that apply when the fallback chain is for the spine and requirements that apply for elements lacking intrinsic fallbacks. It might give some room to caution about deeper problems like I mentioned in that comment, too.

I don't think it would hurt to caution against the HTML use, too, but we unfortunately can't actually call it legacy when I think about it more. That would mean epub 3 reading systems would have to stop supporting it and it would mean any content that has relied on it would become invalid (if it's a legacy feature, then it wouldn't count as providing a CMT anymore as far as EPUB 3 is concerned).

mattgarrish commented 2 years ago

But closing this issue as it's not directly related.

kevinhendricks commented 1 year ago

One related question ... if manifest fallbacks are discouraged, and given the html object tag, could you not create the equivalent of the manifest fallback by using an xhtml file in the spine, and in that file use an object tag to load the foreign resource (in my test case an embedded pdf file) and then a second child object tag to load the fallback xhtml resource?

This approach seems to work properly in a number of e-readers that do or do not support the foreign resource. It uses the html object tag fallback as defined in the latest whatwg spec, and requires no epub3 specific support.

The only issue current epubcheck has with it is that the fallback object data url is ignored by epubcheck so the fallback html file can never be in the spine with linear = no since epubcheck thinks there is no link to it.

Maybe just remove manifest fallbacks completely from the spec (deprecate them) and tell users to use the pure html object tag fallback mechanism instead with the added benefit of the spine always being pure xhtml files.

iherman commented 1 year ago

> Maybe just remove manifest fallbacks completely from the spec (deprecate them) and tell users to use the pure html object tag fallback mechanism instead with the added benefit of the spine always being pure xhtml files.

The option of removing fallbacks completely from the spec came up several times. However, per the charter of our group, we were forbidden to remove standard features (inherited from EPUB 3.2). More exactly, it was a requirement that any valid EPUB 3.2 publication should remain valid in terms of EPUB 3.3 (the goal, obviously, being that the new version should not "disrupt" any deployed EPUB publications). So, as a measure of caution, we kept fallbacks (which also got some implementations).

The deprecation, possibly adding some text referring to HTML object tags, etc, is something that could be on the plate of future work in the maintenance Working Group that we are planning.

kevinhendricks commented 1 year ago

Thank you for possibly considering it in the future.

The nested object tag approach will currently allow foreign resource fallback to xhtml even in browsers/e-readers that do not support manifest fallbacks for spine items. Useful given our testing shows very few e-readers support opf manifest fallbacks at all for items in the spine.

iherman commented 1 year ago

> The only issue current epubcheck has with it is that the fallback object data url is ignored by epubcheck so the fallback html file can never be in the spine with linear = no since epubcheck thinks there is no link to it.

This may have to be looked at, as it may require some slight modification on the spec text (I have not checked).

At present, the spec is soon going to the final round of becoming a Recommendation (W3C jargon for standard). Maybe it is worthwhile to raise an erratum once the Rec is published to look at this (probably minor) issue in the spec and in epubcheck.

kevinhendricks commented 1 year ago

The current epubcheck has similar issues with the Nav being in the spine with linear = "no" though all epub3 ereaders properly process the Nav and provide their own interface for accessing it. This is an issue as ebooks try to hide the Nav when providing a pure html based TOC to the reader but the Nav itself can and does link to itself as a landmark meaning it must be in the spine (again according to epubcheck).

I am not sure what the 3.3 spec says about either of these recent changes to epubcheck. If you want I would be happy to open specific epubcheck issues for both cases with sample code if these are truly issues that you want or need to have tracked. Just let me know.

mattgarrish commented 1 year ago

This goes back to a change made in EPUB 3.1 that requires that all non-linear content in the spine be linked to so that users can reach it (in case non-linear content is suppressed by the reading system).
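As a rough illustration of that rule (not a description of how epubcheck actually implements it), a check could look like the Python sketch below. The sample package document, file names, and function name are hypothetical, and the set of hyperlink targets is assumed to have been collected elsewhere, with fragment identifiers already stripped.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document with two non-linear spine items.
SAMPLE_OPF = """\
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="notes" href="notes.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="cover" linear="no"/>
    <itemref idref="c1"/>
    <itemref idref="notes" linear="no"/>
  </spine>
</package>
"""


def unlinked_nonlinear_items(opf_text, hyperlink_targets):
    """Return hrefs of linear="no" spine items that no hyperlink points to."""
    root = ET.fromstring(opf_text)
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    unreachable = []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        if itemref.get("linear") != "no":
            continue  # linear items are reached through the default reading order
        href = hrefs.get(itemref.get("idref"))
        if href not in hyperlink_targets:
            unreachable.append(href)
    return unreachable


# Assumption: chapter1.xhtml contains a hyperlink to notes.xhtml; nothing links to the cover.
print(unlinked_nonlinear_items(SAMPLE_OPF, {"notes.xhtml"}))
```

In this sample the cover is the item that gets flagged, which is the kind of case where the landmarks nav ends up being the only way to satisfy the requirement.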

I expressed reservations about this in https://github.com/w3c/epubcheck/issues/1451#issuecomment-1369795349 when epubcheck was updated, and it was raised again in https://github.com/w3c/epubcheck/issues/1488, because it requires using the landmarks nav to satisfy the linking requirement. It's workable for the cover and toc because we have semantics for them, but not for any generic non-linear documents.

I'm fine opening a new issue about this, but I don't think it's something we're going to solve before going to PR. I'd take this up in the maintenance group after we're done.

kevinhendricks commented 1 year ago

Interesting ... but an xhtml file with an object tag with a data url pointing to another xhtml file (one listed in the spine with linear = no since it is acting as a fallback) should certainly be classified as a "link" to that file for the purposes of epubcheck meeting the 3.3 spec, shouldn't it?

mattgarrish commented 1 year ago

> but an xhtml file with an object tag with a data url pointing to another xhtml file (one listed in the spine with linear = no since it is acting as a fallback) should certainly be classified as a "link" to that file for the purposes of epubcheck meeting the 3.3 spec, shouldn't it?

No, because once you put it in the spine it becomes part of the content and not strictly a fallback. This takes us out of fallbacks and into the thornier issue about whether non-linear content is meant to be rendered as part of the default reading order or not. Some reading systems will render all non-linear content where it is placed in the spine, some will not render it. The requirement to add links grew out of that impasse: if some reading systems aren't going to render the content, then there must be some way to reach it.

You don't normally want to reach a fallback, however; only if the reading system doesn't support the foreign resource you wanted to render. Otherwise, you end up with a situation where the reader can encounter that fallback twice: once because the reading system couldn't support the foreign resource and a second time because it's in the spine. That's why manifest fallbacks weren't listed in the spine but were chained together using an attribute.

To the case you're suggesting, though, I assume this is what you want to do:

<object data="some_foreign_resource.xyz" type="data/xyz">
   <object data="fallback.html" type="application/xhtml+xml">
   </object>
</object>

In that case, there's no need to put the fallback in the spine. It's just another embedded resource. The only time it's required to be a non-linear item in the spine is if you hyperlink to it:

<object data="some_foreign_resource.xyz" type="data/xyz">
   Read <a href="fallback.html">the fallback</a>.
</object>
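For anyone who wants to see the distinction mechanically, here is a small Python sketch that separates embedded references (object data) from hyperlinks (a href) in the two snippets above. The class and function names are hypothetical, and this only illustrates the distinction; it does not claim to mirror epubcheck's logic.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect object/@data values as embedded resources and a/@href values as hyperlinks."""

    def __init__(self):
        super().__init__()
        self.embedded = []
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "object" and attributes.get("data"):
            self.embedded.append(attributes["data"])
        elif tag == "a" and attributes.get("href"):
            self.hyperlinks.append(attributes["href"])


def collect_references(markup):
    """Return (embedded, hyperlinks) found in a fragment of markup."""
    collector = ReferenceCollector()
    collector.feed(markup)
    collector.close()
    return collector.embedded, collector.hyperlinks


NESTED = """
<object data="some_foreign_resource.xyz" type="data/xyz">
   <object data="fallback.html" type="application/xhtml+xml">
   </object>
</object>
"""

LINKED = """
<object data="some_foreign_resource.xyz" type="data/xyz">
   Read <a href="fallback.html">the fallback</a>.
</object>
"""

print(collect_references(NESTED))
print(collect_references(LINKED))
```

Only the second snippet produces a hyperlink to fallback.html, so only there would the fallback need a spine entry.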

There are cases like cover pages, though, where there aren't easy workarounds like this. If these are in the spine and nothing links to them, then unless you can find an out-of-band means of linking to that page, like the landmarks nav, you're stuck with an epubcheck error.

mattgarrish commented 1 year ago

> The deprecation, possibly adding some text referring to HTML object tags, etc, is something that could be on the plate of future work in the maintenance Working Group that we are planning.

I'm not optimistic about this. The only use of manifest fallbacks I've heard of is to allow images in the spine exactly so they don't have to be wrapped inside of an html file -- going back to other long discussions about what is allowed in the spine without fallback. Deprecating them for a solution that requires wrapping the images in HTML might not go over well (unless we find out that use has died).

kevinhendricks commented 1 year ago

Agreed. But nothing in the manifest fallbacks in the current spec limits it to just images ... so it applies as well to any foreign resource used in the spine, doesn't it?

And given the general lack of e-reader manifest fallback support for spine items, why not encourage an html spec compliant fallback approach, since it actually works and requires no special epub3 only structures or support?

That way both epub users and epub developers have a way to include any foreign content in an epub with real working fallback in current e-readers with no changes needed.

And fwiw, I do think the Nav document should be allowed to have linear=no as the last item of the spine without an additional link to the Nav being provided because access to the Nav is guaranteed by all e-readers, and many Nav documents local link to themselves via their own landmark section thereby requiring them to be in the spine in the first place, making it all a bit circular.

Thanks for listening and considering. And for pointing out my long held definition of linear = no is not universal by any means (or even well understood by me!)

mattgarrish commented 1 year ago

> But nothing in the manifest fallbacks in the current spec limits it to just images ... so it applies as well to any foreign resource used in the spine, doesn't it?

Right, but I don't think they've been used much (at all?) outside of images. I'm not even sure if images are used in the spine much. There was a push to enable manga/comics as EPUBs a number of years back that led to some long discussions about what is allowed in the spine. I only point that out because it might make deprecating manifest fallbacks complicated. If they have been used for images, then we'd probably get pushback to deprecating them. If they haven't, then using intrinsic html fallback methods, whatever they happen to be (object, picture, etc.), is always better.

> And for pointing out my long held definition of linear = no is not universal by any means

You're not alone. It's probably the most confusing feature in all of epub... 😕