gwtproject / gwt

GWT Open Source Project
http://www.gwtproject.org

Feature request: Make GWT ePub friendly #9864

Open evoludolab opened 10 months ago

evoludolab commented 10 months ago

The client side of the GWT toolkit offers fascinating opportunities for creating truly interactive content in ePubs.

Basically, the only stumbling block for ePubs is instructions to the effect of innerHTML="&nbsp;". The ePub specs are based on XHTML, which doesn't understand &nbsp;. However,

  1. changing to <!DOCTYPE html> is not enough,
  2. adding <!DOCTYPE html [<!ENTITY nbsp "&#160;">]> has no effect and
  3. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> does not pass ePub validation.

Replacing all &nbsp; with &#160; would do the trick. I am successfully using a patched version of the gwt-user.jar to run individual-based simulations and numerical solvers of differential equations side by side with the corresponding theory in a book project.
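
To illustrate why the named entity is the problem, here is a minimal sketch that feeds both spellings to the stock XML parser shipped with the JDK. The class and method names (NbspXmlCheck, isWellFormed) are made up for this illustration and are not part of GWT or of any ePub tool. It only demonstrates the general XML rule that &nbsp; is undeclared unless a DTD defines it, whereas a numeric character reference is always allowed. It says nothing about how a particular ePub reader behaves.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class NbspXmlCheck {

    // Returns true if the given string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness errors such as an undeclared entity.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Named entity without a DTD that declares it: rejected by the parser.
        System.out.println(isWellFormed("<p>&nbsp;</p>"));
        // Numeric character reference for the same character: accepted.
        System.out.println(isWellFormed("<p>&#160;</p>"));
    }
}
```

Markup that GWT injects via innerHTML into an XHTML document is parsed under the same XML rules, which would explain why swapping the spelling is sufficient.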

Overall the change seems rather harmless and so I was curious to hear what others think.

niloc132 commented 10 months ago

Interesting thoughts @evoludolab. Can you talk a little more about the xhtml/JS runtime used for ePub implementations, what other limitations there might be? I'm concerned that this is only a superficial change that will prompt us to discover that there are many more lurking issues. That is: if a valid XHTML doctype is not valid in ePub, but the HTML5 doctype is treated as if it were an XHTML doctype (which I don't believe it is), then there are probably other surprises waiting for us.

https://en.wikipedia.org/wiki/Document_type_declaration#Common_DTDs specifies several XHTML doctypes, and separately specifies the HTML5 doctype. https://www.w3.org/TR/xhtml1/dtds.html shows that XHTML 1.0's default dtd includes nbsp. https://www.w3.org/TR/2000/WD-xhtml-modularization-20000105/dtd_module_defs.html shows that XHTML 1.1's default dtd includes nbsp as well. It might be fair to say that a tool that accepts XHTML, but doesn't accept valid XHTML, doesn't actually accept XHTML...


As another avenue for you to consider, the widgets in gwt-user.jar are relatively static at this point, taking care to cater to legacy applications and old browsers - instead, you might want to consider building your application in Java/GWT with elemental2 for direct DOM access, or some toolkits that build on top of that (off the top of my head, elemento, domino-ui, dncomponents). This should give you greater control over what is happening in the DOM, and leave out some of the various workarounds that were necessary in the old days of web development (e.g. Widget's main reason for existing is memory leaks when using event listeners, etc).


A quick look at GWT's source shows me only 5 files that would need a change, which currently include nbsp. I only see javadoc comments using #160, and no instances of #xA0. I don't see an obvious reason in history why this change would not be acceptable.

evoludolab commented 10 months ago

Thanks for your encouraging feedback, @niloc132. I completely agree with your sentiments regarding ePubs and XHTML.

However, before I continue, I should add that I am largely out of my depth here. To provide some context, I am an academic doing research in mathematics and evolutionary theory, but I have no formal training in anything software. My patchy expertise is all the result of decades of patient tinkering.

The ePub specs state that "An XHTML content document MUST be an [html] document that conforms to the XML syntax" (see https://www.w3.org/TR/epub-33/#sec-xhtml). Since HTML5 has become a living standard, it seems that, by extension, the ePub specs can no longer be set in stone, and quirks in different ePub readers are to be expected. It appears to me that EPUBCheck has become the de facto standard for ePubs.

> Can you talk a little more about the xhtml/JS runtime used for ePub implementations, what other limitations there might be? <snip>

JavaScript support is optional in the ePub3 specs and consequently not supported across all readers. For now I am mostly targeting Apple Books. Its support for JavaScript likely ranks among the best because it's backed by WebKit. Apple has introduced its own quirks and restrictions, which are unfortunately not transparent (mostly related to the embedding of, and restrictions on, JavaScript content, but inconsequential for GWT). Also, Apple Books requires that ePubs pass the latest EPUBCheck.

Apart from issues with reader implementations, I don't see anything that would affect GWT. Once I replaced the nbsp's (as well as the capitalized tag names that GWT 2.8 still had), everything has worked flawlessly.

> As another avenue <snip> you might want to consider building your application in Java/GWT with elemental2 for direct DOM access,

Thanks for this great suggestion. I will look into it. Admittedly, at this point I am somewhat reluctant to change my current setup because the project has been in gestation for a number of years…

> A quick look at GWT's source shows me only 5 files that would need a change, which currently include nbsp. I only see javadoc comments using #160, and no instances of #xA0. I don't see an obvious reason in history why this change would not be acceptable.

This confirms the impression that I got and basically triggered this feature request 😄

evoludolab commented 10 months ago

Out of curiosity I started from scratch and downloaded the GWT 2.10.0 source (tagged 2.10.0-google). Apart from files in the samples and test folders or in javadoc comments, nbsp only occurs in

  1. layout/client/LayoutImpl.java (1),
  2. logging/client/HtmlLogFormatter.java (3),
  3. user/client/ui/Grid.java (3),
  4. user/client/ui/TabBar.java (2),

all relative to user/src/com/google/gwt/ where (n) indicates the number of occurrences.

Compiling GWT after replacing those 9 occurrences with #xA0 makes Apple Books happy and the code runs smoothly. 🥳
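
The replacement itself is a plain text substitution. As a rough sketch, assuming the replacement is spelled &#xA0; and applied to the contents of each affected file, it could look like the following. NbspReplacer and its methods are hypothetical names, and the sample line is a made-up stand-in rather than a quote from the GWT sources.

```java
// Hypothetical helper, for illustration only; not part of GWT.
class NbspReplacer {
    static final String NAMED = "&nbsp;";    // entity unknown to plain XML
    static final String NUMERIC = "&#xA0;";  // same character as a numeric reference

    // Counts non-overlapping occurrences of the named entity.
    static int count(String source) {
        int n = 0;
        int i = source.indexOf(NAMED);
        while (i >= 0) {
            n++;
            i = source.indexOf(NAMED, i + NAMED.length());
        }
        return n;
    }

    // Replaces every named entity with the numeric character reference.
    static String replace(String source) {
        return source.replace(NAMED, NUMERIC);
    }

    public static void main(String[] args) {
        // Made-up line standing in for one of the affected source lines.
        String line = "td.setInnerHTML(\"&nbsp;\");";
        System.out.println(count(line));
        System.out.println(replace(line));
    }
}
```

A blanket substitution like this would also rewrite javadoc comments and test files, so it should only be pointed at the files listed above.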