joseph / peregrin

A library for inspecting Zhooks, Ochooks and EPUBs, and converting between them.
http://ochook.org/peregrin
MIT License

EPUB: XHTML documents use .html extension instead of .xhtml #10

Open elmimmo opened 13 years ago

elmimmo commented 13 years ago

Peregrin gives the XHTML documents inside an EPUB the extension .html. This makes debugging or tweaking the EPUBs difficult, because a browser opening those files directly will not render them the way an EPUB reader would.

On the Mac at least, Chrome 12 and Safari 5.0.5 render .xhtml documents differently from .html documents. For example, .html documents ignore CSS namespaces. I stumbled upon this when trying to style elements by their epub:type attribute, which requires a namespace declaration in both the XHTML and the CSS files.

inventive commented 13 years ago

This sounds like a web server configuration problem. Extensions aren't significant to any browser. (I don't know how browsers infer mimetypes via the file:// protocol, but that's not a viable protocol for ebook component rendering anyway.)

The documents in an EPUB should be served with the mimetype specified in the OPF file. If you don't do that, rendering will be erratic in lots of ways.
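As a rough illustration of serving components with the mimetype declared in the OPF file, the sketch below reads the manifest and builds a lookup from href to media type. It is not Peregrin code: the sample OPF fragment and the function names `manifest_media_types` and `content_type_for` are made up for this example, and a real server would also need to resolve hrefs relative to the OPF file's location.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical OPF fragment, invented for illustration only.
SAMPLE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <manifest>
    <item id="c1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="css" href="main.css" media-type="text/css"/>
  </manifest>
</package>
"""


def manifest_media_types(opf_xml):
    """Map each manifest href to the media type declared for it."""
    root = ET.fromstring(opf_xml.encode("utf-8"))
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    return {item.get("href"): item.get("media-type") for item in items}


def content_type_for(href, media_types, default="application/octet-stream"):
    """Pick the Content-Type for a component, ignoring its file extension."""
    return media_types.get(href, default)


MEDIA_TYPES = manifest_media_types(SAMPLE_OPF)
```

With a lookup like this, chapter1.html would be sent as application/xhtml+xml even though its extension says .html, which is the behaviour described above.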

joseph commented 13 years ago

@inventive is just me using the wrong account, btw.

elmimmo commented 13 years ago

I did not mean to say that the EPUBs themselves have any issue. I meant that using the .html extension for the contained XHTML files is inconvenient when opening them locally for debugging or tweaking: with no server involved, browsers seem to infer the mimetype from the file extension. Using .xhtml would (I guess) not have that or any other drawback.

joseph commented 13 years ago

I implemented this, but then saw a significant problem with it.

Peregrin internally turns each format (Zhook, EPUB, Ochook, etc.) into an archetypal Book object with discrete components. So when a Zhook is transformed into a Book, it is componentized: index.html is split up into a number of HTML files. Each of these files has a filename associated with it, which is important for creating <a> links between the various components.

However, these files are still HTML5 at that point. They are not transformed to EPUB's ghetto XHTML until the EPUB write phase. So they should not have a ".xhtml" extension while they are not yet XHTML, but they cannot simply be renamed during the EPUB write phase either, because the internal links would break.

For that reason I think it's a wontfix, but I'll leave this issue open for a while to see if anyone has a brainwave. Here's my failed attempt: https://gist.github.com/1050754
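The link problem described above can be shown with a small, self-contained sketch. This is not Peregrin's code (Peregrin is written in Ruby) and it is not the approach in the gist. The component names, the `BOOK` dictionary and every function here are hypothetical, and the regular expression is a deliberately naive stand-in for real HTML parsing. It shows that renaming components alone leaves dangling links, while renaming and rewriting hrefs together does not. Whether such a relinking step would fit Peregrin's write phase is not something this thread settles.

```python
import re

# Matches relative hrefs, with an optional #fragment; skips absolute URLs.
HREF_RE = re.compile(r'href="([^"#:]+)(#[^"]*)?"')


def dangling_links(components):
    """Return (source, target) pairs whose target is not a component name."""
    missing = []
    for name, markup in components.items():
        for match in HREF_RE.finditer(markup):
            target = match.group(1)
            if target not in components:
                missing.append((name, target))
    return missing


def new_name(name):
    """Swap a trailing .html extension for .xhtml."""
    return re.sub(r"\.html$", ".xhtml", name)


def rename_only(components):
    """Rename components without touching the markup that links to them."""
    return {new_name(name): markup for name, markup in components.items()}


def rename_and_relink(components):
    """Rename components and rewrite the hrefs that point at them."""
    renames = {name: new_name(name) for name in components}

    def fix(match):
        target, fragment = match.group(1), match.group(2) or ""
        return 'href="%s%s"' % (renames.get(target, target), fragment)

    return {
        renames[name]: HREF_RE.sub(fix, markup)
        for name, markup in components.items()
    }


# Hypothetical componentized book: two files that link to each other.
BOOK = {
    "part001.html": '<a href="part002.html#notes">Next</a>',
    "part002.html": '<a href="part001.html">Back</a>',
}
```

Calling `dangling_links(rename_only(BOOK))` reports both links as broken, whereas `dangling_links(rename_and_relink(BOOK))` reports none.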