rschroll / beru

The Basic Epub Reader for Ubuntu
http://rschroll.github.io/beru/
GNU General Public License v3.0

Handle one-html-file books a bit better #39

Open stuartlangridge opened 10 years ago

stuartlangridge commented 10 years ago

Beru doesn't deal all that well with epubs which have the whole book content in one massive HTML file rather than a number of small files. Now, obviously, a well-put-together epub doesn't do that, but I clearly have a fair number of non-well-put-together epubs. The book reader will essentially hang for seconds at a time, especially when switching back to Beru from another app, or after rotating the phone. (Note: Beru itself is not hung: the toolbar shows up fine.)

rschroll commented 10 years ago

I suspect the problem isn't with Beru itself, but with the Monocle library we're using to lay out the epub. Beru is just responsible for unzipping and serving the HTML file, and this is done in C++, so it's unlikely to be slow. Monocle is responsible for laying out the view, and this happens in JavaScript. The fact that this slowness happens on rotation, which should only involve Monocle code, supports this suspicion.

Unfortunately, this means that the fix lies within Monocle, and I don't know that code very well. I'll leave this bug open, but don't expect a quick solution. It may be worth opening a bug with Monocle about this. If you do so, please link it here. Or, if you can send me a problematic epub, I'll open a Monocle bug.

stuartlangridge commented 10 years ago

I agree almost completely with you, but I think that Monocle won't get any faster because it's just dealing with a really big file. My thought was that Beru could unzip the file and then, if it's large (Calibre splits files up into 260KB chunks by default if they're larger than that), Beru breaks it up into bits as if it were actually produced that way. But... that might be hard if you have to also rewrite-on-the-fly the spine stuff; the impression I had from Monocle was that you can tell it "here is a list of files" any place where it wants one file, but I might very well be wrong because I hardly know that code at all :)
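For illustration only, here is a minimal sketch of the "split if large, then hand over a list of files" idea. It is written in Python purely to show the logic (Beru's serving code is C++), and everything named here is an assumption: `needs_split`, `expand_spine`, the `book_partN.html` file names, and reading "260KB" as 260 * 1024 bytes. It says nothing about how Monocle actually consumes the spine.

```python
# Threshold taken from the Calibre default mentioned above; the exact byte
# count (260 * 1024) is an assumption.
SPLIT_THRESHOLD = 260 * 1024


def needs_split(size_in_bytes, threshold=SPLIT_THRESHOLD):
    """Return True if a content file is large enough to be worth splitting."""
    return size_in_bytes > threshold


def expand_spine(spine, parts_by_file):
    """Replace each split file in the spine with its parts, keeping order.

    spine          -- list of content file names in reading order
    parts_by_file  -- dict mapping an original file name to the list of
                      (hypothetical) part file names it was split into
    """
    expanded = []
    for href in spine:
        # Files that were not split stay in the spine unchanged.
        expanded.extend(parts_by_file.get(href, [href]))
    return expanded
```

With a spine of `["cover.html", "book.html"]` and `book.html` split into two parts, `expand_spine` would yield the cover followed by the two parts in order.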

rschroll commented 10 years ago

I thought of that briefly, since it does get around the Monocle-needs-to-parse-the-big-file issue. Rewriting the spine wouldn't be too much of a problem. The two issues I fear are:

  • Figuring out where to split the file. Monocle starts a new page for each new HTML file, so you don't want to do this in the middle of a line. But you could do this just before a header and probably be okay.
  • Re-targeting internal links. We'd have to find all anchors, figure out what new file they'll end up in, and adjust all links to point to that new file. Not impossible, but it leaves a lot of places to screw up.

Frankly, I wonder if a better solution is a dedicated epub reformatter that takes care of this. Potentially, this could ship with Beru and be run the first time an epub is opened.

stuartlangridge commented 10 years ago

Hm. I hadn't thought about links. Darn. That is a problem.

A dedicated reformatter would be ideal, wouldn't it? The only one I know of is in Calibre, though, and that (while state-of-the-art) is in Python and therefore no use.

I'll have a poke around and see if I can find anything, although I'm sure you'll do the same!

rschroll commented 10 years ago

Actually, it may not be as bad as I feared. We don't actually need to rewrite all links in all HTML files. Instead, we could adjust where those links lead later, either client side or server side. We could detect clicks on links with JavaScript and adjust their targets before the request is sent out, though this would require passing a remapping from the server to the client. Or we could wait for the requests to come in and serve the correct part of the file, though this would require us to get the anchors, which aren't passed to the server, as far as I know.
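A rough sketch of what such a remapping might look like, in Python only to illustrate the logic rather than as something Beru would ship. It assumes the big file is split just before `<h1>`/`<h2>` tags, that anchors are double-quoted `id` attributes, and that parts get hypothetical names like `book_part1.html`. The function names are made up for this example, and a real implementation would want a proper HTML parser instead of regular expressions.

```python
import re

# Start of an <h1> or <h2> opening tag. Regex matching is a simplification.
HEADING = re.compile(r"<h[12]\b", re.IGNORECASE)
# Double-quoted id attributes only; this is an assumption of the sketch.
ANCHOR = re.compile(r'\bid="([^"]+)"')


def split_before_headings(body):
    """Split body HTML so that every part after the first starts at a heading."""
    starts = [m.start() for m in HEADING.finditer(body)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(body)]
    return [body[s:e] for s, e in zip(starts, ends)]


def build_anchor_map(original_name, parts):
    """Map each anchor id to the (hypothetical) part file that now holds it."""
    stem = original_name.rsplit(".", 1)[0]
    mapping = {}
    for index, part in enumerate(parts):
        part_name = f"{stem}_part{index}.html"
        for anchor in ANCHOR.findall(part):
            mapping[anchor] = part_name
    return mapping


def remap_link(href, original_name, anchor_map):
    """Point 'original.html#id' at the part containing that id.

    Links to other files, or to anchors we did not find, are left unchanged.
    """
    target, _, fragment = href.partition("#")
    if target != original_name or fragment not in anchor_map:
        return href
    return f"{anchor_map[fragment]}#{fragment}"
```

The dictionary returned by `build_anchor_map` plays the role of the remapping that would have to be passed from the server to the client, and `remap_link` is the adjustment a click handler would apply before the request goes out.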