koreader / koreader

An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
http://koreader.rocks/
GNU Affero General Public License v3.0

Epub3 support with readium sdk #2602

Closed Markismus closed 7 years ago

Markismus commented 7 years ago

I've been hacking at the Loeb Digital Library website rip that has been floating around the internet since March last year, and it would be a waste to dumb the result down to EPUB 2. It looks beautiful in the Calibre viewer, but I lose most of the formatting in the coolreader engine (both koreader's implementation and KoboLauncher's coolreader): I get a page that looks nothing like the much-read paper version of the Loeb Classical Library. (I even went so far as to convert lovely looking html-code to pdf, but this comes with its own array of troubles using wkhtml2pdf.)

Looking into epub3, the only C++ source I found for rendering epub3 pages is the readium-sdk.

Has anyone looked into epub3 yet?

Has anyone taken a look at readium-sdk as a complement or replacement for the coolreader engine?

Would someone be able to sketch the work involved in implementing another epub engine?

Frenzie commented 7 years ago

Btw, MuPDF also does EPUB.

I even went so far as to convert lovely looking html-code to pdf, but this comes with its own array of troubles using wkhtml2pdf

Prince is much better.

Markismus commented 7 years ago

I'll try Prince. UPDATE: Too many errors with about 1300 blank pages instead of 600 filled with text. After ripping out all the offending code, I got the text in working condition.

The status of epub3 support in MuPDF remains unclear. They refer back to koreader, though!

For now, this is the Calibre viewer result for a Sigil-generated epub: [screenshot: calibre viewer]

This is the MuPDF result: [screenshot: mupdf]

The errors are rampant! But they seem to suggest that epub 3.0 is not supported. Even crengine just displays the margin line numbers and notes inline.
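Since the question is whether a reader chokes on epub3 specifically, it helps to confirm which version a given file actually declares. A minimal sketch, assuming a well-formed archive: it follows `META-INF/container.xml` to the package document and reads the `version` attribute of the `<package>` element. The function name `epub_version` is just an illustrative choice, not part of any tool mentioned in this thread.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def epub_version(path):
    """Return the version string declared on the OPF <package> element.

    Hypothetical helper for illustration; it does no validation beyond
    locating the package document.
    """
    with zipfile.ZipFile(path) as zf:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("no rootfile listed in META-INF/container.xml")
        opf = ET.fromstring(zf.read(rootfile.attrib["full-path"]))
        # Typically "2.0" for EPUB 2 and "3.0" (or later) for EPUB 3.
        return opf.attrib.get("version")
```

Note that this only reports what the file claims to be; an archive declaring "2.0" can still contain markup or CSS that a particular engine does not handle.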

Markismus commented 7 years ago

There seems to be one other non-commercial SDK: https://github.com/AnFengDe/AnFengDe_EPUB. I missed it before because it is already 3 years old.

MuPDF does open epub v2 very nicely and fast. (Somewhat surprisingly, Gnome automatically opened the next epub that I downloaded from mobileread in MuPDF.)

retrue commented 7 years ago

There is FBReader too. Years ago, when the first e-ink readers appeared, it was a competitor of CoolReader 3. The authors had two consecutive projects here on GitHub (https://github.com/geometer/FBReader and https://github.com/geometer/fbreader-native), but they left and are now actively working on ports to iOS and Android with premium versions: https://fbreader.org/

Frenzie commented 7 years ago

I'll try Prince. UPDATE: Too many errors with about 1300 blank pages instead of 600 filled with text.

I suppose I didn't mean it was better at just printing some random input, but more that it's better in that you can actually produce high-quality content whereas wkhtml2pdf cannot go beyond mediocre no matter what you try. More recent versions also support some basic JS to do some fancier things. Your Calibre screenshot is certainly well within the range of possibilities. But, um, I just took a look at a random EPUB in Calibre and… (note the bottom left)

[screenshot: screenshot_2017-03-02_20-16-32-fs8]

A5 or A6 ought to do well enough on an ereader?

[screenshot: screenshot_2017-03-02_20-17-33-fs8]

Edit: yeah… I'd go with A5 (and smaller margins). And sane units of measurement if they're available.

A5

[screenshot: screenshot_2017-03-02_20-24-54-fs8]

A6

[screenshot: screenshot_2017-03-02_20-25-06-fs8]

Markismus commented 7 years ago

Yes, I tried that before. Somehow the complexity of this stripped site (the same epub as above) outsmarts Calibre: [screenshot: Calibre generated pdf]

Frenzie commented 7 years ago

You haven't yet told us what "this" is exactly, btw. :-P I mean, I'm sure it'd be findable but why waste time.

From my results I figured Calibre was just printing a PDF using the same mechanism as its on-screen display.

Markismus commented 7 years ago

So it results in an image in a container, not in text in PDF format?

Frenzie commented 7 years ago

No? Just try a random EPUB from http://dbnl.org/tekst/cour006vlae01_01/ or Project Gutenberg or whatever. Also I'm going to have to track down your mystery Plato-EPUB if you don't link to it soon, lol

Markismus commented 7 years ago

Sorry. No manners nowadays anymore. Here it is: https://www.dropbox.com/s/ss2g90d9yu5n6o5/LDL234.epub?dl=0

Can't be tracked down. It is extracted from the site rip of the Loeb Digital Library. I built a Perl script of way too many lines, tweaked the CSS to get the paper layout and generated the single pages as single HTML files. I threw that into Sigil to make sure that I had a proper epub and started checking what worked on which device: Firefox eats everything as HTML, Calibre changes all width settings as you resize the window but works, and Kobo Nickel, coolreader and koreader can't keep the margins: line numbers and notes end up inline.

Biggest problem is that there are >300 volumes, so I'll have to automate it: Can't go throwing everything into Sigil by hand even if it would generate perfect epubs.
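For the automation part, the container step itself is small: an epub is a zip archive whose first entry must be an uncompressed file named `mimetype` containing `application/epub+zip`. The sketch below assumes each volume has already been laid out as a directory holding `META-INF/container.xml`, the OPF package file and the HTML/CSS; generating those files is not covered here. `pack_epub` is a hypothetical name, and the output path is assumed to lie outside the source directory.

```python
import os
import zipfile


def pack_epub(src_dir, out_path):
    """Zip an already-prepared EPUB directory tree into out_path.

    Hypothetical helper for illustration; out_path should not be
    located inside src_dir.
    """
    with zipfile.ZipFile(out_path, "w") as zf:
        # The container format requires "mimetype" to be the first entry
        # in the archive and to be stored without compression.
        zf.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # keep the archive order deterministic
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Calling this once per volume directory in a loop would cover the packaging for all volumes; whether the result renders well still depends on the reading engine, which is the actual problem in this thread.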

Frenzie commented 7 years ago

Darn, too late today and my free time's basically spent. Eh, maybe I'll be able to squeeze it in somewhere :P

On Fri, Mar 3, 2017 at 9:52 PM, Markismus notifications@github.com wrote:

Sorry. No manners nowadays anymore. Here it is https://www.dropbox.com/s/ss2g90d9yu5n6o5/LDL234.epub?dl=0.


Markismus commented 7 years ago

The rip of the LDL site can be found here. It seems rather big, but when you recompress it with 7z it ends up being 500MB.

houqp commented 7 years ago

Writing a new rendering engine from scratch will probably require a couple of months of full-time work from a rock star engineer. As far as I know, none of the epub rendering engines so far are designed to take full advantage of multi-core processors, which means there is a huge opportunity for performance optimization in this field. Projects like Servo from Mozilla have already demonstrated that a concurrent rendering engine can beat the existing one by 10X or even 100X easily. Higher performance not only means a faster reader, but also helps a lot with power consumption. So if someone wants to give it a go, I recommend Rust ;)

I did look into the readium sdk a little bit a while ago, but didn't get too far due to the lack of documentation. It could be a short-term solution to bring in epub3 support.

Markismus commented 7 years ago

@houqp Nice, an on-topic swing! Always inspiring to introduce another language! I liked Readium because its C library could easily be linked into Lua. However, as you said, I have trouble finding good documentation about its use. I don't think the setup of koreader is to write the rendering machines. I actually meant a sketch of what it would take to hook a rendering machine such as Readium into the koreader frontend.

I remember the people at MuPDF writing about koreader's heavily modified coolreader engine. I also remember people busy hacking popup footnotes into it. (Did that succeed, BTW?) How much work would it be to introduce tables with defined widths? (This caught my interest. It also lists the Kobo app as a good epub viewer: Does that mean that the newer firmware for Kobo Nickel would also support epub3?)

@retrue I used fbreader on my tablet. It's really nice. However, from the github account I can't see whether it supports epub3. I remember people asking for support for native fbreader ebooks a few years ago, but nobody went in that direction then. Now I see an issue on their github repo asking whether the project is dead. So I don't think I'll be going in that direction now, either.

@Frenzie My main objections to Prince were its watermark and the absence of a Perl wrapper script. However, the reason for the latter is that it is almost natively supported. Moreover, the watermark can be removed with one line of Perl code. So I guess I will try Prince some more, if only because it is very verbose about what it doesn't like in the CSS file being fed to it. (I stripped about 70% of the irrelevant lines and changed all the values that mattered. Next thing is probably removing box-sizings and such.) EDIT: Removed all offending code and now Prince is working :) Still need to rip the watermark and fine-tune everything.

retrue commented 7 years ago

@Markismus Kobo's Nickel supports epub3, at least officially. That is what the specifications of their e-readers say, for example for the Kobo Aura One: https://us.kobobooks.com/products/kobo-aura-one?utm_source=Kobo&utm_medium=TopNav&utm_campaign=Aura%20ONE How good that support is, is a different matter. About FBReader: I used it in some of the firmwares available for my e-reader, a Hanlin v3. And FBReader is active on their website, fbreader.org. They switched their interest to iOS and Android and they are working on epub3. I don't know how much of it is open source.

Frenzie commented 7 years ago

@Markismus I don't really see the big deal with the little Prince logo on the first page (plus there's the license), but if I wanted to do that I'd just add an empty first page instead of hacking elements in the PDF. :-P

Markismus commented 7 years ago

@Frenzie Yes, well...I generated a new html-page for each page: Every page had that logo. So it never occurred to me that it would only be on the first. Still, I just reopened the pdf-file, looped over all lines and as soon as an object type was annotation, I emptied the lines until the end of the object.

I think I am going to forget about epub3; I'll just reflow the pdf for now. :) When everything works for pdf, I'll look into epub again. Maybe I can dumb down the html without killing the layout.

@retrue I'll retry kobo after Prince eats all my future html. Maybe, I should give the epub3 plugin for Sigil a try. Yep, FBReader seems to have turned commercial with iOS. I doubt that they'll publish a freely usable epub3 engine now.