openzim / gutenberg

Scraper for downloading the entire ebooks repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Missing images in EPUBs that are present in HTML books #222

Open Jaifroid opened 3 months ago

Jaifroid commented 3 months ago

A user on Reddit has reported that images are missing from EPUBs (other than the book cover), even though they are present in the HTML versions. See https://www.reddit.com/r/Kiwix/comments/1bngwel/kiwix_and_book_extraction/ (one example given there is "Through the Looking Glass" in the English-language Gutenberg ZIM). I can also confirm this for a few books I've looked at.
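To check a given book quickly, one can compare the image files packed inside its EPUB (an EPUB is a ZIP container) with the `<img>` tags in its HTML version. The following is a minimal sketch using only the Python standard library. The function names `count_epub_images` and `count_html_images` and the list of image extensions are illustrative assumptions, not part of the scraper.

```python
import io
import zipfile
from html.parser import HTMLParser

# File extensions treated as images (an assumption; extend as needed).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp")


def count_epub_images(epub) -> int:
    """Count image files packed inside an EPUB (a ZIP container).

    `epub` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return sum(
            1
            for name in archive.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )


class _ImgCounter(HTMLParser):
    """Counts <img> tags while parsing HTML (tag names arrive lowercased)."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> also reaches this method via the default
        # handle_startendtag implementation.
        if tag == "img":
            self.count += 1


def count_html_images(html: str) -> int:
    """Count <img> tags in an HTML document."""
    parser = _ImgCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

If the EPUB count is 1 (only the cover) while the HTML count is higher, the book shows the problem described above.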

I don't know if we can do anything about this. Do we make the EPUBs ourselves, or is this an upstream issue?

In any case, this seems to be one reason why #95 is not a good idea, at least for now: it would mean losing information (images) from the ZIM.

benoit74 commented 3 months ago

This is an issue on our side: we most probably chose the wrong ePub. It looks like there are several for Through the Looking Glass.
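When a book offers several EPUB downloads, the scraper has to pick one, and picking a variant without images would explain the symptom. Below is a hedged sketch of a preference-based choice. It assumes, without confirmation from this thread, that candidate download URLs end in suffixes such as `.epub3.images`, `.epub.images` and `.epub.noimages`; the helper `pick_epub_with_images` and the preference order are hypothetical and should be checked against the actual Gutenberg catalog before use.

```python
from typing import Iterable, Optional

# Hypothetical preference order, richest variant first. These suffixes are
# an assumption about Gutenberg's download URLs, not a confirmed setting
# of the scraper.
PREFERRED_SUFFIXES = (".epub3.images", ".epub.images", ".epub.noimages")


def pick_epub_with_images(urls: Iterable[str]) -> Optional[str]:
    """Return the candidate EPUB URL whose suffix ranks highest.

    URLs matching none of the known suffixes are ignored; returns None
    if no candidate matches.
    """
    candidates = list(urls)
    for suffix in PREFERRED_SUFFIXES:
        for url in candidates:
            if url.endswith(suffix):
                return url
    return None
```

Ranking by an explicit list keeps the choice deterministic and still falls back to an image-less EPUB when that is the only variant available.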

#95 is not related to that; such an issue could happen the other way around, with images missing from the HTML but present in the ePub.

benoit74 commented 3 months ago

I'm quite sure this won't be solved until real work on #97 is done (planned for sometime in 2024).