openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Convert png/jpg images to webp #139

Closed kelson42 closed 3 years ago

kelson42 commented 3 years ago

I believe that would be a good idea and would probably save up to 20% of disk usage.

I don't see any problem for the HTML part (with the polyfill).

I'm more worried about the EPUB part:

rgaudin commented 3 years ago

WebP is not in the epub2 or epub3 specs (only gif, jpeg, png and svg/xml are). Also, epub readers (devices) are almost never updated, and there is no strong reason for them to be (even if manufacturers were releasing updates, which they don't). There is no such thing as an obsolete reader as long as it supports epub2.

So my opinion is to not change the formats inside epub.

eshellman commented 3 years ago

I agree with @rgaudin

soloturn commented 3 years ago

a relevant discussion is here: https://github.com/w3c/publ-epub-revision/issues/1344

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

alvarotrigo commented 1 year ago

I don't get it, what's the problem with using a fallback to png/jpeg and providing webp for the web browsers that support it?

<picture>
    <source srcset="image.webp" type="image/webp">
    <source srcset="image.jpg" type="image/jpeg">
    <img src="image.jpg">
</picture>

rgaudin commented 1 year ago

@alvarotrigo then we would not save space but increase it (multiple formats for the same image). It also wouldn't work in epub, as there is no JS and therefore no polyfill.
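
To make the storage argument concrete, here is a rough back-of-the-envelope sketch. The file size is made up for illustration, and the 20% saving is the upper estimate mentioned earlier in this thread, not a measured figure.

```python
# Hypothetical numbers, for illustration only.
ORIGINAL_KB = 1000   # assumed total size of the png/jpg images in one book
WEBP_SAVING = 0.20   # "up to 20%" estimate from earlier in the thread


def total_size_kb(original_kb: float, saving: float, keep_fallback: bool) -> float:
    """Return the stored image size once WebP copies are produced."""
    webp_kb = original_kb * (1 - saving)
    # With a <picture> fallback, the WebP copy AND the original are both stored.
    return webp_kb + original_kb if keep_fallback else webp_kb


replace_only = total_size_kb(ORIGINAL_KB, WEBP_SAVING, keep_fallback=False)
with_fallback = total_size_kb(ORIGINAL_KB, WEBP_SAVING, keep_fallback=True)
print(f"WebP only: {replace_only:.0f} KB, WebP + fallback: {with_fallback:.0f} KB")
```

Under these assumptions, replacing the images shrinks the stored payload, while keeping a fallback makes it larger than the original.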

alvarotrigo commented 1 year ago

@rgaudin not sure if I'm missing something here.

Sure, you won't be saving storage space, but you would be saving data transfer because webp format tends to be lighter than png and jpeg formats.

Doesn't epub3 support the <picture> element out of the box?

rgaudin commented 1 year ago

I don't know about epub3, but I doubt Gutenberg's epubs are epub3. We probably don't want to handle this conversion ourselves at this point.

The priority between ZIM size and data transfer is definitely the former given the main use case is offline/LAN.

eshellman commented 1 year ago

this reminds me to ask - are you pulling epub3 now?

rgaudin commented 1 year ago

@eshellman do you produce both epub2 and epub3 files???

We're using the first file that has the application/epub+zip mimetype, so if it's epub3 then yes.

eshellman commented 1 year ago

the epub3 files are named like pg15470-images-3.epub
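
A minimal sketch of how that naming hint could be checked. Only the example name above comes from this thread; the general rule (a `-3` suffix right before `.epub`) is an assumption generalised from that single example, and `looks_like_epub3` is a hypothetical helper, not part of the scraper.

```python
import re

# Assumption: "-3" right before ".epub" marks an epub3 file, as in the
# single example "pg15470-images-3.epub" given in this thread.
_EPUB3_NAME = re.compile(r"-3\.epub$")


def looks_like_epub3(filename: str) -> bool:
    """Guess from the filename alone whether a file is an epub3."""
    return bool(_EPUB3_NAME.search(filename))


print(looks_like_epub3("pg15470-images-3.epub"))
```

A filename check like this is only a heuristic; the version declared inside the epub's package document would be the authoritative source.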

benoit74 commented 1 year ago

Yes, the scraper downloads the epub3. I'm not sure this is really intentional; it looks like it is due to the sorting of the rsync results. When multiple files match a given format, the scraper prefers the ones with images and then takes the first one if there are still multiple options. .epub3.images comes before .epub.images in the list.
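
The selection behaviour described above can be sketched roughly as follows. This is not the scraper's actual code: `pick_file`, the dictionary keys and the file names other than `pg15470-images-3.epub` are hypothetical, and the listing order is simply assumed to put the epub3 entry first.

```python
from typing import Optional


def pick_file(candidates: list[dict]) -> Optional[dict]:
    """Prefer candidates with images, then take the first in listing order."""
    if not candidates:
        return None
    with_images = [c for c in candidates if c["has_images"]]
    # Fall back to every candidate when none of them has images.
    pool = with_images or candidates
    return pool[0]


# Hypothetical listing, already in the order the results are assumed to arrive.
listing = [
    {"name": "pg15470-images-3.epub", "has_images": True},
    {"name": "pg15470-images.epub", "has_images": True},
    {"name": "pg15470.epub", "has_images": False},
]
chosen = pick_file(listing)
print(chosen["name"])
```

With a rule like this, which epub version ends up in the ZIM depends entirely on the listing order, which matches the "not really intentional" observation.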

benoit74 commented 1 year ago

And I'm not sure this is the best choice, since it means that old readers that support only epub2 cannot use the ZIM, given that it mostly contains epub3 files. At least that is how I've understood the difference between epub2 and epub3.

rgaudin commented 1 year ago

No, epub3 files are readable by epub2 readers, up to what epub2 supported. So no new fancy features, but the basics should be OK. There are comparisons online.

I've looked at a couple of files and all images use the traditional formats.

This ticket was closed 2 years ago. If anyone thinks we should download epub2 only, please create a separate ticket.