openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Use WebP in scrapers #45

Closed rgaudin closed 4 years ago

rgaudin commented 4 years ago

WebP is objectively better than all the other web formats we use for images and animations (JPEG, PNG, GIF).

Support

WebP support in browsers is good. It has native support in Chrome, Firefox, Edge and Opera.

The only notable exceptions are Safari (on both macOS and iOS) and kaiOS.

Note: Safari (both platforms) will receive WebP support in iOS 14 and macOS Big Sur (September 2020).

Fallback

Use of a fallback is thus limited to the exceptions above: Safari (macOS and iOS) and kaiOS.

For those browsers, the easiest way to go is to use webp-hero, a polyfill that bundles the libwebp JS binding (uses either JS or wasm version based on wasm-support).

It works fine even with a lot of images. There can be a tiny flickering effect when switching from no-image to displayed canvas but that's not really a problem. Where we have control, we probably can set dimensions and/or backgrounds to remove it.

The main drawback is that it only supports <img /> in HTML and not CSS images.

Usage

I recommend we switch to using WebP exclusively in ZIMs made by scrapers where we control the UI. That's most scrapers, but it excludes generic ones like warc2zim, which may use CSS images.

For our scrapers, the strategy will be to either keep CSS-displayed images in other formats (usually they are assets) or change the templates to use image tags instead.
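The first option (keeping CSS-displayed images in their original format) requires knowing which files a stylesheet references. Here is a rough sketch using only the standard library. `css_image_refs` is a hypothetical helper, not part of zimscraperlib, and the regex is a simplification rather than a full CSS parser.

```python
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)

# Formats we would otherwise convert to WebP.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def css_image_refs(css_text: str) -> set:
    """Return the image paths referenced through url() in a stylesheet."""
    refs = set()
    for match in CSS_URL_RE.finditer(css_text):
        target = match.group(2)
        # Drop any query string or fragment before checking the extension.
        path = re.split(r"[?#]", target, maxsplit=1)[0]
        if path.lower().endswith(IMAGE_SUFFIXES):
            refs.add(path)
    return refs
```

A scraper could skip WebP conversion for any file whose path appears in this set and convert everything else.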

Note: videojs-ogvjs uses a CSS background to display video posters. That can probably be patched though.

Tools

zimscraperlib already supports converting images to WebP (thanks to Pillow):

import pathlib

src = pathlib.Path("source.jpg")
# with_suffix() swaps the .jpg extension for .webp
convert_image(src, src.with_suffix(".webp"), quality=100)
convert_image(src, src.with_suffix(".webp"), lossless=True)

Note that lossless=True gives a different (usually larger) result than quality=100.
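Part of the reason is that the two options select different encoders. In Pillow, quality=100 without lossless=True is still lossy compression, whereas lossless=True switches to the lossless bitstream. Assuming convert_image forwards these options to Pillow unchanged, the result can be checked from the file header. The sketch below uses only the standard library. `webp_kind` is a hypothetical helper, not part of zimscraperlib, and it only inspects the first chunk of the RIFF container.

```python
def webp_kind(header: bytes) -> str:
    """Classify a WebP file from its first 16 bytes.

    Returns "lossy", "lossless", "extended", "unknown" or "not-webp".
    """
    # A WebP file is a RIFF container: "RIFF", 4-byte size, "WEBP", first chunk.
    if len(header) < 16 or header[0:4] != b"RIFF" or header[8:12] != b"WEBP":
        return "not-webp"
    fourcc = header[12:16]
    kinds = {
        b"VP8 ": "lossy",      # simple lossy bitstream
        b"VP8L": "lossless",   # simple lossless bitstream
        b"VP8X": "extended",   # extended format (alpha, animation, metadata)
    }
    return kinds.get(fourcc, "unknown")
```

Reading the first 16 bytes of a converted file (for instance with `path.open("rb").read(16)`) and passing them to this function is enough. An "extended" result means the actual bitstream chunk comes later in the file and is not examined here.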

cwebp, a binary provided with libwebp, has many options.

kelson42 commented 4 years ago

@rgaudin Great perspective!

kelson42 commented 4 years ago

@rgaudin Should we really keep this ticket open? In this repo?