satyamtg commented 4 years ago

This adds image optimization to the imaging module and breaks it up into several parts inside a submodule called image. This basically aims to fix #26 The following changes are made -

zimscraperlib.imaging is broken into convertion.py, probing.py and transsform.py and moved to zimscraperlib.image. These new files contain the following functions -
- convertion.py - convert_image and create_favicon
- probing.py - get_colors and is_hex_color (maybe we shall remove this)
- transform.py - resize_image
The save_image from imaging module has been moved to __init__.py in the image module
The ALPHA_NOT_SUPPORTED constant is moved to zimscraperlib.constants.py
optimization.py is introduced inside the newely created image module. It is as follows -
- It contains a class ImageOptimizer which contains the optimization methods for several image types and holds the image optimization options as a dict, such as png_options, jpeg_options etc. (PNG and JPEG supported as of now)
- ImageOptimizer also contains a method optimize_image to optimize image by detecting its filetype. It also supports overriding the main optimization parameters set at the time of object creation
- The current PNG and JPEG optimizations are performed using this that does it in pure python
- As far as PNG compression goes, in general we can achieve better compression than what we do with sotoki at the cost of a little color reduction (but then pngquant also reduces color). On the commons.png in this very repository, pngquant + advdef made it from 10KB to around 5.5KB, whereas this optimizer did like 3.9KB.
- As far as JPEG compression is concerned, it performs little worse than the jpegoptim thing that we used. On the square.jpg in this repository, jpegoptim made it from 130KB to 28KB, whereas this one did it to 35KB. All this was performed with JPEG quality set to 50 (the same as set in our jpegoptim call).

Some refactor and other changes would be needed here (that's why WIP) and we might also want to support WebP. What do you think @rgaudin ?

codecov[bot] commented 4 years ago

Codecov Report

Merging #36 into master will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            master       #36    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           22        24     +2     
  Lines          798       930   +132     
==========================================
+ Hits           798       930   +132

Impacted Files	Coverage Δ
src/zimscraperlib/image/__init__.py	`100.00% <100.00%> (ø)`
src/zimscraperlib/image/convertion.py	`100.00% <100.00%> (ø)`
src/zimscraperlib/image/optimization.py	`100.00% <100.00%> (ø)`
src/zimscraperlib/image/presets.py	`100.00% <100.00%> (ø)`
src/zimscraperlib/image/probing.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 752a42b...0e1d907. Read the comment docs.

stale[bot] commented 4 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

satyamtg commented 4 years ago

I have added optimization support for WebP and GIF images, tests for them and some presets. The WebP optimization takes place using Pillow whereas the GIF optimization makes use of gifsicle. I first tried Pillow based pure python optimization for GIF but soon realized that it was a particularly bad idea to optimize GIFs using Pillow as even after trying the lowest settings possible, I was only able to get a 18 KB GIF optimized to a 56 KB GIF. Similar things happened with many other GIFs and I was at no point able to get a size below the original size for any GIF I downloaded from the internet which wasn't made by Pillow itself.

gifsicle is quite good with GIFs and with the use of the --lossy option in version 1.92 (I updated ci.yml to use ubuntu-20.04 instead of ubuntu-18.04 as we only have 1.91 in the default package registry), we can get even smaller GIFs than what we currently have in sotoki.

rgaudin commented 4 years ago

Please rebase off master

satyamtg commented 4 years ago

Please rebase off master

@rgaudin did it.

satyamtg commented 4 years ago

One decision we have to make would be whether we allow converting in the process. As I wrote above, I think those functions should not but we could have an additional one (maybe at image module) that combines conversion (if different) and optimization. Let me know what you think about that.

For the convertion thing, I think having a small function exposed at the image module to handle convertion before optimizing is a good idea. But shall we have a explicit argument say allow_convert=True or convert_to=PNG for instance, or should we check the input and output format and do that implicitly? I think we can go with allow_convert=True and then raise an error if convertion is required but not allowed. We would in that case check the format to convert to using the dst suffix (or can have an extra parameter, say convert_to=PNG as well). What do you think?

rgaudin commented 4 years ago

I think we could do as we do for conversion:

we find out src format from pillow
we find out dst format from fmt param if specified or from dst filename.
fail if we can't guess it obviously
if different, then convert
call optimization

satyamtg commented 4 years ago

I think we could do as we do for conversion:

we find out src format from pillow

we find out dst format from fmt param if specified or from dst filename.

fail if we can't guess it obviously

if different, then convert

call optimization

I have added a allow_convert option to optimize_image in optimization.py and exposed it in __init__. I think it'd be better than having two functions.

openzim / python-scraperlib

Add image optimization #36

Codecov Report