openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
19 stars 16 forks source link

Add image optimization #36

Closed satyamtg closed 4 years ago

satyamtg commented 4 years ago

This adds image optimization to the imaging module and breaks it up into several parts inside a submodule called image. This basically aims to fix #26 The following changes are made -

Some refactor and other changes would be needed here (that's why WIP) and we might also want to support WebP. What do you think @rgaudin ?

codecov[bot] commented 4 years ago

Codecov Report

Merging #36 into master will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##            master       #36    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           22        24     +2     
  Lines          798       930   +132     
==========================================
+ Hits           798       930   +132     
Impacted Files Coverage Δ
src/zimscraperlib/image/__init__.py 100.00% <100.00%> (ø)
src/zimscraperlib/image/convertion.py 100.00% <100.00%> (ø)
src/zimscraperlib/image/optimization.py 100.00% <100.00%> (ø)
src/zimscraperlib/image/presets.py 100.00% <100.00%> (ø)
src/zimscraperlib/image/probing.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 752a42b...0e1d907. Read the comment docs.

stale[bot] commented 4 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

satyamtg commented 4 years ago

I have added optimization support for WebP and GIF images, tests for them and some presets. The WebP optimization takes place using Pillow whereas the GIF optimization makes use of gifsicle. I first tried Pillow based pure python optimization for GIF but soon realized that it was a particularly bad idea to optimize GIFs using Pillow as even after trying the lowest settings possible, I was only able to get a 18 KB GIF optimized to a 56 KB GIF. Similar things happened with many other GIFs and I was at no point able to get a size below the original size for any GIF I downloaded from the internet which wasn't made by Pillow itself.

gifsicle is quite good with GIFs and with the use of the --lossy option in version 1.92 (I updated ci.yml to use ubuntu-20.04 instead of ubuntu-18.04 as we only have 1.91 in the default package registry), we can get even smaller GIFs than what we currently have in sotoki.

rgaudin commented 4 years ago

Please rebase off master

satyamtg commented 4 years ago

Please rebase off master

@rgaudin did it.

satyamtg commented 4 years ago

One decision we have to make would be whether we allow converting in the process. As I wrote above, I think those functions should not but we could have an additional one (maybe at image module) that combines conversion (if different) and optimization. Let me know what you think about that.

For the convertion thing, I think having a small function exposed at the image module to handle convertion before optimizing is a good idea. But shall we have a explicit argument say allow_convert=True or convert_to=PNG for instance, or should we check the input and output format and do that implicitly? I think we can go with allow_convert=True and then raise an error if convertion is required but not allowed. We would in that case check the format to convert to using the dst suffix (or can have an extra parameter, say convert_to=PNG as well). What do you think?

rgaudin commented 4 years ago

I think we could do as we do for conversion:

satyamtg commented 4 years ago

I think we could do as we do for conversion:

  • we find out src format from pillow
  • we find out dst format from fmt param if specified or from dst filename.
  • fail if we can't guess it obviously
  • if different, then convert
  • call optimization

I have added a allow_convert option to optimize_image in optimization.py and exposed it in __init__. I think it'd be better than having two functions.