openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
18 stars 16 forks source link

Handle wrong extension in optimize_image() #65

Closed satyamtg closed 3 years ago

satyamtg commented 3 years ago

This fixes #64 by handling wrong extension while optimizing images. It creates a copy of the original file with correct extension if wrong extension is found.

Also, unrelated - Use correct directory in the yt_downloader context manager test to not have files downloaded in workdir,

codecov[bot] commented 3 years ago

Codecov Report

Merging #65 into master will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #65   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           24        24           
  Lines          976       976           
=========================================
  Hits           976       976           
Impacted Files Coverage Δ
src/zimscraperlib/image/optimization.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a0c5f0f...2d23c58. Read the comment docs.

satyamtg commented 3 years ago

Okay. I understand. I initially thought of doing it this way because I thought about the fact that if we use some external tool, it might throw errors for wrong extension (which seems to happen quite frequently in case of some scrapers). But I get the point now about the filesystem usage. I have also checked gifsicle for these kinds of errors, but it seems its not the case with it. I'm rebasing and doing from_suffix=True in the individual optimizers. If in future we add another external tool, we shall check for extension errors and if they occur, we shall handle that in the individual function itself.

rgaudin commented 3 years ago

Yes, AFAIK, only gif uses an external tool so we should have a dedicated test for that as well. For GIF, because of this specific requirement, we could test both format and extension but I don't think we should handle that in the lib. The scraper should rename source if it's incorrect I think.

satyamtg commented 3 years ago

Yes, AFAIK, only gif uses an external tool so we should have a dedicated test for that as well. For GIF, because of this specific requirement, we could test both format and extension but I don't think we should handle that in the lib. The scraper should rename source if it's incorrect I think.

I also added a fallback to suffix check if Pillow was not able to identify format (allowed the last two tests to pass). I don't think a test for gifsicle is needed because it doesn't have problems with wrong extensions.