openzim / nautilus

Turns a collection of documents into a browsable ZIM file
GNU General Public License v3.0

Recipe fails with Dropbox links #26

Closed: kelson42 closed this issue 2 years ago

kelson42 commented 2 years ago

The recipe https://farm.openzim.org/recipes/zimgit-post-disaster fails, even though its Dropbox links for the zip and PNG files seem OK. See this run, for example: https://farm.openzim.org/pipeline/5f1c69a22e6ec6c769bfe316/debug

[nautiluszim::2021-09-13 07:21:54,480] INFO:starting nautilus scraper for https://www.dropbox.com/s/v0vwgcv7yjk8x7i/Zimgit.zip?dl=0
[nautiluszim::2021-09-13 07:21:54,481] INFO:preparing build folder at /output/build
[nautiluszim::2021-09-13 07:21:54,617] INFO:checking your branding files and values
[nautiluszim::2021-09-13 07:21:55,312] ERROR:FAILED. An error occured: cannot identify image file <_io.BufferedReader name='/output/build/favicon.png'>
[nautiluszim::2021-09-13 07:21:55,312] ERROR:cannot identify image file <_io.BufferedReader name='/output/build/favicon.png'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/nautiluszim-1.0.6-py3.8.egg/nautiluszim/entrypoint.py", line 157, in main
    scraper.run()
  File "/usr/local/lib/python3.8/site-packages/nautiluszim-1.0.6-py3.8.egg/nautiluszim/scraper.py", line 164, in run
    self.check_branding_values()
  File "/usr/local/lib/python3.8/site-packages/nautiluszim-1.0.6-py3.8.egg/nautiluszim/scraper.py", line 252, in check_branding_values
    resize_image(dest, width=width, height=height, method="thumbnail")
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/imaging.py", line 48, in resize_image
    with PIL.Image.open(fp) as image:
  File "/usr/local/lib/python3.8/site-packages/PIL/Image.py", line 2967, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file <_io.BufferedReader name='/output/build/favicon.png'>

rgaudin commented 2 years ago

The links are not OK in this recipe. It uses ?dl=0 links, which specifically request the Dropbox page instead of the actual file. Dropbox links should end in ?dl=1 instead.

Note: Dropbox detects the wget and curl User-Agent strings and sends the actual file to both.
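
For anyone who wants to normalize such links before putting them in a recipe, here is a minimal sketch of the ?dl=0 to ?dl=1 rewrite described above. It is not part of nautilus: the helper name `force_dropbox_download` is hypothetical, and treating every `dropbox.com` host the same way is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url: str) -> str:
    """Return `url` with dl=1 set if it points to Dropbox, else unchanged.

    Hypothetical helper for illustration only; not part of nautilus.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assumption: only dropbox.com and its subdomains use the dl parameter.
    if host != "dropbox.com" and not host.endswith(".dropbox.com"):
        return url
    # Drop any existing dl parameter, keep the others, then append dl=1.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(force_dropbox_download(
        "https://www.dropbox.com/s/v0vwgcv7yjk8x7i/Zimgit.zip?dl=0"
    ))
```

Non-Dropbox URLs pass through untouched, so the helper can be applied to every link in a recipe without checking each one first.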