lovasoa / dezoomify-rs

Zoomable image downloader for Google Arts & Culture, Zoomify, IIIF, and others
https://dezoomify-rs.ophir.dev
GNU General Public License v3.0

Please reconsider handling of tile download errors #88

Status: Open. jsbien opened this issue 3 years ago.

jsbien commented 3 years ago

The log of downloading a dictionary volume (created with script) is 21 MB. Now I know I should grep it for ERROR. Because of some network problems, I had 3 cases of "Only ??? tiles out of ??? could be downloaded", yet the resulting image was still created. I would have noticed the problem earlier if such images had a prefix, e.g. "incomplete". It would also be convenient to have the URL of the whole affected image in the message (currently only the URL of the tile is printed).

lovasoa commented 3 years ago

If you are doing batch downloads, you should probably check the exit status of dezoomify-rs after it has run. You should also tweak the network-related settings: in particular, increase the number of retries when a download fails and the time between consecutive retries.

jsbien commented 3 years ago

This is my command:

time curl "https://polona.pl/iiif/item/MTI2MzI0NjU/manifest.json" | jq -r ".items[].id" | xargs -n 1 ./dezoomify-rs -l --parallelism 1 --timeout 60s --retry-delay 60s

Is there an easy way to add exit-status checking? Anyway, I can live with it. As for the number of retries, I hope the network problems will not occur again. Moreover, I'm not in a hurry and I don't want to increase the server load.
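A minimal sketch of exit-status checking for this pipeline, written as POSIX sh functions. It replaces xargs -n 1 with a loop so that the URL of each failed image (not only the failing tile) is written to a log, which also covers the request in the first comment. DEZOOMIFY, FAILED_LOG and download_all are hypothetical names, and the sketch assumes dezoomify-rs reports a failed image through a non-zero exit status. For comparison, xargs itself already exits with a non-zero status (123 with GNU xargs) when any invocation fails, but it does not say which URL was affected.

```shell
# POSIX sh sketch. DEZOOMIFY, FAILED_LOG and download_all are hypothetical names.
# Assumption: dezoomify-rs exits with a non-zero status when an image fails.
DEZOOMIFY="${DEZOOMIFY:-./dezoomify-rs}"
FAILED_LOG="${FAILED_LOG:-failed_urls.txt}"

# Read one image URL per line on stdin, download each one with the options from
# the command above, and append the URL of every failed image to $FAILED_LOG.
# Returns 1 if at least one image failed, 0 otherwise.
download_all() {
  failures=0
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # </dev/null stops dezoomify-rs from reading the remaining URLs from stdin.
    if ! "$DEZOOMIFY" -l --parallelism 1 --timeout 60s --retry-delay 60s \
        "$url" </dev/null; then
      printf '%s\n' "$url" >> "$FAILED_LOG"
      failures=$((failures + 1))
    fi
  done
  if [ "$failures" -gt 0 ]; then
    printf '%d image(s) failed, URLs listed in %s\n' "$failures" "$FAILED_LOG" >&2
    return 1
  fi
  return 0
}
```

After sourcing these functions, the pipeline becomes: curl "https://polona.pl/iiif/item/MTI2MzI0NjU/manifest.json" | jq -r ".items[].id" | download_all. Its exit status is then non-zero whenever at least one image failed, and the log lists exactly which ones to retry.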

lovasoa commented 3 years ago

Increasing the number of retries will decrease the server load, not increase it. With only one retry, when the server starts to be overloaded and responds with errors, you will quickly move on to the next tile and make one more request to the already overloaded server. With, let's say, 10 retries (and a parallelism of 1), dezoomify-rs will try 10 times with an exponential backoff strategy: it will make the second try after 10s, the next one after waiting another 20s, then 40s, and so on. This will be slower, but you will be sure not to overwhelm the server.
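The waiting schedule described above can be worked out with a small POSIX sh function. backoff_schedule is a hypothetical helper, not part of dezoomify-rs; it simply assumes, as in the comment, a 10s wait before the second try that doubles on every further retry, and does not claim to mirror dezoomify-rs's internal code.

```shell
# POSIX sh sketch; backoff_schedule is a hypothetical helper, not dezoomify-rs code.
# backoff_schedule BASE RETRIES
# Prints the wait before each retry when the first wait is BASE seconds and
# every following wait is twice the previous one, plus the running total.
backoff_schedule() {
  delay=$1     # wait before the first retry (the second try), in seconds
  retries=$2   # number of retries after the first failed attempt
  total=0
  i=1
  while [ "$i" -le "$retries" ]; do
    total=$((total + delay))
    printf 'retry %d: wait %ds (cumulative %ds)\n' "$i" "$delay" "$total"
    delay=$((delay * 2))
    i=$((i + 1))
  done
}
```

For example, backoff_schedule 10 3 prints waits of 10s, 20s and 40s (70s in total), and backoff_schedule 10 10 ends with "retry 10: wait 5120s (cumulative 10230s)". If every attempt failed, a single tile could therefore spend close to three hours waiting, not counting the requests themselves, while sending the server only a handful of requests.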

jsbien commented 3 years ago

Thanks for the explanation. What about including it in the help? Currently it says:

--retry-delay  Amount of time to wait before retrying a request that failed [default: 2s]

So is the default different from the 10s in your example?

lovasoa commented 3 years ago

Yes, this should be included in the help. Are you interested in making a contribution? The argument documentation is in src/arguments.rs and the remaining documentation is in README.md.

jsbien commented 3 years ago

Please have a look at my fork and check whether I understand correctly what is going on.

lovasoa commented 3 years ago

You can open a pull request here: https://github.com/lovasoa/dezoomify-rs/compare

I'll comment on it.

drzraf commented 6 months ago

What's the exit status in case of partially saved images? Grepping for errors is problematic. Something like --with-errors or --without-errors is needed for users who prefer file integrity over partial results.
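Until the exit status for partial images is settled, a hypothetical wrapper can implement the "incomplete" prefix suggested in the first comment. dezoomify_marked and DEZOOMIFY are illustrative names. The sketch assumes dezoomify-rs accepts the output file as a second positional argument and writes the image exactly there, and it treats a run as partial if it exits non-zero or if its output contains the "tiles out of" wording of the message quoted in this issue. That text match is exactly the fragile grepping criticized above, so it is only a stopgap.

```shell
# POSIX sh sketch. DEZOOMIFY and dezoomify_marked are hypothetical names.
# Assumptions: dezoomify-rs takes the output path as a second positional
# argument and writes the image exactly there; a partial download either exits
# non-zero or prints the "Only ... tiles out of ... could be downloaded" message.
DEZOOMIFY="${DEZOOMIFY:-./dezoomify-rs}"

# dezoomify_marked URL OUTFILE
# Downloads URL to OUTFILE. If the run looks partial, renames the file to
# incomplete_<name> in the same directory and returns 1.
dezoomify_marked() {
  url=$1
  out=$2
  log=$(mktemp) || return 2
  # Capture stdout and stderr to inspect them; they are shown after the run ends.
  "$DEZOOMIFY" -l "$url" "$out" </dev/null >"$log" 2>&1
  rc=$?
  cat "$log" >&2
  if [ "$rc" -ne 0 ] || grep -q 'tiles out of' "$log"; then
    rm -f "$log"
    if [ -e "$out" ]; then
      mv "$out" "$(dirname "$out")/incomplete_$(basename "$out")"
    fi
    printf 'INCOMPLETE: %s\n' "$url" >&2
    return 1
  fi
  rm -f "$log"
  return 0
}
```

Renaming rather than deleting keeps the partial result for users who still want it, while making it impossible to mistake for a complete image; a --without-errors style behavior would replace the mv with rm.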