lovasoa / dezoomify-rs

Zoomable image downloader for Google Arts & Culture, Zoomify, IIIF, and others
https://dezoomify-rs.ophir.dev
GNU General Public License v3.0

Batching : download multiple zoomable images at once automatically #17

Open SB2020-eye opened 4 years ago

SB2020-eye commented 4 years ago

Hi. Great work on these dezoomers! I was hoping to use the batch capabilities suggested by the dezoomify node application (https://github.com/lovasoa/dezoomify/tree/master/node-app). I'm new at this, and I just couldn't get it to work after many hours. Is there a way to run dezoomify-rs in some kind of batch way?

(And I'm trying to pull images from https://digitalcollections.tcd.ie/home/index.php?DRIS_ID=MS58_003v.)

lovasoa commented 4 years ago

dezoomify-rs is a command-line application. You can call it from a for loop in a batch script on Windows, or in a bash script on Linux or macOS (or on Windows with WSL).

For instance, in bash, you could create a file called urls.txt containing all the URLs you want to dezoomify, and then use xargs together with dezoomify-rs:

xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
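
The same batch can also be written as an explicit loop, which makes it easy to give each image its own output name. This is only a sketch: `batch_download`, the `DEZOOMIFY` override and the `image_NNN.jpg` naming scheme are illustrative assumptions, not part of dezoomify-rs. It relies only on the `URL outfile` argument order used by the script further down this thread.

```shell
# Sketch: run dezoomify-rs once per line of a URL list.
# DEZOOMIFY is a hypothetical override, handy for a dry run with `echo`.
batch_download() {
    list="$1"
    n=0
    # Read the list on file descriptor 3 so the downloader keeps the terminal as stdin.
    while IFS= read -r url <&3 || [ -n "$url" ]; do
        [ -z "$url" ] && continue    # skip blank lines
        n=$((n + 1))
        # Assumed naming scheme: image_001.jpg, image_002.jpg, ...
        "${DEZOOMIFY:-./dezoomify-rs}" "$url" "$(printf 'image_%03d.jpg' "$n")"
    done 3< "$list"
}

# Usage: batch_download urls.txt
```

Running `DEZOOMIFY=echo batch_download urls.txt` first prints the commands' arguments without downloading anything, which is a cheap way to check the list.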

lovasoa commented 4 years ago

However, I'm not sure what you mean by

the batch capabilities suggested by the dezoomify node application

The node application downloads a single image at a time (just like dezoomify-rs).

SB2020-eye commented 4 years ago

Thanks for the suggestions and links. I'll see if I can figure out a for loop for Windows.

I was trying to make this work: https://github.com/lovasoa/dezoomify/wiki/Node-app-for-shell-and-batch-usage.

SB2020-eye commented 4 years ago

(Sorry. I see I cited the wrong "node" page in my original comment. I apologize for the confusion.)

lovasoa commented 4 years ago

Yes, you can do the same thing with dezoomify-rs: just replace node dezoomify-node.js with ./dezoomify-rs.

lovasoa commented 4 years ago

Here is a full bash script for downloading both sides of folios 1 to 339 of the Book of Kells, which you mention above:

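# Folio numbers are zero-padded to three digits: 001 ... 339
for i in $(seq -f '%03g' 1 339); do
    # Each folio has a recto (r) and a verso (v) side
    for side in r v; do
        ./dezoomify-rs --max-width 10000 "https://digitalcollections.tcd.ie/content/14/pages/MS58_$i$side/image.dzi" "page$i$side.jpg"
    done
done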

If you are on Windows, you can use WSL and the Linux version of dezoomify-rs to run it.

SB2020-eye commented 4 years ago

Wow. I had to erase my next question I was writing. Thank you!

lovasoa commented 4 years ago

I am currently running it (it's at page 28). I'll post the result here when it's ready.

lovasoa commented 4 years ago

The book of Kells: full high-quality download

extracted from the Trinity College Dublin website

These links will expire 30 days after they are last downloaded.

The Book of Kells contains the four Gospels in Latin based on the Vulgate text which St Jerome completed in 384AD, intermixed with readings from the earlier Old Latin translation. The Gospel texts are prefaced by other texts, including "canon tables", or concordances of Gospel passages common to two or more of the evangelists; summaries of the gospel narratives (Breves causae); and prefaces characterizing the evangelists (Argumenta). The book is written on vellum (prepared calfskin) in a bold and expert version of the script known as "insular majuscule". It contains 340 folios, now measuring approximately 330 x 255 mm; they were severely trimmed, and their edges gilded, in the course of rebinding in the 19th century.

SB2020-eye commented 4 years ago

Wow again! Thank you so much. I also got it to work, so you've helped me increase my capabilities. :) Much appreciated.

Are PDFs like the ones you linked exactly the same in resolution as the JPGs I downloaded with the script you wrote? I don't know how to check the resolution inside a PDF myself, so I'm asking rather than assuming.

lovasoa commented 4 years ago

Yes, the PDF embeds the JPEG images without alteration. You can check that simply by extracting an image from the PDF.
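
One way to run that check, assuming Poppler's `pdfimages` tool is installed (the thread does not say which tool was meant), is to extract the embedded images in their native format and inspect them. The `extract_pdf_images` wrapper, the `PDFIMAGES` override and the file names are hypothetical.

```shell
# Sketch: pull embedded images out of a PDF without re-encoding them.
# Assumes Poppler's pdfimages; -all writes each image in its native format.
extract_pdf_images() {
    # $1 = PDF file, $2 = prefix for the extracted files
    "${PDFIMAGES:-pdfimages}" -all "$1" "$2"
}

# Usage: extract_pdf_images book.pdf page
```

The extracted files can then be compared with the downloaded JPEGs, for example by looking at their pixel dimensions in any image viewer.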

CavalloScuro commented 3 years ago

Dear all, thank you for these comments and replies. I've attempted to run the bash command outlined by the developers in order to activate a batch dezoomify process, but I haven't been able to overcome some initial errors I'm getting. Here's what I have:

  1. A .txt file containing URLs to about 800 Zoomify tile .xml files, separated by newline characters (\n)
  2. The dezoomify-rs application, unpacked in a directory on my desktop
  3. In Terminal, I change to that directory, which contains dezoomify-rs and my .txt file of URLs
  4. Then I type xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
  5. I get the following error:
xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
xargs: illegal option -- d
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements] [-S replsize]]
             [-J replstr] [-L number] [-n number [-x]] [-P maxprocs]
             [-s size] [utility [argument ...]]

I am running macOS Big Sur, which defaults to zsh. I tried typing "bash" before the xargs command, but that doesn't work either. What am I doing wrong?

Thank you in advance for your assistance.

lovasoa commented 3 years ago

https://superuser.com/questions/467176/replacement-for-xargs-d-in-osx

hannanaslan commented 3 years ago

I am having the same problem. Can you please help? [https://www.memoiredeshommes.sga.defense.gouv.fr/fr//_depot_mdh/_depot_images/LEVANT/SHDGR__GR_4_H_5/SHDGR__GR_4_H_5__001/SHDGR__GR_4_H_5__001_0001/ImageProperties.xml] I want to download images 1 to 170.

lovasoa commented 3 years ago

See my link above. You can either just remove -d '\n' from the command (if none of your URLs contain whitespace) or install GNU xargs with brew.
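
A third option that is commonly used when `-d` is unavailable is to turn newlines into NUL bytes and pass `-0` to xargs. The `-0` flag appears in the macOS usage message quoted above and is also accepted by GNU xargs. A minimal sketch, where the `batch_xargs` wrapper and the `DEZOOMIFY` override are hypothetical names added for illustration:

```shell
# Sketch: treat each line of the list as exactly one argument,
# even if the line contains spaces.
batch_xargs() {
    # tr converts every newline to a NUL byte; xargs -0 splits on NUL only.
    tr '\n' '\0' < "$1" | xargs -0 -n 1 "${DEZOOMIFY:-./dezoomify-rs}"
}

# Usage: batch_xargs urls.txt
```

Unlike plain `xargs -n 1`, this does not split a URL at embedded spaces, so it behaves like the original `-d '\n'` command.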

CavalloScuro commented 3 years ago

Yes, indeed. Thank you lovasoa. I used the following command and it worked brilliantly.

xargs -n 1 ./dezoomify-rs -l < ./urls.txt

-l is the option for downloading the largest image available in the Zoomify .xml file. Removing the -d '\n' option is what got the batch command working.

Thank you.

P.S. I will point out, though, that the batch run produces many tile errors. Many of the images downloaded through xargs are missing five or more tiles that Terminal reports could not be downloaded. When I download the same images manually, one by one, those tiles come down perfectly.

lovasoa commented 3 years ago

Maybe the server is having trouble serving all the requests you are making in quick succession. You can use the --parallelism, --retries, --retry-delay and --timeout options to control that and be gentler with the server. Type dezoomify-rs --help to view the list of available options.

Also, in general, I would avoid using -l and use something like --max-width 65535 instead. 65,535 pixels is the largest possible width of a JPEG image.
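
As a rough sketch of what a gentler invocation might look like, the wrapper below combines the options named above. The option names come from this thread; the values, and the duration syntax such as `5s`, are assumptions chosen for illustration, so check `dezoomify-rs --help` for what your version accepts. `gentle_download` and `DEZOOMIFY` are hypothetical names.

```shell
# Sketch: download one image while putting less load on the server.
# All numeric values below are illustrative assumptions, not recommendations.
gentle_download() {
    # $1 = URL of the zoomable image, $2 = output file
    "${DEZOOMIFY:-./dezoomify-rs}" \
        --parallelism 2 \
        --retries 5 \
        --retry-delay 5s \
        --timeout 60s \
        --max-width 65535 \
        "$1" "$2"
}

# Usage: gentle_download "https://example.org/ImageProperties.xml" out.jpg
```

Fewer parallel requests and more retries trade speed for a better chance that every tile arrives.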

CavalloScuro commented 3 years ago

Thank you. I will look into this right now.

piccachilly commented 2 years ago

When I do this, I get the following error

ERROR Invalid header value: failed to parse header value multiple times.

However, my URLs are formatted correctly in the .txt file, each on a new line.

https://storage.googleapis.com/X/img/dzi/img_1.dzi
https://storage.googleapis.com/X/img/dzi/img_2.dzi

etc.

For some reason, it will correctly process the last URL in the .txt file.

Any assistance?

Many thanks!

E-11-V commented 1 year ago

@CavalloScuro Could I ask you for an example of your urls.txt file? I don't know what I'm doing wrong, but I am clearly messing up the step of separating the URLs. The error I receive is "ERROR Invalid header value: failed to parse header value".

Here is an example of the URLs I'm using. I have tried separating them with \n, but either I don't understand how to use it or something else is wrong. I appreciate the help.

https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0002.jpg/info.json
https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0003.jpg/info.json
https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0001.jpg/info.json

st-carr commented 1 year ago

I'm getting the same error that piccachilly mentioned back in 03/2022. It seems like this is an HTTP issue. I wonder what I'm doing wrong to trigger this error for every URL except the last one.
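
One possible explanation, which nobody in this thread confirms, is that urls.txt was saved with Windows line endings (CRLF). Every line except an unterminated last one would then end in an invisible carriage return that becomes part of the URL, which would match the pattern of only the last URL working. The sketch below checks for carriage returns and writes a cleaned copy; `count_cr`, `strip_cr` and the file names are hypothetical.

```shell
# Sketch: detect and remove carriage returns (CR) from a URL list.
# This addresses one *possible* cause only; the thread does not confirm it.

# Print how many CR bytes the file contains (0 means no CRLF line endings).
count_cr() {
    tr -cd '\r' < "$1" | wc -c
}

# Write a cleaned copy of $1 to $2 with every CR removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Usage:
#   count_cr urls.txt
#   strip_cr urls.txt urls_unix.txt
```

If `count_cr` prints 0, the line endings are not the problem and the cause lies elsewhere.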

Arekkusu1998 commented 1 year ago

Mrs. @rigel71, I don't know if You remember me. We first met on GitHub in this issue: we shared an interest in dezoomifying ASMI's cadastral maps. Kindly, I would like to ask if I can contact You about those same maps: perhaps You also managed to collect the maps I'm missing before the old High Quality Digital Library was closed. Is Your mailbox the same as it was at that time, please?

moroccan69 commented 7 months ago

Hey @lovasoa! I hope you're doing well. I've got a bit of a task on my hands and could really use your expertise. I'm trying to download the entire oeuvre of individual artists from Google Arts & Culture using dezoomify in batch mode. For example, I want to download all 347 images by VVG (https://artsandculture.google.com/entity/vincent-van-gogh/m07_m2?hl=en). I tried your tip about using xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt, but I couldn't make it work. Could you lend a hand or provide some guidance on how to go about this with some code? Your assistance would be greatly appreciated!

lovasoa commented 7 months ago

Hello! I'd be happy to set up a small consultancy contract for that! If you are interested, let me know at contact@ophir.dev