Open SB2020-eye opened 4 years ago
dezoomify-rs is a command-line application. You can use it within a for loop in a batch script on Windows, or in a bash script on Linux or macOS (or on Windows with WSL).
For instance, in bash, you could create a file called urls.txt containing all the URLs you want to dezoomify, and then use xargs together with dezoomify-rs:
xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
However, I'm not sure what you mean by "the batch capabilities suggested by the dezoomify node application".
The node application downloads a single image at a time (just like dezoomify-rs).
Thanks for the suggestions and links. I'll see if I can figure out a for loop for Windows.
I was trying to make this work: https://github.com/lovasoa/dezoomify/wiki/Node-app-for-shell-and-batch-usage.
(Sorry. I see I cited the wrong "node" page in my original comment. I apologize for the confusion.)
Yes, you can do the same thing with dezoomify-rs (just replacing node dezoomify-node.js with ./dezoomify-rs).
Here is a full bash script for downloading all 339 pages of the Book of Kells you mentioned:
for i in $(seq -f '%03g' 1 339); do
  for side in r v; do
    ./dezoomify-rs --max-width 10000 "https://digitalcollections.tcd.ie/content/14/pages/MS58_$i$side/image.dzi" "page$i$side.jpg"
  done
done
If you are on Windows, you can use WSL and the Linux version of dezoomify-rs to run it.
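Before starting a long download, it can help to see which URLs the loop above will request. The following is a minimal dry-run sketch: kells_urls is a hypothetical helper name, and it only prints the URLs built from the same pattern as the script, without calling dezoomify-rs.

```shell
# Hypothetical helper: print the .dzi URL for every folio side in a range,
# using the same URL pattern as the download loop above.
kells_urls() {
  local first="$1" last="$2" i side
  # seq -f '%03g' zero-pads the folio number to three digits (001, 002, ...)
  for i in $(seq -f '%03g' "$first" "$last"); do
    for side in r v; do   # r = recto, v = verso
      echo "https://digitalcollections.tcd.ie/content/14/pages/MS58_${i}${side}/image.dzi"
    done
  done
}
```

For example, kells_urls 1 339 > urls.txt would write the whole list to a file that can be inspected, or fed to dezoomify-rs one line at a time.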
Wow. I had to erase my next question I was writing. Thank you!
I am currently running it (it's at page 28). I'll post the result here when it's ready.
extracted from the Trinity College Dublin website
These links will expire 30 days after they are last downloaded.
The Book of Kells contains the four Gospels in Latin based on the Vulgate text which St Jerome completed in 384AD, intermixed with readings from the earlier Old Latin translation. The Gospel texts are prefaced by other texts, including "canon tables", or concordances of Gospel passages common to two or more of the evangelists; summaries of the gospel narratives (Breves causae); and prefaces characterizing the evangelists (Argumenta). The book is written on vellum (prepared calfskin) in a bold and expert version of the script known as "insular majuscule". It contains 340 folios, now measuring approximately 330 x 255 mm; they were severely trimmed, and their edges gilded, in the course of rebinding in the 19th century.
Wow again! Thank you so much. I also got it to work - so you've helped me increase my capabilities. :) Much appreciated.
Are PDFs like the ones you linked exactly the same in resolution as the JPEGs I downloaded with the script you wrote? I don't know how to tell the resolution within a PDF myself, so I'm checking rather than assuming.
Yes, the PDF embeds the JPEG images without alteration. You can check that simply by extracting an image from the PDF.
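One way to do that check, sketched under the assumption that the pdfimages tool from poppler-utils is installed (it is not mentioned in this thread). The function names and the PDFIMAGES override are hypothetical; the override only exists so the wrappers can be tried without the tool.

```shell
# Hypothetical wrappers around poppler's pdfimages.
# PDFIMAGES can be overridden (e.g. with echo) to preview the command line.

# List every embedded image with its width, height and encoding.
list_pdf_images() {
  "${PDFIMAGES:-pdfimages}" -list "$1"
}

# Extract embedded JPEG streams as .jpg files without re-encoding them.
# $1 = PDF file, $2 = prefix for the output file names.
extract_pdf_jpegs() {
  "${PDFIMAGES:-pdfimages}" -j "$1" "$2"
}
```

With a placeholder file name, list_pdf_images book.pdf prints one row per image; the width and height columns can then be compared with the dimensions of the JPEG downloaded by the script.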
Dear all, thank you for these comments and replies. I've attempted to run the bash command outlined by the developers in order to activate a batch dezoomify process, but I haven't been able to overcome some initial errors I'm getting. Here's what I have:
- A .txt file containing URLs to about 800 zoomify tile .xml files, separated by newline characters (\n)
- The dezoomify-rs application, unpacked in a directory on my desktop
- In Terminal, I change to that directory on my desktop, which contains dezoomify-rs and my .txt file of URLs
- Then I type xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
- I get the following error:
xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt
xargs: illegal option -- d
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements] [-S replsize]]
             [-J replstr] [-L number] [-n number [-x]] [-P maxprocs]
             [-s size] [utility [argument ...]]
I am running macOS Big Sur, which defaults to zsh. I tried typing "bash" before the xargs command, but that doesn't work either. What am I doing wrong?
Thank you in advance for your assistance.
- CS
I am having the same problem. Can you please help? https://www.memoiredeshommes.sga.defense.gouv.fr/fr//_depot_mdh/_depot_images/LEVANT/SHDGR__GR_4_H_5/SHDGR__GR_4_H_5__001/SHDGR__GR_4_H_5__001_0001/ImageProperties.xml I want to download pages 1 to 170.
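A list of URLs for such a range could be generated as sketched below. This rests on an unverified assumption: that only the trailing four-digit number (0001 in the URL above) changes from page to page. The function names page_url and page_urls are hypothetical.

```shell
# Hypothetical helpers. ASSUMPTION (not confirmed in this thread): pages differ
# only in the zero-padded four-digit suffix of the last directory name.

# Print the ImageProperties.xml URL for one page number (plain integer).
page_url() {
  printf 'https://www.memoiredeshommes.sga.defense.gouv.fr/fr//_depot_mdh/_depot_images/LEVANT/SHDGR__GR_4_H_5/SHDGR__GR_4_H_5__001/SHDGR__GR_4_H_5__001_%04d/ImageProperties.xml\n' "$1"
}

# Print one URL per page for an inclusive range of page numbers.
page_urls() {
  local n
  for n in $(seq "$1" "$2"); do
    page_url "$n"
  done
}
```

Running page_urls 1 170 > urls.txt would then produce a list file; it is worth opening a few of the generated URLs in a browser first to confirm the pattern holds.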
See my link above. You can either just remove -d '\n' from the command (if none of your URLs contains whitespace) or install GNU xargs with brew.
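A third possibility, offered here as a sketch rather than something proposed in this thread, is to avoid xargs altogether and read the list with a plain shell loop, which behaves the same on macOS and Linux. The function name batch_dezoomify and the DEZOOMIFY override variable are hypothetical.

```shell
# Hypothetical helper: run dezoomify-rs once per non-empty line of a list file.
# $1 = list file; any further arguments are passed to every dezoomify-rs call.
# DEZOOMIFY may be set to another command (e.g. echo) for a dry run.
batch_dezoomify() {
  local list="$1"
  shift
  local url
  # "|| [ -n "$url" ]" also handles a last line with no trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue   # skip blank lines
    # </dev/null keeps the command from consuming the rest of the list
    "${DEZOOMIFY:-./dezoomify-rs}" "$@" "$url" < /dev/null \
      || echo "failed: $url" >&2
  done < "$list"
}
```

For example, batch_dezoomify urls.txt --max-width 10000 processes each URL in turn and reports the ones that failed on standard error instead of stopping.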
Yes, indeed. Thank you lovasoa. I used the following command and it worked brilliantly.
xargs -n 1 ./dezoomify-rs -l < ./urls.txt
The -l option downloads the largest image available in the zoomify .xml file. I removed the -d '\n' option, and the batch command then ran.
Thank you.
P.S. I will point out, though, that there are many tile errors when downloading in batch. Many of the images fetched through this xargs command are missing 5 or more tiles that Terminal reports could not be downloaded. However, when I download the same images manually, one by one, all tiles download perfectly.
Maybe the server is having trouble serving all the requests you are making in short succession. You can use the --parallelism, --retries, --retry-delay and --timeout options to control that, and be more gentle with the server. Type dezoomify-rs --help to view the list of available options.
Also, in general, I would avoid using -l, and use something like --max-width 65535. 65,535 is the largest possible width of a JPEG image.
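Put together, a gentler invocation might look like the sketch below. The option names are the ones listed above; the values are illustrative guesses, not recommendations from this thread, and the duration format (5s, 60s) is an assumption to verify against dezoomify-rs --help. The function name gentle_dezoomify and the DEZOOMIFY override are hypothetical.

```shell
# Hypothetical wrapper: call dezoomify-rs with conservative network settings.
# All values are illustrative; check `dezoomify-rs --help` for the defaults
# and accepted formats before relying on them.
gentle_dezoomify() {
  # --parallelism: fewer tiles requested at the same time
  # --retries:     more attempts per failed tile
  # --retry-delay: wait before retrying (duration format is an assumption)
  # --timeout:     allow slow responses (duration format is an assumption)
  # --max-width:   used instead of -l, as suggested above
  "${DEZOOMIFY:-./dezoomify-rs}" \
    --parallelism 2 \
    --retries 5 \
    --retry-delay 5s \
    --timeout 60s \
    --max-width 65535 \
    "$@"
}
```

Because xargs cannot call a shell function directly, this is easiest to use from a loop, for example: while IFS= read -r url; do gentle_dezoomify "$url" < /dev/null; done < urls.txt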
Thank you. I will look into this right now.
When I do this, I get the following error
ERROR Invalid header value: failed to parse header value
multiple times.
However, my URLs are formatted correctly in .txt, each with a new line.
https://storage.googleapis.com/X/img/dzi/img_1.dzi
https://storage.googleapis.com/X/img/dzi/img_2.dzi
etc.
For some reason, it will correctly process the last URL in the .txt file.
Any assistance?
Many thanks!
@CavalloScuro Could I ask you for an example of your urls.txt file? I don't know what I'm doing wrong, but I am clearly messing it up at the stage of separating each URL. The error I receive is "ERROR Invalid header value: failed to parse header value"
An example of the URLs I'm using. I have tried separating them with \n but I either don't understand how to use it or I don't know, something else. I appreciate the help.
https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0002.jpg/info.json
https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0003.jpg/info.json
https://imagenes.patrimonionacional.es/iiif/2/X-I-4%2F0001.jpg/info.json
I'm getting the same error that piccachilly mentioned back in 03/2022. It seems like this is an HTTP issue. I'm wondering what I'm doing wrong to trigger this error for all URLs except the last one.
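The thread does not identify the cause of this error. One possibility, stated here purely as an assumption, is that urls.txt was saved with Windows line endings (CRLF): every line except a final one without a line break would then end in an invisible carriage return, which would fit the "only the last URL works" symptom. The sketch below normalizes a list so that each URL sits on its own line with no stray whitespace; clean_url_list is a hypothetical name, and it assumes no URL contains whitespace.

```shell
# Hypothetical helper: print a cleaned copy of a URL list.
# ASSUMPTION: the problem is stray whitespace (carriage returns, spaces, tabs)
# and no URL itself contains whitespace.
clean_url_list() {
  # Turn every run of whitespace, including \r, into a single newline,
  # then drop any empty line left at the start.
  tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}
```

For example, clean_url_list urls.txt > urls.clean.txt writes a normalized copy that can be used in place of the original. Running od -c urls.txt | head beforehand shows whether \r characters are actually present.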
Mrs. @rigel71, Idk if You remember me. We met on GitHub firstly in this issue: we shared an interest in dezoomify ASMI's cadastral maps. Kindly, I would like to ask if I can contact You about those same maps: perhaps You also managed to collect the maps I'm missing, before the old High Quality Digital Library was closed. Is Your mailbox the same as it was at that time, please?
Hey @lovasoa! I hope you're doing well. I've got a bit of a task on my hands and could really use your expertise. I'm trying to download the entire oeuvre by artists from Art & Culture using dezoomify in batch mode. For example, I want to download all 347 images by VVG. (https://artsandculture.google.com/entity/vincent-van-gogh/m07_m2?hl=en) I tried using your tips about using xargs -d '\n' -n 1 ./dezoomify-rs < ./urls.txt, but I couldn't manage to make it work. Could you lend a hand or provide some guidance on how to go about this with code? Your assistance would be greatly appreciated!
Hello! I'd be happy to set up a small consultancy contract for that! If you are interested, let me know at contact@ophir.dev
Hi. Great work on these dezoomers! I was hoping to use the batch capabilities suggested by the dezoomify node application (https://github.com/lovasoa/dezoomify/tree/master/node-app). I'm new at this, and I just couldn't get it to work after many hours. Is there a way to run dezoomify-rs in some kind of batch way?
(And I'm trying to pull images from https://digitalcollections.tcd.ie/home/index.php?DRIS_ID=MS58_003v.)