lovasoa / dezoomify

Dezoomify is a web application to download zoomable images from museum websites, image galleries, and map viewers. Many different zoomable image technologies are supported.
https://dezoomify.ophir.dev
GNU General Public License v2.0

Downloading full-resolution images from historicpittsburgh.org #354

Closed Laowai75 closed 4 years ago

Laowai75 commented 4 years ago

Hello,

I hope you can help me find a solution for the "Historic Pittsburgh" website. I have followed all the steps I learned from the FAQ page and from you, but to no avail. I would greatly appreciate a helping hand.

The site has a wealth of photos and documents on Pennsylvania history and could be relevant to a lot of researchers:

Example URLs:

Current error message

It mentions that no zoomable image can be found.

Thank you in advance!

lovasoa commented 4 years ago

Hello, it looks like they use a custom system. The good news is that if you look carefully at the URL of an individual tile, you can see that the portion of the image covered by the tile is encoded in the URL itself.

For instance, let's have a look at https://historicpittsburgh.org/adore-djatoka/resolver?url_ver=Z39.88-2004&rft_id=http%3A%2F%2Fhistoricpittsburgh.org%2Fislandora%2Fobject%2Fpitt%253A886.15689.AP%2Fdatastream%2FJP2%2Fview%3Ftoken%3D&svc_id=info%3Alanl-repo%2Fsvc%2FgetRegion&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajpeg2000&svc.region=1024%2C1280%2C256%2C256&svc.level=4

The interesting part is svc.region=1024%2C1280%2C256%2C256. %2C is the URL encoding of a comma, so the "region" is 1024,1280,256,256 (1024,1280 is the position and 256,256 is the size). You can replace it with a region that covers the whole image, such as 0,0,9999,9999, and you're done: https://historicpittsburgh.org/adore-djatoka/resolver?url_ver=Z39.88-2004&rft_id=http%3A%2F%2Fhistoricpittsburgh.org%2Fislandora%2Fobject%2Fpitt%253A886.15689.AP%2Fdatastream%2FJP2%2Fview%3Ftoken%3D&svc_id=info%3Alanl-repo%2Fsvc%2FgetRegion&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajpeg2000&svc.region=0%2C0%2C9999%2C9999&svc.level=4
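If you have many tile URLs to convert, the same substitution can be scripted. Here is a minimal Python sketch, assuming only the standard library; the helper name whole_image_url and the default region are my own choices for illustration, not something the site or dezoomify provides. It only rewrites the URL and does not download anything.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Tile URL taken from the example above.
TILE_URL = (
    "https://historicpittsburgh.org/adore-djatoka/resolver"
    "?url_ver=Z39.88-2004"
    "&rft_id=http%3A%2F%2Fhistoricpittsburgh.org%2Fislandora%2Fobject"
    "%2Fpitt%253A886.15689.AP%2Fdatastream%2FJP2%2Fview%3Ftoken%3D"
    "&svc_id=info%3Alanl-repo%2Fsvc%2FgetRegion"
    "&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajpeg2000"
    "&svc.region=1024%2C1280%2C256%2C256"
    "&svc.level=4"
)


def whole_image_url(tile_url, region="0,0,9999,9999"):
    """Hypothetical helper: return tile_url with its svc.region replaced.

    The default region is the oversized one suggested above; it is an
    assumption that it covers the whole image at the requested level.
    """
    parts = urlsplit(tile_url)
    # parse_qsl percent-decodes each value, so "%2C" becomes ",".
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "svc.region" for key, _ in params):
        raise ValueError("URL has no svc.region parameter")
    # Swap in the new region and leave every other parameter untouched.
    params = [(k, region if k == "svc.region" else v) for k, v in params]
    # urlencode percent-encodes the values again, so "," becomes "%2C".
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(whole_image_url(TILE_URL))
```

Pass a larger region, for example whole_image_url(url, "0,0,99999,99999"), if the image turns out to be cropped.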

If you want a nice-looking URL, you can even directly access http://historicpittsburgh.org/islandora/object/pitt:886.15689.AP/datastream/JP2/view?token=&svc_id=info:lanl-repo/svc/getRegion&svc_val_fmt=info:ofi/fmt:kev:mtx:jpeg2000&svc.region=0,0,99999,99999&svc.level=4

Laowai75 commented 4 years ago

It works! Thank you so much, as always!