lovasoa / dezoomify-rs

Zoomable image downloader for Google Arts & Culture, Zoomify, IIIF, and others
https://dezoomify-rs.ophir.dev
GNU General Public License v3.0

Scrape tiles, without reassembly #26

Closed KempWatson closed 3 years ago

KempWatson commented 4 years ago

Hi @lovasoa,
It would be great to be able to simply extract the tile images and meta-information files, without assembling them into a JPEG, TIFF, etc. Thoughts? I'd give it a stab, but I don't know Rust.

K. Watson

lovasoa commented 4 years ago

Since you are talking about Rust, I guess this issue is actually about dezoomify-rs, not dezoomify. Indeed, it would be nice if tiles could be saved to disk. If implemented correctly, it would also allow interrupting and resuming a large download, reusing the tiles that were already downloaded. I'll leave this issue open in case someone wants to implement it.

And if you want to learn Rust, I can recommend the excellent Rust book. Rust is a really nice language.
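
To illustrate the resume idea, here is a minimal Python sketch of an on-disk tile cache. It is not how dezoomify-rs does or would do it; the function name `cached_tile`, the `fetch` callback, and the hash-based file naming are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def cached_tile(url, cache_dir, fetch):
    """Return the bytes of a tile, downloading it only if it is not on disk yet.

    `fetch` is any callable that takes a URL and returns bytes (hypothetical;
    plug in your own HTTP client).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so any URL maps to a safe filename.
    path = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    if path.exists():
        # Tile was saved by a previous (possibly interrupted) run: reuse it.
        return path.read_bytes()
    data = fetch(url)
    # Write to a temporary file first, then rename, so an interrupted run
    # never leaves a truncated tile under the final name.
    tmp = path.with_suffix(".part")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

Restarting an interrupted download with the same `cache_dir` would then only fetch the tiles that are still missing.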

lovasoa commented 4 years ago

In the meantime, the format of Deep Zoom images is very simple, so you can create a script that does what you want in ten lines of Python or Bash.
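
As a rough sketch of what such a script could look like in Python: the function below only lists the tile URLs of the deepest zoom level, based on the usual Deep Zoom layout (a `.dzi` XML descriptor next to an `_files` folder with one subfolder per level). The function name and the example URL are made up for illustration, and the actual downloading is left out.

```python
import math
import xml.etree.ElementTree as ET


def dzi_tile_urls(dzi_url, dzi_xml):
    """Return the URLs of all tiles at the highest zoom level of a Deep Zoom image."""
    root = ET.fromstring(dzi_xml)
    tile_size = int(root.get("TileSize"))
    fmt = root.get("Format")
    # Find the <Size> child regardless of which XML namespace the file uses.
    size = next(el for el in root if el.tag.endswith("Size"))
    width, height = int(size.get("Width")), int(size.get("Height"))
    # The deepest level is ceil(log2(max(width, height))), computed with integers.
    max_level = (max(width, height) - 1).bit_length()
    # Assumes the descriptor URL ends in an extension: "image.dzi" -> "image_files"
    base = dzi_url.rsplit(".", 1)[0] + "_files"
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return [
        f"{base}/{max_level}/{col}_{row}.{fmt}"
        for row in range(rows)
        for col in range(cols)
    ]
```

Each returned URL can then be saved as-is with any HTTP client, which gives you the raw tiles without reassembly.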

KempWatson commented 4 years ago

Thanks! Actually, I was thinking about Google Arts & Culture tiles; I think there's more to it than a small script. Thanks for leaving the issue open!

K.


lovasoa commented 4 years ago

Ok, I'm updating the title and your original post, then

lovasoa commented 4 years ago

For Google Arts & Culture, you can get the tiles using gapdecoder (https://github.com/gap-decoder/gapdecoder).

KempWatson commented 4 years ago

Sadly, gapdecoder also only saves JPEGs, not tiles. JPEG, like PNG and TIFF, breaks down at 65,536 pixels and 2 or 4 GB, which makes all of them unsuitable for zoomable images of any respectable size, so scraping into these formats doesn't achieve much.

K.


MrChrisWin commented 4 years ago


Could it be that the size of the output is limited by the amount of RAM in the computer? A size above 65,536 pixels requires a larger memory allocation. Who here has a PC with 64 GB or 128 GB of RAM? I only have 32 GB.

KempWatson commented 4 years ago

RAM allocation should never need to be bigger than tile height × image width... you could dezoom petabyte-size images on an iPhone...

K.
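
A back-of-the-envelope sketch of that argument, in Python. The image dimensions, the 256-pixel tile height, and the 3 bytes per pixel (uncompressed RGB) are hypothetical numbers chosen for illustration, and the strip figure only applies if the output format can be written one row of tiles at a time.

```python
def full_image_bytes(width, height, bytes_per_pixel=3):
    """Memory needed to hold the whole decoded image at once."""
    return width * height * bytes_per_pixel


def strip_bytes(width, tile_height, bytes_per_pixel=3):
    """Memory needed to hold a single full-width row of tiles."""
    return width * tile_height * bytes_per_pixel


# Hypothetical 100,000 x 100,000 pixel RGB image made of 256-pixel-high tiles.
w = h = 100_000
print(full_image_bytes(w, h) / 1e9)  # 30.0 (GB for the whole image)
print(strip_bytes(w, 256) / 1e6)     # 76.8 (MB for one strip)
```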


MrChrisWin commented 4 years ago

Maybe I'm mistaken, but when dezoomify-rs is run, it downloads all the tiles into memory and saves the file once the download is complete. If the aggregate tile size adds up to more than your RAM, wouldn't it crap out?

KempWatson commented 4 years ago

Yes, I just confirmed with the author that this is how it currently works. I'm assuming that virtual memory would help, as I'm scraping 14 GB images on a machine with 2 GB of RAM without issue.


KempWatson commented 4 years ago

Oops, I lied... the scraper completes without memory issues, but the output file is bad, per my earlier comments about file format limits.


lovasoa commented 4 years ago

I just released version 2.3.0-beta, which contains a new experimental output format: IIIF. It allows downloading tiles without stitching them together.

MrChrisWin commented 4 years ago


The download works smoothly. However, the viewer is very slow, even when loading from a RAM disk. There is no image preview at startup, so all the tiles have to be loaded before you can see the entire image.

lovasoa commented 4 years ago

Yes. It currently only downloads the tiles from the level you chose; it doesn't create new ones for the other levels.

lovasoa commented 4 years ago

I implemented a complete IIIF retiler in dezoomify-rs 2.3.0-beta3. The retiler recreates all zoom levels for the image you are downloading, finally making it tractable and even easy to download and view huge images that do not fit in memory. 🎉
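
To give an idea of what "all zoom levels" means, here is a small Python sketch that lists the levels of a tile pyramid, halving the resolution until the whole image fits in a single tile. This is only an illustration of the general idea, not the dezoomify-rs retiler; the function name and the 512-pixel tile size are assumptions.

```python
import math


def pyramid_levels(width, height, tile_size=512):
    """List (scale_factor, level_width, level_height, tile_count) for each zoom level."""
    levels = []
    scale = 1
    while True:
        # Size of the image once downscaled by the current factor.
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((scale, w, h, tiles))
        if tiles == 1:
            # The whole image fits in one tile: this is the preview level.
            break
        scale *= 2
    return levels
```

A viewer can show the single-tile level right away as a preview and only fetch the high-resolution tiles for the region being looked at.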

lovasoa commented 3 years ago

I added a tile cache in the latest development version. Can you test it and give feedback before I release it?

KempWatson commented 3 years ago

Hi Ophir... is this for Google Arts, or for anything?

Best, Kemp


lovasoa commented 3 years ago

Hi Kemp! The tile cache is built into the core of dezoomify-rs, so it works with all supported dezoomers 😎

lovasoa commented 3 years ago

You can download the latest build here: https://github.com/lovasoa/dezoomify-rs/releases/tag/build-efa91756b53c40dcd13ef6eb2ad0dc37f9e79c9c

KempWatson commented 3 years ago

;-)
