Open kromabiles opened 4 years ago
@kromabiles great. Following up here. A few questions about this:
Processing of this would need to happen during ingest (batch), or it would be just too slow... we need to test.
What is your largest PDF around there?
Secondly, I will also enable a digital object with the same directly on play.archipelago.nyc so we can test performance and compare.
Thanks!
@DiegoPino Yes, extracting PDFs into TIFFs would be great. Our book collections don't have any page-level metadata; they are all structured as a single object description. :/
Our largest PDF is about 3GB and consists of 93 pages (yearbook).
Seeing Archipelago in action sounds exciting! :)
Excellent. I will start planning. I will probably borrow the book module settings, but I feel I should go TIFF first and then compress to JP2 if needed. I just tested a JP2 generated by Islandora (core) and it was 25 MB in size; the same TIFF was 10 MB, which was a little bit annoying!
@kromabiles sorry for the slowness, I have a solution! But it requires some testing and planning. Give me until the end of the week to enable it in our sandbox and I will give you credentials there. I will also copy your templates and prepare a spreadsheet test case, but even better if you have a few PDFs in a zip and a demo spreadsheet around.
No worries! Thanks, Diego - files are too big to attach here, so I'll send them over to you via email.
Hello Diego,
As you know, our IR is currently exploring ways to use the IMI to ingest/create BookCModel objects from PDFs. Since the IMI is the main tool we rely on for ingesting content into Islandora, could we explore/test some possible options/solutions for a way to implement a simpler PDF-to-TIFF image capability? Some of our current PDF objects that we'd like to ingest as books have 50+ pages, which would then need to be divided and converted from PDFs to TIFFs. My brain hurts.
More than happy to bounce off ideas and do some testing with you. :)
Best, Katie
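
For anyone testing along, the "divide a PDF into one TIFF per page" step discussed in this thread could be sketched roughly as below. This is only an illustration, not how the IMI or Archipelago actually does it: it assumes Ghostscript (`gs`) is installed, and the function names, the 300 dpi default, and the `page-%04d.tif` naming are all made up for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that writes one TIFF per PDF page.

    Hypothetical helper: the output naming scheme and dpi default are
    assumptions, not settings taken from the IMI or the book module.
    """
    # Ghostscript replaces %04d with the zero-padded page number.
    out_pattern = str(Path(out_dir) / "page-%04d.tif")
    return [
        "gs",
        "-q",                   # suppress startup messages
        "-sDEVICE=tiff24nc",    # 24-bit RGB TIFF output device
        f"-r{dpi}",             # rendering resolution in dpi
        "-o", out_pattern,      # output file pattern (implies batch mode)
        str(pdf_path),
    ]


def pdf_to_tiffs(pdf_path, out_dir, dpi=300):
    """Split a PDF into per-page TIFFs and return the sorted page paths."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gs_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.tif"))
```

Calling `pdf_to_tiffs("yearbook.pdf", "yearbook_pages")` would then leave the page images in `yearbook_pages/`, ready to be listed as pages in an ingest spreadsheet. For very large PDFs the resolution probably needs tuning, and that would need testing.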