PDF viewer - Githubissues

hortongn commented 4 years ago

Notes

Accessibility of access files is a major concern. We should not implement a solution that does not provide access to the document text and structure.
Accessible PDFs (PDF/UA) will be the dominant format for future, text-based digitization

Resources

https://wiki.lyrasis.org/display/samvera/Page+Turners+-+Current,+Future+and+Requirements

crowesn commented 3 years ago

https://github.com/internetarchive/bookreader

jamesvanmil commented 3 years ago

I've looked at the IA bookreader a bit and I don't think it's a good fit as a general-purpose reader.

It took a bit of digging to understand the payload package (warning, certificate error here: https://raj.blog.archive.org/2011/03/17/how-to-serve-ia-style-books-from-your-own-cluster/), and it looks like it uses a JPEG 2000 image server and XML-based OCR data. This might be suitable for a future experimental project, but our workflow right now doesn't support this output, and we'd have to remediate prior content to work with this reader.

My gut tells me that a simple PDF viewer would be the right call here, especially if it supported progressive loading (e.g. so that only a portion of a very large file is fetched and rendered at a time).
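A minimal sketch of the progressive-loading idea, assuming the file is served over HTTP by a server that honors Range requests. The `byte_ranges` helper and the chunk sizes are hypothetical and only illustrate how a large file can be split into separately fetchable pieces; they are not taken from any viewer discussed in this thread.

```python
from typing import Iterator


def byte_ranges(file_size: int, chunk_size: int) -> Iterator[str]:
    """Yield HTTP Range header values that cover a file in fixed-size chunks."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < file_size:
        # Range end offsets are inclusive, hence the "- 1".
        end = min(start + chunk_size, file_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # Hypothetical 2.5 MB file requested in 1 MB chunks.
    for header in byte_ranges(2_500_000, 1_000_000):
        print(header)
```

A viewer working this way would request only the ranges it needs for the pages currently on screen, rather than downloading the whole file up front.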

FYI our digitization workflow officially has a collection strategy now, including a specific path on output files:

To meet users at their point of need, we have adopted the PDF/UA format as our preferred output for public access. PDF/UA format is user-focused, flexible, portable, and aligns with our focus on document-based digitization. It uses widely available tools, does not depend on a platform-based infrastructure for dissemination or reuse, and meets the accessibility goals of the University of Cincinnati.

https://uclibs.github.io/digitization-workflow/collection-strategy/

jamesvanmil commented 3 years ago

(Also, this is not to say that the IA bookreader might not have its uses. If we wanted to run a special project/exhibit focused on a specific collection, with provisions for this kind of infrastructure and outputs, I think it'd be a great solution.)

uclibs / uc_drc

PDF viewer #61
