sul-dlss-deprecated / universalviewer

The Universal Viewer is a community-developed open source project on a mission to help you share your content with the world
http://universalviewer.io

Add more controls to PDF viewer #16

Open ggeisler opened 6 years ago

ggeisler commented 6 years ago

This is probably not an easy enhancement, but it would be nice if, when viewing a PDF, there was a Contents section available as there is with a multi-image object.

As with a multi-image object, the Contents section would ideally have both Index and Thumbnail tabs. In this way it would be similar to the options available in Preview on OS X. It would enable the user to obtain an understanding of the document structure and scroll through the document more quickly.

tomcrane commented 6 years ago

Stu notes:

The Box embed widget also has PDF rendering so may offer an interesting alternative UI.

tomcrane commented 6 years ago

PDF implementation notes:

UV uses https://pdfobject.com/, which is really clean, lightweight and simple. It replaced pdf.js, which was problematic in the UV: it significantly bloated the codebase, required inclusion of all its i18n files, etc. However, for the last year the much lighter PDFObject has suffered from this problem:

Note regarding Mozilla Firefox: (December 2016) There is a known issue with Mozilla Firefox, which prevents PDFObject from properly detecting PDF support. Mozilla has removed the application/PDF MIME type from Firefox, despite built-in support for rendering PDFs. They are currently debating whether to restore this feature. PDFObject will not be updated until Mozilla has reached a final solution. Sorry for the inconvenience. If you absolutely must embed PDFs in Firefox, for now there are two options: Use static markup (the handy generator can help you) or use your own copy of PDF.js and PDFObject's forcePDFJS option (this is a sledgehammer technique). Hopefully Mozilla will reverse their decision.
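A minimal sketch of the "sledgehammer" workaround quoted above, assuming we only force PDF.js for Firefox. `forcePDFJS` and `PDFJS_URL` are PDFObject embed options; the `embedOptionsFor` helper, the user-agent check and the viewer path are hypothetical and only here for illustration.

```typescript
// Subset of PDFObject's embed options that this sketch touches.
interface PdfEmbedOptions {
  forcePDFJS?: boolean;
  PDFJS_URL?: string;
}

// Hypothetical helper: force PDF.js only where PDFObject's own detection
// is known to fail (Firefox, per the note above). User-agent sniffing is
// an assumption made for brevity, not something PDFObject prescribes.
function embedOptionsFor(userAgent: string, pdfjsViewerUrl: string): PdfEmbedOptions {
  const isFirefox = /firefox\//i.test(userAgent);
  return isFirefox ? { forcePDFJS: true, PDFJS_URL: pdfjsViewerUrl } : {};
}

// In the browser this would be used roughly as follows
// (the container selector and both paths are placeholders):
//
//   PDFObject.embed(
//     "/docs/sample.pdf",
//     "#pdf-container",
//     embedOptionsFor(navigator.userAgent, "/pdfjs/web/viewer.html")
//   );
```

Other browsers get an empty options object, so they keep using the native renderer.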

To add thumbnails and TOC, we'd need to do one of two things. They are quite radically different:

1) Rely on the manifest. Rather than a single canvas that the PDF is attached to, the manifest actually models each page as a canvas, so that each canvas can be given a thumbnail, we can use ranges for navigation, and so on. This means Stanford would need to process the PDF to generate the structure and thumbs server side. The PDF rendering would be attached to the sequence or manifest. Some clue in the manifest would tell the UV to render the PDF directly rather than try to present it as a sequence of canvases. This approach is probably a non-starter for you.

2) Use pdf.js again, but this time for interrogating the PDF: determining its structure and making thumbnails with HTML5 canvases (https://bl.ocks.org/palerdot/bf0c52d84aa046a6963c), while potentially still using the native browser renderer via PDFObject (or falling back to pdf.js, the "sledgehammer" approach described above for Firefox).
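To make option 1 a bit more concrete, here is a rough sketch of the kind of manifest the server side might generate, assuming IIIF Presentation 2.x. The URL scheme, the `PageInfo` and `Section` shapes and `buildPdfManifest` itself are hypothetical, and canvases are left without image content for brevity. The `rendering` link on the sequence points at the PDF; how the UV would be told to render that PDF directly is deliberately left open here.

```typescript
// Hypothetical per-page data produced by server-side PDF processing.
interface PageInfo {
  label: string;
  width: number;
  height: number;
  thumbnailUrl: string;
}

// Hypothetical section of the document, as zero-based inclusive page indices.
interface Section {
  label: string;
  firstPage: number;
  lastPage: number;
}

function buildPdfManifest(baseUrl: string, pdfUrl: string, pages: PageInfo[], sections: Section[]) {
  const canvasId = (index: number): string => `${baseUrl}/canvas/${index + 1}`;

  // One canvas per PDF page, each with its own thumbnail.
  const canvases = pages.map((page, index) => ({
    "@id": canvasId(index),
    "@type": "sc:Canvas",
    label: page.label,
    width: page.width,
    height: page.height,
    thumbnail: { "@id": page.thumbnailUrl },
  }));

  // One range per section, listing the canvases it covers (for the Index tab).
  const structures = sections.map((section, index) => {
    const ids: string[] = [];
    for (let i = section.firstPage; i <= section.lastPage; i++) {
      ids.push(canvasId(i));
    }
    return {
      "@id": `${baseUrl}/range/${index + 1}`,
      "@type": "sc:Range",
      label: section.label,
      canvases: ids,
    };
  });

  return {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": `${baseUrl}/manifest`,
    "@type": "sc:Manifest",
    label: "PDF document",
    sequences: [
      {
        "@type": "sc:Sequence",
        canvases,
        // The PDF itself, attached to the sequence as a rendering.
        rendering: { "@id": pdfUrl, format: "application/pdf", label: "PDF" },
      },
    ],
    structures,
  };
}
```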

I'm not sure about the index tab. Will have to see what we could get out of a PDF in the browser.
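One thing that might work for the index tab: pdf.js exposes a document outline (bookmarks) through `getOutline()`, as a tree of nodes with a `title` and nested `items`. A minimal sketch of flattening that tree into rows for an index list follows. `OutlineNode` only mirrors the two fields used here, `IndexEntry` and `flattenOutline` are hypothetical names, and the `null` handling assumes `getOutline()` resolves with `null` for PDFs without an outline, which is worth verifying.

```typescript
// Minimal shape of a pdf.js outline node, limited to the fields used here.
interface OutlineNode {
  title: string;
  items?: OutlineNode[];
}

// Hypothetical row for an index tab: a title plus its nesting depth.
interface IndexEntry {
  title: string;
  depth: number;
}

// Depth-first walk of the outline tree, producing a flat, ordered list.
function flattenOutline(nodes: OutlineNode[] | null, depth: number = 0): IndexEntry[] {
  const entries: IndexEntry[] = [];
  if (!nodes) {
    return entries; // no outline available
  }
  for (const node of nodes) {
    entries.push({ title: node.title, depth });
    for (const child of flattenOutline(node.items || [], depth + 1)) {
      entries.push(child);
    }
  }
  return entries;
}
```

Many PDFs have no outline at all, so the index tab would still need to cope with an empty list.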

If we used pdf.js again we'd need to do some work to determine how to avoid the bloating and other problems that led to its removal in the first place.
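For the thumbnail half of option 2, a minimal sketch of rendering one page into a small canvas. The `PageLike` and `CanvasLike` interfaces are hypothetical stand-ins that mirror only the parts of a pdf.js page (`getViewport`, `render`) and an HTML canvas that are used; this assumes the pdf.js versions where `getViewport` takes an options object (older releases took the scale as a plain number).

```typescript
interface ViewportLike {
  width: number;
  height: number;
}

// Stand-in for a pdf.js page object (only the methods used below).
interface PageLike {
  getViewport(params: { scale: number }): ViewportLike;
  render(params: { canvasContext: unknown; viewport: ViewportLike }): { promise: Promise<void> };
}

// Stand-in for an HTMLCanvasElement.
interface CanvasLike {
  width: number;
  height: number;
  getContext(kind: "2d"): unknown;
}

// Scale the page so that it is `targetWidth` pixels wide, size the canvas
// to match, then ask the page to draw itself into the canvas.
async function renderThumbnail(page: PageLike, canvas: CanvasLike, targetWidth: number): Promise<void> {
  const natural = page.getViewport({ scale: 1 });
  const viewport = page.getViewport({ scale: targetWidth / natural.width });
  canvas.width = Math.round(viewport.width);
  canvas.height = Math.round(viewport.height);
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
}
```

In the browser, `page` would come from something like `pdfDocument.getPage(n)` and `canvas` from `document.createElement("canvas")`. Rendering thumbnails lazily, as they scroll into view, would probably matter for long documents.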

edsilv commented 6 years ago

Demian Katz (Villanova) reminded me that we weren't actually using plain pdf.js: it required some hacks to make it work, which incurred maintenance overhead when updating to newer versions.