scientist-softserv / britishlibrary

Interim fix to stop IIIF print running for PDFs with many pages #541

Open grahamjevon opened 1 month ago

grahamjevon commented 1 month ago

Problem

If a work has many child works, the parent work page may not render. Any attempt to open the work will time out. Here is an example.

This affects any work whose PDF has several hundred pages (e.g. a book), because the IIIF print process automatically creates a child work for each page. Simply uploading a PDF file with hundreds of pages therefore renders the work inaccessible, and the user cannot prevent this other than by not uploading the PDF file or by manually deleting hundreds of child works, which is prohibitively time-consuming.

Solution

A comprehensive fix, one that resolves the page rendering issue while leaving all the child works in situ, requires investigation.

In the meantime, a quick fix is to create a script that decides whether or not the IIIF print process should run based on the number of pages in the PDF. For example:

The implication of not running the IIIF print process is that the PDF will not be viewable in the Universal Viewer (UV); it will only be available for download. This is preferable to the page not rendering at all.

cziaarm commented 1 month ago

The limit is currently set to 100 pages.

https://bl.bl-staging.notch8.cloud/concern/articles/767ab3ef-c4f4-491f-9d57-4444f7a88778?locale=en

This PDF has 552 pages... IiifPrint stays well clear.

grahamjevon commented 1 month ago

This works as expected when using both the UI and BX. PDF files with fewer than 100 pages successfully go through the IIIF print process. PDF files with 100 or more pages successfully skip it.
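
For reference, the page-count rule described in this thread can be sketched as below. This is a minimal illustration only, not the code deployed in the application: the function name `should_run_iiif_print` and the constant `PAGE_LIMIT` are hypothetical, and the page count is assumed to have been obtained elsewhere. The boundary follows the behaviour reported above (fewer than 100 pages runs, 100 or more skips).

```python
# Hypothetical sketch of the interim rule; names are illustrative only.
PAGE_LIMIT = 100  # limit reported in this thread; assumed to be configurable


def should_run_iiif_print(page_count: int, page_limit: int = PAGE_LIMIT) -> bool:
    """Return True when the PDF is small enough to be split into child works."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Strictly below the limit runs; at or above the limit skips.
    return page_count < page_limit


if __name__ == "__main__":
    for pages in (99, 100, 552):
        action = "run" if should_run_iiif_print(pages) else "skip"
        print(f"{pages} pages: {action} IIIF print")
```

Keeping the limit as a parameter rather than a hard-coded literal would make it straightforward to adjust once a comprehensive fix for rendering works with many child works is in place.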