harvard-lil / h2o

H2O is a web app for creating and reading open educational resources, primarily in the legal field
https://opencasebook.org
GNU Affero General Public License v3.0

Add PDF generation for #1555 #1809

Closed: lizadaly closed 2 years ago

lizadaly commented 2 years ago

Trigger PDF generation through a Playwright client pointed at an arbitrary URL.

The URL is expected to be the printable HTML endpoint with the whole_book option (new in this PR), which renders all sections at once. For large books this endpoint is not ideal for humans to browse, but it is necessary so that PagedJS can paginate the entire book in one pass.

This can be invoked as:

python main/pdf.py http://opencasebook.test:8000/casebooks/6105-annotations-tests/as-printable-html/all/ \
   annotations.pdf
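
For readers unfamiliar with the approach, here is a minimal sketch of what a Playwright-driven PDF script of this shape might look like. It is not the contents of main/pdf.py: the function names, the timeout value, and the `.pagedjs_pages` selector used as a "pagination has started" signal are all assumptions for illustration.

```python
import argparse


def parse_args(argv):
    """Parse the two positional arguments used in the invocation above."""
    parser = argparse.ArgumentParser(description="Render a URL to PDF via Playwright")
    parser.add_argument("url", help="printable HTML endpoint to load")
    parser.add_argument("output", help="path of the PDF file to write")
    return parser.parse_args(argv)


def generate_pdf(url, output_path, timeout_ms=600_000):
    """Load `url` in headless Chromium and save the rendered page as a PDF.

    Hypothetical helper: the timeout and the selector below are assumptions,
    not values taken from this PR.
    """
    # Imported lazily so the argument parsing can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # PDF output requires headless Chromium
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            # Assumption: PagedJS wraps its output in a .pagedjs_pages container.
            # Its presence is only a rough signal; a real script may need a
            # stronger "pagination finished" check.
            page.wait_for_selector(".pagedjs_pages", timeout=timeout_ms)
            page.pdf(path=output_path, prefer_css_page_size=True, print_background=True)
        finally:
            browser.close()


def main(argv):
    args = parse_args(argv)
    generate_pdf(args.url, args.output)
```

Calling `main(sys.argv[1:])` from a script entry point would mirror the command line shown above. The generous timeout reflects the fact that paginating a whole book can take a long time.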

A few related changes in this PR:

Includes a simple Playwright test that verifies that a PDF file (or something resembling it) was created.
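
A check that a file "resembles" a PDF can be as light as looking for the PDF header bytes. The sketch below shows one way to write such a check; `looks_like_pdf` and the `min_size` threshold are hypothetical and are not taken from the test in this PR.

```python
from pathlib import Path

# Every PDF file begins with this header, followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path, min_size=100):
    """Return True if `path` exists, is not trivially small, and starts with the PDF header.

    This does not validate the document structure; it only guards against
    empty files or HTML error pages saved under a .pdf name.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size < min_size:
        return False
    with p.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
```

In a test, this would run after the generation step, for example `assert looks_like_pdf("annotations.pdf")`.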

TODO

Very long (1,000+ page) PDFs will often fail to render in limited-resource environments. I think the problem is with PagedJS itself failing to complete pagination rather than anything to do with PDFs or Playwright, but this needs more testing. (A book that renders fine when running directly from my laptop's virtualenv may fail to complete on the same computer inside the Docker container.)

The PDF generation routine currently cannot share the user's authentication credentials, so only published books can be generated this way.

Footer sample

image

PDF sample

annotations.pdf