uwdata / living-papers

Authoring tools for scholarly communication. Create interactive web pages or formal research papers from markdown source.
BSD 3-Clause "New" or "Revised" License

Modifications to puppeteer builder #11

Closed jheer closed 2 years ago

jheer commented 2 years ago

~Outstanding issues (beyond this PR):~

jheer commented 2 years ago

@mathisonian This PR provides updates to the puppeteer branch, bringing it to an "almost working" state. See the remaining issues listed in the description above for outstanding wrinkles (in addition to the 2-second timeout hack).

jheer commented 2 years ago

There are a number of PDF output options that we are not yet utilizing, documented here: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagepdfoptions

We might be able to use these to limit to one page and control sizing / fix clipping. For example, we could inspect the size of an element and then use that to set the output PDF page size.
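To make the idea concrete, here is a minimal sketch of sizing the PDF page from a measured element. It is not the code in this PR: `pdfOptionsForBox` and `snapshotElement` are hypothetical names, and the sketch assumes the element sits at the top-left corner of a page with no margins, since `page.pdf()` prints the whole page rather than clipping to an element.

```javascript
// Build page.pdf() options that size a single-page PDF to a measured box.
// `box` is expected to look like the result of elementHandle.boundingBox().
function pdfOptionsForBox(box, path) {
  return {
    path,
    // Round up so fractional pixels are not clipped at the page edge.
    width: `${Math.ceil(box.width)}px`,
    height: `${Math.ceil(box.height)}px`,
    pageRanges: '1',       // emit only the first page
    printBackground: true  // keep background colors and images
  };
}

// Hypothetical helper: measure one element, then print the page at that size.
// Assumes the element is the only content and starts at the page origin.
async function snapshotElement(page, selector, path) {
  const handle = await page.$(selector);
  if (!handle) throw new Error(`No element matches ${selector}`);
  const box = await handle.boundingBox();
  if (!box) throw new Error(`Element ${selector} is not visible`);
  return page.pdf(pdfOptionsForBox(box, path));
}
```

Here `page` would be a Puppeteer `Page`, for example `await snapshotElement(page, 'figure', 'figure.pdf')`. Keeping the options in a pure function makes the sizing logic easy to check without launching a browser.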

jheer commented 2 years ago

I've pushed changes to PDF snapshot generation. When HTML content is pulled out of its original context (parent elements, style rules, etc.), the size of the contained elements can change. As a result, inaccurate sizes were being passed as input to PDF generation. The PDF snapshot logic now calculates the bounding box itself and uses that to size the generated output.
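For readers following along, one way to compute such a bounding box is to take the union of the client rectangles of an element and its descendants once the content is rendered in its final context. This is only a sketch under that assumption, not necessarily what the pushed change does; `unionRects` and `measureContent` are hypothetical names.

```javascript
// Return the smallest rectangle that contains every input rectangle.
// Each rect needs numeric x, y, width, and height (as a DOMRect has).
function unionRects(rects) {
  if (rects.length === 0) return null;
  let left = Infinity, top = Infinity, right = -Infinity, bottom = -Infinity;
  for (const r of rects) {
    left = Math.min(left, r.x);
    top = Math.min(top, r.y);
    right = Math.max(right, r.x + r.width);
    bottom = Math.max(bottom, r.y + r.height);
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Browser-side helper (hypothetical): measure a root element together with
// all of its descendants, so overflowing children are included in the box.
function measureContent(root) {
  const elements = [root, ...root.querySelectorAll('*')];
  return unionRects(elements.map(el => el.getBoundingClientRect()));
}
```

`measureContent` would need to run inside the page (for instance via `page.evaluate`), while `unionRects` is plain arithmetic and can be checked anywhere.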

I also pushed changes so that the generated output is written to the correct directory, depending on whether a LaTeX PDF or LaTeX source files are the desired output format.

mathisonian commented 2 years ago

This looks good to me! I will merge and address the timeout hack and the proxy server absolute URL issue in the other PR.

To handle the 2-second runtime hack, I think we'll need to be able to reference a handle to the Observable runtime on the page. Do you have a preference for an approach to doing that beyond attaching it onto the window in the output/html/index.js entryScript function? We could add it to a dictionary if we are concerned about supporting multiple runtimes on a page.
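As a rough illustration of the dictionary option, a small registry keyed by id could hang off the window. Everything here is an assumption for discussion: the `__runtimes` key and the `registerRuntime` / `getRuntime` helpers are hypothetical and not part of living-papers or the Observable runtime.

```javascript
// Hypothetical global key; not a name used by living-papers.
const RUNTIME_REGISTRY_KEY = '__runtimes';

// Store a runtime under an id on the target object (the window by default).
function registerRuntime(id, runtime, target = globalThis) {
  const registry =
    target[RUNTIME_REGISTRY_KEY] || (target[RUNTIME_REGISTRY_KEY] = new Map());
  if (registry.has(id)) {
    throw new Error(`Runtime "${id}" is already registered`);
  }
  registry.set(id, runtime);
  return runtime;
}

// Look up a previously registered runtime; returns undefined if absent.
function getRuntime(id, target = globalThis) {
  const registry = target[RUNTIME_REGISTRY_KEY];
  return registry ? registry.get(id) : undefined;
}
```

The entry script would call `registerRuntime('main', runtime)` after creating the runtime, and the Puppeteer side could then reach it from inside `page.evaluate`. With a single runtime per page this reduces to the plain "attach it to the window" approach.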