uwdata / living-papers

Authoring tools for scholarly communication. Create interactive web pages or formal research papers from markdown source.
BSD 3-Clause "New" or "Revised" License

Improve convert image snapshots #14

Closed: jheer closed this issue 2 years ago

jheer commented 2 years ago

There are some lingering issues with snapshots of page elements. The PDF output is inconsistent with the PNG/JPG output, and each has its own strengths and weaknesses. Ideally we would get consistent output with all of the strengths and none of the weaknesses.

We could consider an alternative approach that uses the same preparation steps for both vector and bitmap outputs. We would want to load the page and capture the "live" page state. One idea is to inject JS code into the loaded page to change styles, hide non-snapshot content (e.g., with display: none), and perform sizing / margin adjustments. We could then take a PDF or bitmap screenshot cropped to the element's bounding box. It would be ideal to avoid reloading the page for each snapshot, so we could look at ways to apply and then undo such styling transformations. Either way, as a subsequent optimization, a conversion plan might also generate a filtered AST containing only the elements we want to snapshot, thereby avoiding processing and rendering all of the other page content.
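To make the apply-then-undo idea concrete, here is a minimal sketch of what the injected code might look like. The function name `isolateElement` is hypothetical and not part of living-papers. It assumes that hiding is done purely through inline `display` styles and that keeping the target's ancestors visible is enough to render it.

```javascript
// Hypothetical helper (not part of living-papers): hide everything on the
// page except `target` and its ancestors, and return a function that
// restores the original inline styles so the page need not be reloaded.
function isolateElement(target) {
  const saved = []; // pairs of [element, previous inline display value]
  let node = target;

  // Walk up the tree, hiding the siblings of each node on the path to the root.
  while (node.parentElement) {
    const parent = node.parentElement;
    for (const sibling of parent.children) {
      if (sibling !== node) {
        saved.push([sibling, sibling.style.display]);
        sibling.style.display = 'none';
      }
    }
    node = parent;
  }

  // Undo step: put back exactly the inline values that were overwritten.
  return function restore() {
    for (const [el, display] of saved) {
      el.style.display = display;
    }
    saved.length = 0;
  };
}
```

Both the isolate and restore calls would have to run inside the page itself (for example, through a headless browser's script evaluation hook), with a snapshot taken in between. This sketch only touches inline `display` values, so it does not address sizing or margin adjustments.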

@mathisonian Any reactions or other ideas?

mathisonian commented 2 years ago

@jheer hiding all of the non-snapshot content on the full page seems like a reasonable approach, and the ability to avoid page reloads is nice.

Does this solve the issue with canvases etc. in PDF? Or would we need some logic to preserve them?

mathisonian commented 2 years ago

I've pushed some initial work in https://github.com/uwdata/living-papers-testbed/pull/17, although I'm still thinking through some concerns about preserving element styles when hiding non-snapshot content (more details in the PR).