jheer closed this issue 2 years ago
@jheer hiding all of the non-snapshot content on the full page seems like a reasonable approach, and the ability to avoid page reloads is nice.
Does this solve the issue with canvases, etc. in PDF? Or would we need some logic to preserve that?
I've pushed some initial work in https://github.com/uwdata/living-papers-testbed/pull/17, although I'm thinking through some concerns about preserving element styles when hiding non-snapshot content (more details in the PR).
There are some lingering issues with snapshots of page elements. The `pdf` output is inconsistent with `png`/`jpg` output. Each has strengths and weaknesses. Ideally we would get consistent output with all the strengths and none of the weaknesses...
`display: inline-block`
to ensure the element sizing is driven by child content. It also re-styles margins to avoid undesirable clipping.

We could consider an alternative approach that uses the same preparations for both vector and bitmap outputs. We would want to load the page and capture the "live" page state. One idea is to inject JS code into the loaded page to change styles, hide non-snapshot content (e.g., `display: none`), and perform sizing / margin adjustments. We could then take a (bounding box cropped) PDF or bitmap screenshot. It would be ideal to avoid re-loading the page for each snapshot, so we could look at ways to apply and then undo such styling transformations. Either way, as a subsequent optimization a conversion plan might also generate a filtered AST with only the elements we want to snapshot, thereby avoiding processing and rendering all the other page contents.

@mathisonian Any reactions or other ideas?
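
To make the "apply and then undo" idea above concrete, here is a rough sketch of what the injected code might look like. It is not taken from the PR. `prepareSnapshot`, `StyledNode`, and the `margin: 0` reset are hypothetical names and values chosen for illustration, and deciding which nodes count as non-snapshot content is left to the caller. Ancestors of the target must not be in that list, since `display: none` on an ancestor hides the target too.

```typescript
// Minimal structural type so the sketch can run outside a browser.
// Real DOM HTMLElements satisfy it (their `style` has these properties).
interface StyledNode {
  style: { display: string; margin: string };
}

type Undo = () => void;

// Hypothetical helper: style `target` for a snapshot, hide `others`,
// and return a function that restores the previous inline styles.
function prepareSnapshot(target: StyledNode, others: StyledNode[]): Undo {
  // Record inline styles before changing anything so they can be restored.
  const saved = [target, ...others].map((node) => ({
    node,
    display: node.style.display,
    margin: node.style.margin,
  }));

  // Hide the non-snapshot content.
  for (const node of others) {
    node.style.display = 'none';
  }

  // Let the child content drive the element size.
  target.style.display = 'inline-block';
  // Assumed margin adjustment to avoid clipping; the actual rule may differ.
  target.style.margin = '0';

  return () => {
    for (const { node, display, margin } of saved) {
      node.style.display = display;
      node.style.margin = margin;
    }
  };
}
```

The page would be loaded once; for each snapshot we would call `prepareSnapshot`, take the PDF or bitmap capture, and then call the returned function before moving to the next element. This only saves and restores inline `display` and `margin`, so it does not address the broader style-preservation concerns raised in the PR.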