quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

[FR] Rethink the way to output PDF from any format. #7039

Open cderv opened 1 year ago

cderv commented 1 year ago

Pandoc does have a simple way to print PDF (https://pandoc.org/MANUAL.html#creating-a-pdf)

To produce a PDF, specify an output file with a .pdf extension:

--pdf-engine controls the behavior. By default, Pandoc looks at the output format and uses a specific method for it.
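
To make the defaults concrete, here is a small lookup table written as Python. It is only an illustration: the mapping is transcribed from the --pdf-engine section of the Pandoc manual as I read it, and the name default_pdf_engine is made up for this sketch, not something Pandoc or Quarto exposes.

```python
# Illustrative only: default PDF engine per intermediate format, as listed
# in the Pandoc manual under --pdf-engine. This is not Pandoc's actual code.
DEFAULT_PDF_ENGINE = {
    "latex": "pdflatex",
    "context": "context",
    "html": "wkhtmltopdf",
    "ms": "pdfroff",
    "typst": "typst",
}


def default_pdf_engine(to_format=None):
    """Return the documented default engine for `-t to_format -o out.pdf`.

    Returns None for formats that are not in the table above.
    """
    # With no -t option at all, Pandoc goes through LaTeX.
    return DEFAULT_PDF_ENGINE.get(to_format or "latex")
```

Each format also accepts other engines through --pdf-engine (for example weasyprint or pagedjs-cli for HTML), which is the flexibility Quarto does not currently expose.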

Quarto does not really work that way. This currently limits the ways to create a PDF, and the behavior is not really consistent across formats.

In Quarto we support format: pdf, which assumes a LaTeX or ConTeXt engine, and format: typst, which produces a PDF file by default.

There is a special output-ext option that can modify the PDF render: output-ext: tex with format: pdf renders a .tex file, and output-ext: typ with format: typst renders a .typ file. But this output-ext variable is fragile (try setting output-ext to anything: it will not error).
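
As a rough idea of what a stricter check on output-ext could look like, here is a sketch in Python. Everything in it is hypothetical: the table only covers the format and extension pairs mentioned in this issue, and check_output_ext is an invented name, not Quarto code.

```python
# Hypothetical sketch, not Quarto code. The allowed extensions are only the
# ones named in this issue; a real table would need every format.
ALLOWED_OUTPUT_EXT = {
    "pdf": {"pdf", "tex"},
    "typst": {"pdf", "typ"},
}


def check_output_ext(fmt, output_ext):
    """Return output_ext if it is known for fmt, else raise ValueError."""
    allowed = ALLOWED_OUTPUT_EXT.get(fmt)
    if allowed is None:
        # Format not covered by this sketch: pass the value through.
        return output_ext
    if output_ext not in allowed:
        raise ValueError(
            f"output-ext: {output_ext!r} is not valid for format: {fmt} "
            f"(expected one of {sorted(allowed)})"
        )
    return output_ext
```

With a check like this, output-ext: docx under format: typst would fail early instead of being silently accepted.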

When Typst was introduced, format: typst was created, but it defaults to rendering a PDF file. output-ext can be used to get the .typ file.

HTML printing to PDF is not really supported in Quarto (https://github.com/quarto-dev/quarto-cli/issues/222).

format: latex does not work quite the same way as format: typst, as it won't produce a PDF.

We should probably rethink all this and offer more mechanisms to create a PDF according to the methods available, as Pandoc allows.

Related Issues / Discussions

castedo commented 6 months ago

I have found it useful to have two types of generated HTML when creating dual-format webpage-and-PDF documents: 1) "fast print preview" HTML (i.e. #9505) vs 2) live end-reader HTML.

The "fast print preview" HTML is given to WeasyPrint and is only viewed in a full web browser during authoring, for the benefit of the author. The live end-reader HTML is posted online and NOT given to WeasyPrint, even though it carries the same main content from the author as the fast print preview.

I'm not sure how this best fits in with future Quarto, but I can say that I find myself running pandoc in three different modes: A) generate the live end-reader HTML; B) generate the fast print preview HTML, but skip the PDF; C) generate the PDF (using the same HTML as the fast print preview).
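
For readers who want to picture the three modes, they could be sketched as pandoc command lines built in Python. This is not my actual setup: the file names and print.css are placeholders and pandoc_command is an invented helper. The flags used (--standalone, --css, -t html, --pdf-engine=weasyprint) are documented pandoc options.

```python
# Sketch of the three modes as pandoc command lines. File names and the CSS
# path are placeholders chosen for illustration.
def pandoc_command(mode, source="doc.md"):
    """Return the pandoc argument list for one of the three modes."""
    if mode == "live":
        # A) live end-reader HTML, posted online
        return ["pandoc", source, "--standalone", "-o", "live.html"]
    if mode == "preview":
        # B) fast print preview HTML, viewed in a browser; no PDF step
        return ["pandoc", source, "--standalone", "--css", "print.css",
                "-o", "preview.html"]
    if mode == "pdf":
        # C) PDF through HTML, using the same styling as the preview
        return ["pandoc", source, "-t", "html", "--css", "print.css",
                "--pdf-engine=weasyprint", "-o", "doc.pdf"]
    raise ValueError(f"unknown mode: {mode}")
```

Running one of them would then be subprocess.run(pandoc_command("pdf"), check=True), assuming pandoc and WeasyPrint are installed.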

cderv commented 6 months ago

Thanks for your feedback @castedo !

tvroylandt commented 3 months ago

Hi @cderv,

Here is what I have to do to get HTML-to-PDF engines (weasyprint, pagedjs-cli) working:

Everything can be found here https://github.com/kantiles/quarto.report/blob/main/template.qmd and here https://github.com/kantiles/quarto.report/blob/main/_extensions/quarto.report/_extension.yml. There is also a Python-based template.

As we discussed, I agree that just generating HTML with Quarto (and removing the default style) and then post-processing it would be a better option.

joelostblom commented 2 weeks ago

I came across this issue looking for something similar to the webpdf option in JupyterLab, which essentially converts to HTML and then prints that page using playwright. I saw that playwright and similar options were discussed in the context of revealjs output in https://github.com/quarto-dev/quarto-cli/issues/4677 and wonder if you would consider also adding it as an output format for other document types such as notebooks.

An advantage of using such "html-printing" methods is that they work well with web-based plotting packages. Currently, workarounds such as alterations to the notebook rendering of charts are required to export output from visualization packages such as altair, plotly, and bokeh. Trying the wkhtmltopdf option in pandoc (#222) does not seem to fix these issues, as charts are still not shown (only tested with altair), so it would be convenient to have a "webpdf-like" option that supports this natively and avoids running into issues such as https://github.com/quarto-dev/quarto-cli/issues/10571 and https://github.com/quarto-dev/quarto-cli/discussions/916

cderv commented 2 weeks ago

something similar to the webpdf option in JupyterLab

I just learned about this, so I am putting a reference here. This is an nbconvert feature that relies on playwright: https://nbconvert.readthedocs.io/en/latest/usage.html#convert-webpdf
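
For context, the "render to HTML, then print the page" idea boils down to something like the following with playwright's Python API. This is a minimal sketch, assuming playwright and a Chromium build are installed; it is not nbconvert's actual implementation, and html_to_pdf is a name invented here.

```python
from pathlib import Path


def html_to_pdf(html_path, pdf_path):
    """Print an already rendered HTML file to PDF with headless Chromium."""
    # Imported lazily so this module still loads without playwright installed.
    from playwright.sync_api import sync_playwright

    url = Path(html_path).resolve().as_uri()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait for network activity to settle so JS-drawn charts can render.
        page.goto(url, wait_until="networkidle")
        page.pdf(path=str(pdf_path), print_background=True)
        browser.close()
```

Because a real browser engine loads the page, JavaScript-based charts get a chance to draw before printing, which is the main appeal over engines that do not execute scripts.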

Thanks for sharing @joelostblom !

I think in the "html to pdf" world there are two main options:

I agree with you that having a --to pdf using HTML as intermediate content would be a good addition to the current options, which are --to pdf using LaTeX to get the PDF and --to typst using Typst to get the PDF.