quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Disabling latex-auto-mk produces corrupted pdf #9016

Open (memeplex opened this issue 8 months ago)

memeplex commented 8 months ago

Bug description

No matter the engine configuration, when I disable latex-auto-mk I always get a pdf file that I cannot open in any viewer. In contrast, when latex-auto-mk is enabled, the pdf renders correctly.

Steps to reproduce

For example, for an input:

test.md

foobar

I ran quarto render test.md -t pdf with each of the following configurations:

_quarto.yaml

title: "foobar"
latex-auto-mk: false

_quarto.yaml

title: "foobar"
latex-auto-mk: false
pdf-engine: pdflatex

_quarto.yaml

title: "foobar"
latex-auto-mk: false
pdf-engine: xelatex

_quarto.yaml

title: "foobar"
latex-auto-mk: false
pdf-engine: latexmk
pdf-engine-opts:
  - '-auxdir=/tmp/quarto-aux'
  - '-emulate-aux-dir'

Expected behavior

A well-formed pdf is produced.

Actual behavior

This is what Preview shows for the resulting output: [screenshot not included]

Your environment

OS: macOS Sonoma 14.3.1

Quarto check output

Quarto 1.4.550
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.550
      Path: /private/tmp/qenv/lib/python3.11/site-packages/quarto_cli/bin

[✓] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: Installation From Path
      Path: /Library/TeX/texbin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.8
      Path: /private/tmp/qenv/bin/python3
      Jupyter: 5.7.1
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.3.3
      Path: /opt/homebrew/Cellar/r/4.3.3/lib/R
      LibPaths:
        - /Users/carlos/.rlibs/base
        - /Users/carlos/Documents/Util
        - /opt/homebrew/lib/R/4.3/site-library
        - /opt/homebrew/Cellar/r/4.3.3/lib/R/library
      knitr: 1.45
      rmarkdown: 2.25

[✓] Checking Knitr engine render......OK

cderv commented 8 months ago

I believe I can reproduce this with a simple document like this:

---
title: test
format: 
  pdf:
    latex-auto-mk: false
---

foobar

I am surprised, because I remember looking at previous issues with latex-auto-mk: false and it was working... 🤔

memeplex commented 8 months ago

In fact, I experienced this a few weeks ago, but since I was playing with some latexmk options then, I assumed I'd bumped into some unsupported combination. Now that I've gone back to those experiments, I realize the issue is more general. Perhaps it wasn't failing for such simple cases before; I can't tell for sure.

memeplex commented 8 months ago

FYI, one case in which I do remember this failed before was:

latex-auto-mk: false
pdf-engine: latexmk
pdf-engine-opts:
  - '-outdir=/tmp/quarto-aux'

This configuration is supported by pandoc, which parses --pdf-engine-opt and knows about the special case when outdir is set for the latexmk engine, so that the produced pdf is correctly moved from outdir to its final destination.

If you are looking into this issue maybe you'd want to check that case.
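A rough sketch of the special case described above: scan the engine options for latexmk's -outdir=DIR and compute where the compiled PDF should be found before it is moved to its final destination. findLatexmkPdf is a hypothetical name, this is not pandoc's code, and treating the current directory as the default location is an assumption.

```typescript
// Hypothetical helper, not pandoc's implementation: it only illustrates the idea.
// latexmk writes <stem>.pdf into the directory given by -outdir=DIR; the
// current directory is assumed to be the default when no -outdir is passed.
function findLatexmkPdf(engineOpts: string[], texStem: string): string {
  let outdir = ".";
  for (const opt of engineOpts) {
    // Accept both single- and double-dash spellings of the option.
    const match = /^--?outdir=(.+)$/.exec(opt);
    if (match) {
      outdir = match[1]; // the last occurrence wins, as on a typical command line
    }
  }
  // Drop trailing slashes so the joined path has exactly one separator.
  return `${outdir.replace(/\/+$/, "")}/${texStem}.pdf`;
}

console.log(findLatexmkPdf(["-outdir=/tmp/quarto-aux"], "test"));
```

With the configuration above, the PDF would be looked up in /tmp/quarto-aux rather than next to the input, which is why the final copy step matters.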

cderv commented 8 months ago

So the issue here is that when latex-auto-mk: false is passed, we skip the Quarto processor and use the generic one, which lets Pandoc do the PDF rendering directly:

https://github.com/quarto-dev/quarto-cli/blob/0b11aa23efdba15cb534a74a872b91fd527e7637/src/command/render/output.ts#L82-L84

The default output recipe runs pandoc --to pdf with an output file that has a .pdf extension, which means Pandoc produces the PDF file itself.
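A simplified model of the branch linked above: with latex-auto-mk enabled, Pandoc only writes a .tex file and Quarto drives the LaTeX runs; with it disabled, the generic recipe asks Pandoc for the .pdf directly. The interfaces and the chooseRecipe name are hypothetical and only mirror the description in this thread, not the actual code in output.ts.

```typescript
// Hypothetical model of how the output recipe is chosen; names are illustrative.
interface PdfFormatOptions {
  latexAutoMk: boolean; // mirrors the latex-auto-mk option
  outputFile: string;   // e.g. "test.pdf"
}

interface OutputRecipe {
  pandocOutput: string;     // file Pandoc is asked to write
  quartoRunsLatex: boolean; // whether Quarto compiles the .tex afterwards
}

function chooseRecipe(opts: PdfFormatOptions): OutputRecipe {
  if (opts.latexAutoMk) {
    // Quarto path: Pandoc emits LaTeX, Quarto post-processes and compiles it.
    return {
      pandocOutput: opts.outputFile.replace(/\.pdf$/, ".tex"),
      quartoRunsLatex: true,
    };
  }
  // Generic path: Pandoc runs the PDF engine itself and writes the PDF.
  return { pandocOutput: opts.outputFile, quartoRunsLatex: false };
}
```

Under this model, the generic branch is where trouble starts: any later step that assumes it is looking at LaTeX source is handed a binary PDF instead.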

While debugging, I confirmed that Pandoc's rendering to PDF works fine; the file is not corrupted. However, Quarto doesn't stop there: it then runs the output post-processors.

https://github.com/quarto-dev/quarto-cli/blob/0b11aa23efdba15cb534a74a872b91fd527e7637/src/command/render/render.ts#L253-L271

This calls the pdfLatexPostProcessor() configured for format: pdf:

https://github.com/quarto-dev/quarto-cli/blob/0b11aa23efdba15cb534a74a872b91fd527e7637/src/format/pdf/format-pdf.ts#L156-L159

And this processor applies line-by-line processing, designed for post-processing the .tex file, to the PDF output instead...

https://github.com/quarto-dev/quarto-cli/blob/0b11aa23efdba15cb534a74a872b91fd527e7637/src/format/pdf/format-pdf.ts#L384-L385

This is obviously what corrupts the PDF.
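To see why text-oriented processing damages a PDF, the sketch below round-trips a few PDF-like bytes through a UTF-8 decode, a line split/join and a re-encode, roughly what a line-based text filter does to its input. The bytes only imitate a PDF header (a comment line with high-bit bytes, as PDFs commonly have); real files also contain binary stream data that is not valid UTF-8.

```typescript
// Illustrative only: the bytes below imitate a PDF header, not a real file.
function lineFilterRoundTrip(bytes: Uint8Array): Uint8Array {
  // Decoding as UTF-8 silently replaces invalid byte sequences with U+FFFD.
  const text = new TextDecoder("utf-8").decode(bytes);
  // Even an identity "transform" (split into lines, join back) cannot undo that.
  return new TextEncoder().encode(text.split("\n").join("\n"));
}

const header = new TextEncoder().encode("%PDF-1.5\n%");
const binaryMarker = new Uint8Array([0xe2, 0xe3, 0xcf, 0xd3, 0x0a]);
const original = new Uint8Array(header.length + binaryMarker.length);
original.set(header, 0);
original.set(binaryMarker, header.length);

const processed = lineFilterRoundTrip(original);
const unchanged =
  processed.length === original.length &&
  processed.every((b, i) => b === original[i]);
console.log(
  `original ${original.length} bytes, after filter ${processed.length} bytes, identical: ${unchanged}`,
);
```

Each invalid byte becomes a three-byte replacement character, so the file both grows and loses its original binary content.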

This post-processor should probably not apply when latex-auto-mk: false is set or, more generally, when the output file is a .pdf rather than a .tex.
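One way to express that guard is to have the post-processor check the output extension and return early for anything that is not a .tex file. This is a minimal sketch: the PostProcessor type, the withTexOnlyGuard wrapper and the stand-in transform are hypothetical and do not correspond to Quarto's actual API.

```typescript
// Hypothetical post-processor signature: takes an output path and returns a
// short description of what was done (kept simple so the sketch is testable).
type PostProcessor = (outputPath: string) => string;

// Wrap a .tex-only post-processor so it never touches other file types.
function withTexOnlyGuard(inner: PostProcessor): PostProcessor {
  return (outputPath: string) => {
    if (!outputPath.toLowerCase().endsWith(".tex")) {
      // e.g. latex-auto-mk: false, where Pandoc already produced a .pdf
      return `skipped ${outputPath}`;
    }
    return inner(outputPath);
  };
}

// Stand-in for the real line-based LaTeX post-processing.
const texPostProcessor: PostProcessor = (path) => `processed ${path}`;
const guarded = withTexOnlyGuard(texPostProcessor);

console.log(guarded("test.tex"));
console.log(guarded("test.pdf"));
```

Keying the guard on the extension covers the more general case, not only latex-auto-mk: false.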

Thanks a lot for the report @memeplex !

cderv commented 8 months ago

FWIW, this has not been working since at least Quarto 1.2.

The fact is that we really need to post-process the .tex file to support some of the features Quarto offers for PDF output (which go well beyond what Pandoc alone supports).

So for this to work, we would probably need a two-step process.

This is not just a bug fix; it probably requires some rework and extended testing.
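A two-step process could look roughly like the command plan below: Pandoc first writes the .tex file, Quarto post-processes it, and only then does a plain latexmk run compile the result. This is a hypothetical sketch, not a planned design: planTwoStepRender and the POST_PROCESS marker are made-up names, while -pdf, -xelatex and -lualatex are real latexmk switches.

```typescript
// Map the underlying LaTeX engine to the matching latexmk switch.
const LATEXMK_FLAG = {
  pdflatex: "-pdf",
  xelatex: "-xelatex",
  lualatex: "-lualatex",
} as const;

type Engine = keyof typeof LATEXMK_FLAG;

// Returns each step as an argv array; nothing is executed here.
function planTwoStepRender(
  input: string,
  engine: Engine,
  engineOpts: string[] = [],
): string[][] {
  const tex = input.replace(/\.[^./]+$/, "") + ".tex";
  return [
    // Step 1a: Pandoc emits a standalone LaTeX document instead of a PDF.
    ["pandoc", input, "--to", "latex", "--standalone", "-o", tex],
    // Step 1b: placeholder for Quarto's own .tex post-processing (hypothetical marker).
    ["POST_PROCESS", tex],
    // Step 2: a "vanilla" latexmk run decides how many passes are needed.
    ["latexmk", LATEXMK_FLAG[engine], ...engineOpts, tex],
  ];
}

for (const step of planTwoStepRender("test.md", "xelatex", ["-outdir=/tmp/aux"])) {
  console.log(step.join(" "));
}
```

Leaving pass counting to latexmk in the last step is one way to get the behavior memeplex asks for later in the thread, at the cost of handling -outdir when locating the final PDF.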

memeplex commented 8 months ago

I see what you mean and understand the difficulties. I've been using pandoc for many years with custom scripts and, in practice, I always ended up splitting the compilation into pandoc + latex phases to gain more control.

memeplex commented 8 months ago

Perhaps you could make this less flexible. The main problem I'm trying to solve is that I find the quarto engine too slow for small documents: it takes about 4-5x longer than a bare pandoc + latexmk run. Of course there are additional filters involved, but it also runs more passes than latexmk does (at least once latexmk's aux files have been produced). I've been experimenting with latex-min-runs and latex-clean, to no avail. I can obviously cap the number of passes with latex-max-runs, but that's not the point: I want the right number of passes, not fewer. I don't know enough to understand the reasons for this behavior. Maybe it's doing more things that require more passes than latexmk does, maybe it's safer in some regard, or maybe latexmk outsmarts it in some cases.

As for the underlying LaTeX engine, the quarto engine is already flexible enough to use pdf/xe/lualatex, so if it had a "vanilla latexmk" mode, I wouldn't mind if the entire latex-auto-mk: false path were removed altogether.

Note: by "the quarto engine" I mean the one used with latex-auto-mk: true.

> rm -rf /tmp/aux
> time pandoc test.md --pdf-engine=latexmk --pdf-engine-opt=-outdir=/tmp/aux -o test.pdf 
real    0m0.787s
user    0m0.627s
sys     0m0.140s

> time pandoc test.md --pdf-engine=latexmk --pdf-engine-opt=-outdir=/tmp/aux -o test.pdf 
real    0m0.459s
user    0m0.358s
sys     0m0.081s

> time quarto render test.md -M pdf-engine:xelatex -t pdf
real    0m2.790s
user    0m2.670s
sys     0m0.293s

> time quarto render test.md -M pdf-engine:pdflatex -t pdf
real    0m1.980s
user    0m1.929s
sys     0m0.249s

> time quarto render test.md -M latex-clean:false -t pdf
real    0m2.778s
user    0m2.676s
sys     0m0.288s

> time quarto render test.md -M latex-clean:false -t pdf
real    0m2.766s
user    0m2.669s
sys     0m0.295s
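For reference, the slowdown implied by the wall-clock ("real") times above can be computed directly. The numbers are copied from the transcript; comparing against both the first and the second pandoc + latexmk run is an assumption about which baseline matters.

```typescript
// Wall-clock ("real") seconds copied from the transcript above.
const baseline = { firstRun: 0.787, secondRun: 0.459 }; // pandoc + latexmk
const quartoRuns: Record<string, number> = {
  "xelatex": 2.790,
  "pdflatex": 1.980,
  "latex-clean:false (1st)": 2.778,
  "latex-clean:false (2nd)": 2.766,
};

function slowdown(quartoSeconds: number, baselineSeconds: number): number {
  // Ratio rounded to one decimal place for readability.
  return Math.round((quartoSeconds / baselineSeconds) * 10) / 10;
}

for (const [label, seconds] of Object.entries(quartoRuns)) {
  console.log(
    `${label}: ${slowdown(seconds, baseline.firstRun)}x vs first run, ` +
      `${slowdown(seconds, baseline.secondRun)}x vs second run`,
  );
}
```

By this arithmetic the slowdown ranges from about 2.5x (pdflatex against the first pandoc run) to about 6x (xelatex against the second run, which reuses latexmk's aux files).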