quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Biblatex related issue caused by Quarto update (probably due to Pandoc update from 3.1.13 to 3.2) #9906

Closed juliantao closed 2 months ago

juliantao commented 3 months ago

Bug description

I use Quarto to prepare my CV. I just reconfigured Quarto from the latest commit before rendering it, but the render now fails with errors about missing files:

updating tlmgr

updating existing packages
finding package for numeric.dbx
finding package for biblatex-dm.cfg
finding package for JulianTaoCV.bbl
ERROR: 
compilation failed- no matching packages
Emergency stop.
<*> JulianTaoCV.tex

see /home/julian/Dropbox (ASU)/jtaocv/JulianTaoCV.log for more information.
ERROR: Error
    at renderFiles (file:///home/julian/quarto-cli/src/command/render/render-files.ts:350:23)
    at eventLoopTick (ext:core/01_core.js:153:7)
    at async renderProject (file:///home/julian/quarto-cli/src/command/render/project.ts:440:23)
    at async renderForPreview (file:///home/julian/quarto-cli/src/command/preview/preview.ts:428:24)
    at async render (file:///home/julian/quarto-cli/src/command/preview/preview.ts:172:22)
    at async preview (file:///home/julian/quarto-cli/src/command/preview/preview.ts:189:18)
    at async Command.actionHandler (file:///home/julian/quarto-cli/src/command/preview/cmd.ts:421:7)
    at async Command.execute (file:///home/julian/quarto-cli/src/vendor/deno.land/x/cliffy@v1.0.0-rc.3/command/command.ts:1948:7)
    at async Command.parseCommand (file:///home/julian/quarto-cli/src/vendor/deno.land/x/cliffy@v1.0.0-rc.3/command/command.ts:1780:14)
    at async quarto (file:///home/julian/quarto-cli/src/quarto.ts:159:5)

I tried reinstalling TinyTeX, cleaning up other versions of TeX Live, and updating packages with tlmgr, but none of these worked.

I then reconfigured Quarto from an earlier commit, when Pandoc 3.1.13 was still in use. Rendering succeeded and I was able to generate the PDF as before.

Thus, I suspect that the error was due to the update of Pandoc.

Steps to reproduce

  1. Fork my CV repo, https://github.com/juliantao/jtaocv
  2. It is suggested to use the virtual environment based on the renv.lock file. This makes sure that necessary packages are installed and the correct versions of two important packages RefManageR and bibtex are used
  3. configure quarto based on the latest commit, by following the instructions here https://github.com/quarto-dev/quarto-cli
    git clone https://github.com/quarto-dev/quarto-cli
    cd quarto-cli
    ./configure.sh
  4. Go to the CV folder and render any of the qmd files, such as JulianTaoCV_1page. The errors above will appear, and rendering will terminate prematurely.
  5. Check out an earlier commit of Quarto, for example 95579925, and reconfigure Quarto:
    cd quarto-cli
    git checkout 95579925
    ./configure.sh
  6. then go to the CV folder again, render the same qmd file. This should result in successful rendering.

Expected behavior

No response

Actual behavior

No response

Your environment

No response

Quarto check output

Quarto 99.9.9
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.13: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 99.9.9
fatal: not a git repository (or any of the parent directories): .git
      Path: /home/julian/quarto-cli/package/dist/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.06
      Chromium: 869685

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/julian/.TinyTeX/bin/x86_64-linux
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.12
      Path: /usr/bin/python3
      Jupyter: 4.10.0
      Kernels: python3

(/) Checking Jupyter engine render....[IPKernelApp] ERROR | No such comm target registered: quarto_kernel_setup
[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.1.2
      Path: /usr/lib/R
      LibPaths:
        - /home/julian/R/x86_64-pc-linux-gnu-library/4.1
        - /usr/local/lib/R/site-library
        - /usr/lib/R/site-library
        - /usr/lib/R/library
      knitr: 1.43
      rmarkdown: 2.23

[✓] Checking Knitr engine render......OK
mcanouil commented 3 months ago

Could you provide all requested information and the command you ran?

Also what do you mean by "caused by updating Pandoc from 3.1.13 to 3.2"? Are you talking about the embedded version of Pandoc inside Quarto or a local version you have? Quarto should not use your system version of Pandoc, if it does it means your install is somehow corrupted.

Side note: do not use date: "`r format(Sys.Date(), '%B %Y')`" to set the date, see https://quarto.org/docs/reference/dates.html.

mcanouil commented 3 months ago

I can't run your repository because you are using old packages that require compilation. Using latest packages seems to break your workflow but possibly solves the LaTeX issue if it only appears with this repository.

image

Additionally, you are using Pandoc options, R Markdown syntax, etc. I suggest:

juliantao commented 3 months ago

Could you provide all requested information and the command you ran?

Also what do you mean by "caused by updating Pandoc from 3.1.13 to 3.2"? Are you talking about the embedded version of Pandoc inside Quarto or a local version you have? Quarto should not use your system version of Pandoc, if it does it means your install is somehow corrupted.

Side note: do not use date: "`r format(Sys.Date(), '%B %Y')`" to set the date, see https://quarto.org/docs/reference/dates.html.

Hi, Thanks for the quick reply. I updated some info in the post, please check out.

I did not use my own Pandoc. I reconfigured Quarto from specific git commits. Based on what I described, I suspect that the error comes from the Pandoc update that Quarto adopted.

juliantao commented 3 months ago

I can't run your repository because you are using old packages that require compilation. Using latest packages seems to break your workflow but possibly solves the LaTeX issue if it only appears with this repository.

image

Additionally, you are using Pandoc options, R Markdown syntax, etc. I suggest:

I did some customization on the bib fields. Did you use the virtual environment of the directory? I previously noticed that there were issues with newer versions of RefManageR and bibtex R packages.

Thanks for the other tips as well!

mcanouil commented 3 months ago

You have no virtual environment, you only have a renv setup.

Note that the error you get is a LaTeX error: JulianTaoCV_1page.bbl is not found by the LaTeX engine. I suggest you investigate your tex file. You can add keep-tex: true.

edit: you should look for a "bib" file probably something like: \addbibresource{***.bib} added by your code.

See that it compiles without your code, so the issue is somewhere there:

image
juliantao commented 3 months ago

Thanks, @mcanouil. To provide you more background:

  1. My Quarto CV project is quite complex, but I have used it for quite some time, and it works on my side with an older version of Quarto, as the history of the repo shows. I tried my best to make it reproducible.
  2. I have already spent several hours on this issue. I ruled out many factors and narrowed it down to the Pandoc version.
  3. Regarding the source bib file, I am using a .tex template (https://github.com/juliantao/jtaocv/tree/main/_extensions/cv), which customizes the bib fields and includes my bib file.
  4. I checked the tex file and the corresponding log file, which did not give very useful information.
  5. "JulianTaoCV_1page.bbl is not found by the LaTeX engine.": exactly, this means that the source of the error is biblatex, biber, or bibtex, which failed to generate the expected bbl file.
mcanouil commented 3 months ago

My quarto CV project is quite complex,

Indeed, which makes it hard for us to use, so you have to remove pieces of your CV until you get something small that exhibits the issue. You replaced the Pandoc template entirely with your own, which means you no longer pick up any changes made to the default one.

For instance, you could try to use keep-md: true to get the result of your R code, without the actual R code. Then remove pieces of your markdown only document.

juliantao commented 3 months ago

Indeed, which makes it hard for us to use

I apologize for any inconvenience. I'm posting this here in the hope that it might help the developers identify a specific commit that caused an issue related to biblatex and/or pandoc. I understand this isn't the ideal way to report a problem.

I'll try to provide a minimal example, but it may take some time as I currently lack the bandwidth.

Thank you!

mcanouil commented 3 months ago

Ok, I did the debugging for you. Note that I basically removed the content of your CV piece by piece, nothing fancy, simply tedious as any debugging. The issue is your "baretable" function.

If you set:

source("preprocess.R")
baretable <- function(tbl, ...) knitr::kable(tbl)

This should be in your template since you have a custom one or in include-before-body.

\titlespacing{\section}{0pt}{1.ex}{0.5ex}
\thispagestyle{empty}
mcanouil commented 3 months ago

A bit more detail, the issue is that the R code produces:

\begin{longtable}{lllll}
  \textbf{PhD}  & Civil Engineering & Case Western Reserve University & Cleveland, US & 2013 \\ 
  \textbf{MS}  & Civil Engineering & Tongji University & Shanghai, China & 2009 \\ 
  \textbf{BS}  & Civil Engineering & China University of Geosciences & Wuhan, China & 2006 \\ 
  \end{longtable}

See the bad indentation for the closing? That's the issue.

This is what "your" code is producing:

image

Here, the issue is:

print(xtable::xtable(head(mtcars)), hline.after = NULL, tabular.environment = "longtable", booktabs = TRUE)
% latex table generated in R 4.4.0 by xtable 1.8-4 package
% Thu Jun  6 22:36:04 2024
\begin{longtable}{rrrrrrrrrrrr}
  & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\ 
 Mazda RX4 & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.62 & 16.46 & 0.00 & 1.00 & 4.00 & 4.00 \\ 
  Mazda RX4 Wag & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.88 & 17.02 & 0.00 & 1.00 & 4.00 & 4.00 \\ 
  Datsun 710 & 22.80 & 4.00 & 108.00 & 93.00 & 3.85 & 2.32 & 18.61 & 1.00 & 1.00 & 4.00 & 1.00 \\ 
  Hornet 4 Drive & 21.40 & 6.00 & 258.00 & 110.00 & 3.08 & 3.21 & 19.44 & 1.00 & 0.00 & 3.00 & 1.00 \\ 
  Hornet Sportabout & 18.70 & 8.00 & 360.00 & 175.00 & 3.15 & 3.44 & 17.02 & 0.00 & 0.00 & 3.00 & 2.00 \\ 
  Valiant & 18.10 & 6.00 & 225.00 & 105.00 & 2.76 & 3.46 & 20.22 & 1.00 & 0.00 & 3.00 & 1.00 \\ 
  \end{longtable}
Warning message:
In print.xtable(xtable::xtable(head(mtcars)), hline.after = NULL,  :
  Attempt to use "longtable" with floating = TRUE. Changing to FALSE.

Setting booktabs = FALSE solves the indentation issue that LaTeX does not like.

juliantao commented 3 months ago

Ok, I did the debugging for you. Note that I basically remove piece by piece the content of your CV, nothing fancy, simply tedious as any debugging. The issue is your "baretable" function.

Thanks for the debugging, @mcanouil . But I am afraid this is not relevant to the problem I reported. Did you successfully render the file after updating the baretable function?

mcanouil commented 3 months ago

Sorry to be blunt, but why would I have detailed this otherwise?

image
juliantao commented 3 months ago

My bad, @mcanouil! At first I could not render it after following your suggestions. It worked once I removed the auxiliary files (.aux, .log, .blg) from the directory...

Now I need to figure out how to make the bottom rules of the tables invisible. After all, it was not due to biblatex..

Thank you, @mcanouil !

cscheid commented 3 months ago

I'm glad that got all sorted out - thanks for the deep assist, @mcanouil!

cderv commented 3 months ago

If I may, I think we should reopen, because the indentation problem identified is not the real cause of the LaTeX compilation error here.

Let's simplify the example based on Michael's finding:

---
title: test
format: pdf
keep-tex: true
keep-md: true
---

```{r}
#| output: asis
print(xtable::xtable(head(mtcars)), hline.after = NULL, tabular.environment = "longtable", booktabs = TRUE)
```

So yes, setting `booktabs = TRUE` creates a table whose closing `\end{longtable}` is indented:

\begin{longtable}{rrrrrrrrrrrr}
  & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\
 Mazda RX4 & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.62 & 16.46 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Mazda RX4 Wag & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.88 & 17.02 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Datsun 710 & 22.80 & 4.00 & 108.00 & 93.00 & 3.85 & 2.32 & 18.61 & 1.00 & 1.00 & 4.00 & 1.00 \\
  Hornet 4 Drive & 21.40 & 6.00 & 258.00 & 110.00 & 3.08 & 3.21 & 19.44 & 1.00 & 0.00 & 3.00 & 1.00 \\
  Hornet Sportabout & 18.70 & 8.00 & 360.00 & 175.00 & 3.15 & 3.44 & 17.02 & 0.00 & 0.00 & 3.00 & 2.00 \\
  Valiant & 18.10 & 6.00 & 225.00 & 105.00 & 2.76 & 3.46 & 20.22 & 1.00 & 0.00 & 3.00 & 1.00 \\
  \hline
  \end{longtable}

while setting `booktabs = FALSE` produces no indentation:

\begin{longtable}{rrrrrrrrrrrr}
  & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\
 Mazda RX4 & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.62 & 16.46 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Mazda RX4 Wag & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.88 & 17.02 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Datsun 710 & 22.80 & 4.00 & 108.00 & 93.00 & 3.85 & 2.32 & 18.61 & 1.00 & 1.00 & 4.00 & 1.00 \\
  Hornet 4 Drive & 21.40 & 6.00 & 258.00 & 110.00 & 3.08 & 3.21 & 19.44 & 1.00 & 0.00 & 3.00 & 1.00 \\
  Hornet Sportabout & 18.70 & 8.00 & 360.00 & 175.00 & 3.15 & 3.44 & 17.02 & 0.00 & 0.00 & 3.00 & 2.00 \\
  Valiant & 18.10 & 6.00 & 225.00 & 105.00 & 2.76 & 3.46 & 20.22 & 1.00 & 0.00 & 3.00 & 1.00 \\
  \hline
\end{longtable}

We can see that in the intermediate .md file. But the indentation is only what triggers the issue; it is not the underlying problem.

LaTeX can correctly compile a document with the first form of the table, even if that form is not ideal.

Looking at the .tex file shows what the problem really is:

When there are spaces before \end{longtable}, Quarto seems to create a broken .tex file: the whole table and \end{document} are missing.

Reprex without R

---
title: test
format: pdf
keep-tex: true
keep-md: true
---

```{=latex}
\begin{longtable}{rrrrrrrrrrrr}
  & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\
 Mazda RX4 & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.62 & 16.46 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Mazda RX4 Wag & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.88 & 17.02 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Datsun 710 & 22.80 & 4.00 & 108.00 & 93.00 & 3.85 & 2.32 & 18.61 & 1.00 & 1.00 & 4.00 & 1.00 \\
  Hornet 4 Drive & 21.40 & 6.00 & 258.00 & 110.00 & 3.08 & 3.21 & 19.44 & 1.00 & 0.00 & 3.00 & 1.00 \\
  Hornet Sportabout & 18.70 & 8.00 & 360.00 & 175.00 & 3.15 & 3.44 & 17.02 & 0.00 & 0.00 & 3.00 & 2.00 \\
  Valiant & 18.10 & 6.00 & 225.00 & 105.00 & 2.76 & 3.46 & 20.22 & 1.00 & 0.00 & 3.00 & 1.00 \\
  \hline
  \end{longtable}
```

These are the last lines of the intermediate .tex file:

❯ tail -n 10 index.tex
  urlcolor={Blue},
  pdfcreator={LaTeX via pandoc}}

\title{test}
\author{}
\date{}

\begin{document}
\maketitle

The document no longer has its ending.

This is probably due to our post-processing of the .tex file, which does not take leading spaces into account when detecting longtable.

So

mcanouil commented 3 months ago

After discussing this with Christophe (who helped figure out the actual internal issue), I agree: the indentation is what triggers the issue, but the actual problem is in Quarto, which fails to properly detect and process the longtable.

Thanks again @cderv

Note that this is a regression compared to 1.4.555.

cderv commented 3 months ago

Problem happens at https://github.com/quarto-dev/quarto-cli/blob/df817aeb0245e94ba30a7f50875aadeead36fd9a/src/format/pdf/format-pdf.ts#L387-L395

When I debug, I can see that this is the processing step that removes the content from the .tex file.

We just don't handle spaces in our regexes https://github.com/quarto-dev/quarto-cli/blob/df817aeb0245e94ba30a7f50875aadeead36fd9a/src/format/pdf/format-pdf.ts#L833-L837

and this is probably what breaks.

I don't know whether we should support it, but it does not seem right that we remove the end of the document, especially in processing meant for side notes when the document does not use them.
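
The whitespace sensitivity described here can be illustrated with a minimal sketch. The two patterns and the `closesLongtable` helper are hypothetical and are not the regexes used in `format-pdf.ts`; the sketch only assumes that a line-anchored pattern with no allowance for leading spaces is used to spot `\end{longtable}`.

```typescript
// Hypothetical patterns for illustration; not the actual quarto-cli regexes.
// `strictEnd` only matches when \end{longtable} starts at column 0.
const strictEnd = /^\\end\{longtable\*?\}/;
// `tolerantEnd` additionally allows leading spaces or tabs.
const tolerantEnd = /^[ \t]*\\end\{longtable\*?\}/;

// Returns true when `line` closes a longtable according to `pattern`.
function closesLongtable(line: string, pattern: RegExp): boolean {
  return pattern.test(line);
}

const flushLine = "\\end{longtable}";
const indentedLine = "  \\end{longtable}";

console.log(closesLongtable(flushLine, strictEnd));      // true
console.log(closesLongtable(indentedLine, strictEnd));   // false: the closing line is missed
console.log(closesLongtable(indentedLine, tolerantEnd)); // true
```

With the strict pattern, an indented closing line like the one xtable emits is never recognized, which is consistent with a processor that keeps treating the rest of the file as table content.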

cderv commented 3 months ago

@mcanouil can you share your bisect information? I am not sure that what I found is the real problem.

mcanouil commented 3 months ago

A bisect led to https://github.com/quarto-dev/quarto-cli/commit/c8bfc133e47ee4f1ff618311973e38b75a4c07ac where some regexes were changed, in particular those for table and longtable.

table (ERROR)

````qmd
---
title: "Quarto Playground"
format: pdf
keep-md: true
keep-tex: true
---

```{=latex}
\begin{table}
cell1 & cell2 & cell3 \\
cell4 & cell5 & cell6 \\
cell7 & cell8 & cell9
\end{table}
```
````

longtable (ERROR)

````qmd
---
title: "Quarto Playground"
format: pdf
keep-md: true
keep-tex: true
---

```{=latex}
\begin{longtable}{ c c c }
cell1 & cell2 & cell3 \\
cell4 & cell5 & cell6 \\
cell7 & cell8 & cell9
\end{longtable}
```
````

tabular (SUCCESS)

````qmd
---
title: "Quarto Playground"
format: pdf
keep-md: true
keep-tex: true
---

```{=latex}
\begin{tabular}{ c c c }
cell1 & cell2 & cell3 \\
cell4 & cell5 & cell6 \\
cell7 & cell8 & cell9
\end{tabular}
```
````

cscheid commented 3 months ago

Good catch. We can't revert the perf change I made because it triggers O(n^2) behavior in large strings, but we need to fix this.

cscheid commented 3 months ago

(In passing, I think a lesson here is that we shouldn't jump too early to ascribe a cause. Instead, whenever possible, we should start with a bisection.)
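
For readers unfamiliar with the technique, bisection is a binary search over an ordered history for the first entry that shows the failure. The sketch below is a generic illustration, not the procedure used in this thread: the commit names are placeholders, and `rendersBroken` stands in for "build Quarto at this commit and try to render the document".

```typescript
// Generic bisection sketch. It assumes the list is ordered oldest to newest
// and that once an entry is bad, every later entry is bad too.
function firstBad<T>(items: readonly T[], isBad: (item: T) => boolean): T | undefined {
  let lo = 0;            // everything before `lo` is known good
  let hi = items.length; // everything from `hi` onwards is known bad
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(items[mid] as T)) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return lo < items.length ? items[lo] : undefined;
}

// Hypothetical history: placeholder names, with the regression introduced at "c4".
const commits = ["c1", "c2", "c3", "c4", "c5", "c6"];
const rendersBroken = (commit: string): boolean => commits.indexOf(commit) >= 3;

console.log(firstBad(commits, rendersBroken)); // "c4"
```

Each probe halves the remaining range, so even a long history needs only a handful of builds before the offending change is isolated.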

cderv commented 3 months ago

Thanks.

The difference you note between the environments is simply that we currently do not process the tabular environment, but we do process table and longtable.

Anyhow, the issue is more complex, as it is an interaction of several things, and the bug was hidden for some time.

I am noticing that with v1.5.37 we are producing this intermediate output:

\begin{longtable*}{rrrrrrrrrrrr}
  & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\
 Mazda RX4 & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.62 & 16.46 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Mazda RX4 Wag & 21.00 & 6.00 & 160.00 & 110.00 & 3.90 & 2.88 & 17.02 & 0.00 & 1.00 & 4.00 & 4.00 \\
  Datsun 710 & 22.80 & 4.00 & 108.00 & 93.00 & 3.85 & 2.32 & 18.61 & 1.00 & 1.00 & 4.00 & 1.00 \\
  Hornet 4 Drive & 21.40 & 6.00 & 258.00 & 110.00 & 3.08 & 3.21 & 19.44 & 1.00 & 0.00 & 3.00 & 1.00 \\
  Hornet Sportabout & 18.70 & 8.00 & 360.00 & 175.00 & 3.15 & 3.44 & 17.02 & 0.00 & 0.00 & 3.00 & 2.00 \\
  Valiant & 18.10 & 6.00 & 225.00 & 105.00 & 2.76 & 3.46 & 20.22 & 1.00 & 0.00 & 3.00 & 1.00 \\
  \hline
  \end{longtable*}

meaning we are replacing the initial longtable environment with longtable*, which was originally done by this processing: https://github.com/quarto-dev/quarto-cli/blob/b72bdf7c36091fbf1109f16d32cc9d1088294b06/src/resources/filters/quarto-post/latex.lua#L431-L448

But more recently, we needed to handle some regex changes in Lua, and this part was modified to https://github.com/quarto-dev/quarto-cli/blob/df817aeb0245e94ba30a7f50875aadeead36fd9a/src/resources/filters/quarto-post/latex.lua#L436-L450

Which means

Several issues here to handle IMO

cscheid commented 3 months ago

You're right that there is a combination of circumstances causing this bug. Nevertheless, we could have written our code better to avoid this problem in at least one case.

Specifically, you're correct that the latex postprocessor is "eating" the entire document. This is always a bug, either in the emitted .tex file or in our code. The postprocessor should never end its processing in the "inside-table" state. So, if that ever happens, Quarto should issue a warning that something is going wrong.
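
A minimal sketch of that safeguard, under stated assumptions: `scanLongtables`, `ScanResult`, and both patterns are hypothetical names for illustration and do not reflect the actual post-processor in quarto-cli. The sketch tracks an "inside table" flag line by line and reports a warning if the input ends while the flag is still set.

```typescript
// Hypothetical sketch; names and patterns are illustrative, not quarto-cli code.
interface ScanResult {
  insideTableAtEnd: boolean;
  warnings: string[];
}

const beginTable = /^[ \t]*\\begin\{longtable\*?\}/;

// Walks the .tex source line by line, tracking whether we are inside a longtable.
// `endTable` is a parameter so that a strict and a tolerant pattern can be compared.
function scanLongtables(tex: string, endTable: RegExp): ScanResult {
  const warnings: string[] = [];
  let insideTable = false;
  let openedAt = 0; // 1-based line number of the unmatched \begin{longtable}

  tex.split("\n").forEach((line, index) => {
    if (!insideTable && beginTable.test(line)) {
      insideTable = true;
      openedAt = index + 1;
    } else if (insideTable && endTable.test(line)) {
      insideTable = false;
    }
  });

  // Ending the scan while still inside a table means either the .tex is
  // malformed or the closing line was not detected: warn instead of staying silent.
  if (insideTable) {
    warnings.push(`longtable opened on line ${openedAt} was never closed`);
  }
  return { insideTableAtEnd: insideTable, warnings };
}

const sample = [
  "\\begin{document}",
  "\\begin{longtable}{ll}",
  "a & b \\\\",
  "  \\end{longtable}",
  "\\end{document}",
].join("\n");

// Strict pattern: the indented closing line is missed, so a warning is reported.
console.log(scanLongtables(sample, /^\\end\{longtable\*?\}/).warnings);
// Tolerant pattern: the closing line is found and no warning is reported.
console.log(scanLongtables(sample, /^[ \t]*\\end\{longtable\*?\}/).warnings);
```

The point of the design is that the end-of-input check is independent of how the closing line is detected, so a future detection bug would surface as a warning rather than as a silently truncated document.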

cderv commented 3 months ago

Specifically, you're correct that the latex postprocessor is "eating" the entire document. This is always a bug, either in the emitted .tex file or in our code. The postprocessor should never end its processing in the "inside-table" state. So, if that ever happens, Quarto should issue a warning that something is going wrong.

Yes, exactly. I am really glad we found this!

I have a fix for the breakage that will close this issue, and I suggest we handle this post processing in its own issue.

juliantao commented 3 months ago

(In passing, I think a lesson here is that we shouldn't jump too early to ascribe a cause. Instead, whenever possible, we should start with a bisection.)

A great lesson for myself too! I presumed that this was a problem with Pandoc or BibLaTeX. @mcanouil showed great patience and debugged piece by piece, while @cderv insisted on finding the root cause. I am impressed again by the Quarto culture. Thank you all!

mcanouil commented 3 months ago

You are missing what happened behind the curtain in between 😏

mcanouil commented 3 months ago

Anyhow, thanks @juliantao for the report!