Closed elliottslaughter closed 6 months ago
It would be helpful if you could run some experiments with (a) the default latex template, and (b) xelatex instead of lualatex. This could help rule out some possible explanations.
Note commit 3c178690e307f6f2e43d64c341712b1bf609e7fc which came in after 3.1.11 was released. It may fix the issue you're having.
Indeed, I have confirmed that building Pandoc from source at commit 87533e2d04539cf27e58e287759912f897962170 (which is newer than the one you linked) fixes the problem.
Sorry, this actually isn't fixed. I'm not sure what I did, but somehow I messed up my test in https://github.com/jgm/pandoc/issues/9295#issuecomment-1873656296.
I've been digging in to figure out what's going on and I think I know what's happening now.
Here's a minimal reproducer (note this is two files):
I produced test.json by running pandoc test.md -o test.json and then hand-modifying the Header to clear the identifier to "". This approximates something I'm doing in a filter in my custom build, where I'm building headers with nullAttr.
Now here's where things get fascinating:
$ pandoc --standalone --toc test.md -o test.md.tex
$ pandoc --standalone --toc test.json -o test.json.tex
$ diff -u test.md.tex test.json.tex
--- test.md.tex 2024-01-05 22:24:18
+++ test.json.tex 2024-01-05 22:29:22
@@ -63,7 +63,7 @@
\setcounter{tocdepth}{3}
\tableofcontents
}
-\section{Test Chapter}\label{test-chapter}
+\section{Test Chapter}
Text.
The TeX source from my json file has no label on the chapter. Because the label is missing, pdflatex doesn't generate a warning:
$ rm -f *.toc *.log *.aux && pdflatex test.md.tex &> out.log && grep 'Rerun' out.log
LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.
$ rm -f *.toc *.log *.aux && pdflatex test.json.tex &> out.log && grep 'Rerun' out.log
Therefore, when Pandoc attempts to generate a PDF for my json file, it doesn't see a warning, doesn't think it needs to rerun pdflatex, and doesn't end up filling the TOC.
Looking at the output log, the only item I see that you could maybe usefully look for is:
No file test.json.toc.
Maybe Pandoc needs to additionally check this message to catch missing TOCs?
Otherwise it becomes a hard requirement that anything generating Pandoc ASTs must generate the corresponding identifiers, purely for the purpose of generating labels that will trigger warnings when running pdflatex. That seems like an unintuitive requirement, and easy to get wrong.
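To make the suggestion concrete, here is a rough sketch of what such a log check could look like. This is not Pandoc's actual implementation (Pandoc is written in Haskell); the function name needs_rerun and the toc_requested parameter are hypothetical, and the two patterns are simply taken from the log lines quoted above.

```python
import re

# Patterns based on the log lines quoted in this thread.
RERUN_WARNING = re.compile(r"Rerun to get cross-references right")
MISSING_TOC = re.compile(r"^No file .+\.toc\.$", re.MULTILINE)


def needs_rerun(log_text: str, toc_requested: bool) -> bool:
    """Return True if the log suggests another LaTeX pass is needed.

    Hypothetical helper for illustration only.
    """
    if RERUN_WARNING.search(log_text):
        return True
    # Assumption: the "No file ... .toc." line only matters when a TOC
    # was actually requested (e.g. via --toc).
    return toc_requested and MISSING_TOC.search(log_text) is not None


log_with_label = (
    "LaTeX Warning: Label(s) may have changed. "
    "Rerun to get cross-references right.\n"
)
log_without_label = "No file test.json.toc.\n"

print(needs_rerun(log_with_label, toc_requested=True))     # True
print(needs_rerun(log_without_label, toc_requested=True))  # True
```

With a check along these lines, the first pdflatex run on test.json.tex would ask for a second pass even though no label warning is emitted.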
Actually, I think the approach suggested in my last comment (looking for No file ...) is going to be insufficient. The reason is that when you have a large file with many sections, the TOC can span multiple pages. In this case, it will be necessary to run pdflatex a total of 3 times to get the correct page numbers in the TOC, and the second run of pdflatex generates no warnings and no No file example.toc message. There is literally no way to detect this case from the log.
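A toy model may help show why the third run is needed. This is a deliberately simplified sketch, not a description of how LaTeX actually paginates: it assumes one section per page, that a TOC fitting on one page does not move any section, and that every additional TOC page pushes all sections back by one page. The function simulate_runs and its parameters are made up for illustration.

```python
import math


def simulate_runs(num_sections: int, entries_per_toc_page: int, runs: int):
    """Toy model: return (printed, actual) page numbers for each run.

    'printed' is what the TOC shows (read from the previous run's .toc);
    'actual' is where the sections really land in this run.
    """
    toc_file = []  # contents of the .toc file; empty before the first run
    history = []
    for _ in range(runs):
        # The TOC is typeset from the .toc file written by the previous run.
        toc_pages = math.ceil(len(toc_file) / entries_per_toc_page)
        # Assumption: only TOC pages beyond the first shift the sections.
        extra_pages = max(0, toc_pages - 1)
        printed = list(toc_file)
        actual = [extra_pages + i + 1 for i in range(num_sections)]
        history.append((printed, actual))
        toc_file = actual  # this run writes the new .toc file
    return history


for run, (printed, actual) in enumerate(simulate_runs(4, 2, 3), start=1):
    print(run, printed == actual)
# 1 False
# 2 False
# 3 True
```

In this model the second run already has a TOC file to read, so nothing in the log looks unusual, yet the page numbers it prints are the ones from before the TOC grew to two pages.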
Here's a reproducer with a large TOC:
large.md and large.json (the latter generated via pandoc large.md -o large.json and then hand-edited to remove Header identifiers)
As before I generate large.json.tex via:
pandoc --standalone --toc large.json -o large.json.tex
Here's the output from the first three runs of pdflatex large.json.tex:
You can see for yourself that the PDF is correct only in the 3rd run, and the log files provide no guidance as to how many runs we need.
Therefore, I think the only reliable solution is to require 3 passes of pdflatex when a TOC is being requested. We can rely on warnings in non-TOC cases, but when a TOC is involved, I think we can't get around hard-coding the number of runs.
I guess another solution could be to diff the *.toc files before and after each run of pdflatex. If the toc file does not change, presumably we do not need to rerun pdflatex. That could potentially save 1 of the 3 runs required in a subset of situations (e.g., when book is used, or article with a TOC that happens to fit on 1 page). For some users with large documents that might be a significant speedup.
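As a rough sketch of the diffing idea (again, not how Pandoc does it; run_until_toc_stable, read_toc and the run_engine callback are hypothetical names), the loop could compare the .toc contents before and after each run and stop as soon as a run leaves the file unchanged, with a cap of 3 runs:

```python
from pathlib import Path
from typing import Callable, Optional


def read_toc(toc_path: Path) -> Optional[bytes]:
    """Return the .toc contents, or None if the file does not exist yet."""
    return toc_path.read_bytes() if toc_path.exists() else None


def run_until_toc_stable(
    run_engine: Callable[[], None], toc_path: Path, max_runs: int = 3
) -> int:
    """Run the engine until the .toc file stops changing.

    run_engine is a hypothetical callback that performs exactly one
    engine invocation (for example, one pdflatex run).
    Returns the number of runs performed.
    """
    for run in range(1, max_runs + 1):
        before = read_toc(toc_path)
        run_engine()
        if read_toc(toc_path) == before:
            # This run read the same .toc it wrote, so the TOC is settled.
            return run
    return max_runs
```

Here run_engine might wrap something like subprocess.run(["pdflatex", "large.json.tex"], check=True). If the .toc only settles after the second run, this performs 3 runs; if the first run's .toc is already final, it stops after 2.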
Thanks for your careful analysis!
I believe that the fix for #9255 is not quite working in Pandoc 3.1.11. When building with 3.1.10, everything works normally. When I upgrade to 3.1.11, the TOC is empty: I have a "Contents" page but there are no entries in the TOC itself.
Everything should be identical except for the Pandoc version, which I am flipping back and forth.
Other details, not sure what matters:
I'm building with --pdf-engine=lualatex. I wonder if different TeX engines have different output, and maybe that's tripping up the new parsing algorithm?
My lualatex version is:
I'm not quite sure how to debug this, but if there are things I can do, let me know.