Closed sicikh closed 1 month ago
Hm. All of the pagination work is done by pdflatex. Pandoc contributes nothing but a single tex command to create the table of contents. So I'm not sure what to do to fix this.
I also don't have any PDF reader that handles page numbers this way, to test with.
You can try using this online PDF viewer, that can display "logical" page numbers.
I also have no idea what causes the problem. If that is the case, then it's most likely a problem in the template or in the LaTeX packages used.
I tried using the pandoc 3.1.9 executable and the template used at that time (see commit c9fe8b8, default.latex file) — the displayed page numbers match the logical ones. To help in some way, I will attach the template files used during compilation and the results themselves below.
(GitHub does not allow uploading .latex files, so I changed the extension to .txt)
Command for pandoc 3.1.9:
pandoc 3.1.9
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/bezkonca/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
./pandoc "--from=markdown" --template="./template_3.1.9.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside -o test_3.1.9.pdf test.md
Output:
Command for pandoc 3.5:
pandoc 3.5
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/bezkonca/.local/share/pandoc
Copyright (C) 2006-2024 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
pandoc "--from=markdown" --template="./template_3.5.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside -o test_3.5.pdf test.md
Output:
Tomorrow I will try to understand which specific version the regression occurred from — of course, the jump from 3.1.9 to 3.5 should have had some effect :)
So, everything is a little weirder. I tried to compile the document with pandoc 3.1.9, but using the template from pandoc 3.5. This required adding several files (since pandoc 3.1.9 does not see the imported templates), but the page number mismatch does not occur. I attach the files below.
So the problem is definitely in pandoc, not in the templates or else. Why this is happening is not clear at all. I will find out tomorrow which version the regression occurred from.
Command:
./pandoc "--from=markdown" --template="./template_3.5.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside -o test_3.1.9_with_3.5_template.pdf test.md
after-header-includes.txt common.txt fonts.txt hypersetup.txt passoptions.txt template_3.5.txt
(Note that adding these files and recompiling the example with pandoc 3.5 does not resolve the issue.)
Output:
I will find out tomorrow which version the regression occurred from.
That would be most helpful. Also, try generating a standalone latex file (-t latex -s) and compiling it separately. If you can reproduce the issue there, then a diff of the working latex file and the nonworking one would be extremely helpful.
I found the version of pandoc with which regression occurs — this is 3.1.11.
3.1.10 works correctly on templates from versions 3.5 and 3.1.9, but 3.1.11 on both templates produces the behavior I specified above. Pandoc versions higher than 3.1.11 also do not work. I can't say for every version, but I tested specifically 3.1.13, 3.2 and 3.3.
As you indicated, I will try to reproduce the problem by compiling the standalone LaTeX files.
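Changelog for 3.1.11 has:
Text.Pandoc.PDF:
- Ensure that we find all the LaTeX warnings requiring a rerun (#9284). This should fix a regression from 3.1.9 that led to incorrect alignments in tables (and possibly other issues).
It's possible that this is the culprit. Perhaps this change in logic caused latex not to be run enough times? Can you use --verbose when producing a PDF and check the output to see how many times pandoc calls pdflatex?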
Oh, I found the cause of the issue!
Reproducible steps:
1. Create a test.md file with the data specified in the issue.
2. Compile to LaTeX with the command:
./pandoc-3.1.10-linux-amd64/pandoc-3.1.10/bin/pandoc "--from=markdown" --template="./template_3.5.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside -t latex -s -o test_3.1.10_with_3.5_template.latex test.md
Note that the pandoc version (3.1.10 or 3.1.11) does not matter here: the contents of the LaTeX files are identical between these versions. The template version does not matter either; the issue reproduces with both.
3. Run xelatex: xelatex test_3.1.10_with_3.5_template.latex
On the first run the TOC is not there yet; xelatex asks us to run the compilation again. This is normal. There are also no "logical" page numbers, only real ones.
4. Run xelatex again. The TOC appears, the document looks normal at first glance, and there are no warnings from xelatex. But if you look closely at the logical page numbers, the problem that I described at the very beginning of the issue is there.
5. Run xelatex again. The problem disappears: the page numbers displayed in the PDF viewer match the logical ones.
So I can conclude that, starting from version 3.1.11, pandoc does not run xelatex a third time.
I hope I helped in some way. Is there anything else I can do to help with finding the cause of the issue?
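The manual steps above can be sketched as a small shell helper. This is only an illustration, assuming a POSIX shell; the function name run_tex_n_times is hypothetical and is not part of pandoc or of any TeX distribution. It runs the given engine a fixed number of times on one file and stops at the first failing run.

```shell
# Hypothetical helper: run a TeX engine a fixed number of times on one file.
# Usage: run_tex_n_times ENGINE FILE [RUNS]   (RUNS defaults to 3)
run_tex_n_times() {
    engine=$1
    file=$2
    runs=${3:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        # Progress goes to stderr so it does not mix with the engine output.
        echo "run $i of $runs: $engine $file" >&2
        # Stop immediately if the engine reports a failure.
        "$engine" "$file" || return 1
        i=$((i + 1))
    done
}

# Example call (commented out because it needs xelatex and the generated file):
# run_tex_n_times xelatex test_3.1.10_with_3.5_template.latex 3
```

Comparing the PDF produced with RUNS set to 2 against the one produced with RUNS set to 3 should show the difference in the logical page numbers described in the steps.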
I'm sorry, I didn't refresh the page to see your new message before I sent mine. I'll do it in a minute.
Changelog for 3.1.11 has:
Text.Pandoc.PDF:
- Ensure that we find all the LaTeX warnings requiring a rerun (#9284). This should fix a regression from 3.1.9 that led to incorrect alignments in tables (and possibly other issues).
It's possible that this is the culprit. Perhaps this change in logic caused latex not to be run enough times? Can you use --verbose when producing a PDF and check the output to see how many times pandoc calls pdflatex?
Here it is:
Pandoc 3.1.11 command:
./pandoc-3.1.11-linux-amd64/pandoc-3.1.11/bin/pandoc "--from=markdown" --template="./template_3.5.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside --verbose -o test_3.1.11_with_3.5_template.pdf test.md &> 3.1.11.log
Output:
Pandoc 3.1.10 command:
./pandoc-3.1.10-linux-amd64/pandoc-3.1.10/bin/pandoc "--from=markdown" --template="./template_3.5.latex" --pdf-engine "xelatex" --toc -V documentclass=scrbook -V has-frontmatter -V classoption=oneside --verbose -o test_3.1.10_with_3.5_template.pdf test.md &> 3.1.10.log
Output:
As far as I can see from the logs, pandoc 3.1.11 does not run xelatex for the third time, because xelatex does not issue any errors or warnings about the need to rerun, which is why the whole problem occurs... But pandoc 3.1.10 runs xelatex three times, which is what is needed to display the logical pages correctly.
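To compare the two logs quickly, the number of engine runs can be counted with grep. This is a minimal sketch under one important assumption: that pandoc's --verbose output marks each engine run with a line containing "[makePDF] Run #". Please check that marker against your own log before relying on it. The helper name count_tex_runs is hypothetical; the log file names are the ones used in the commands above.

```shell
# Hypothetical helper: count engine runs recorded in a pandoc --verbose log.
# ASSUMPTION: each run is announced by a line containing "[makePDF] Run #".
count_tex_runs() {
    # -F treats the pattern as a fixed string, -c prints the number of
    # matching lines (grep prints 0 and returns non-zero if none match).
    grep -c -F '[makePDF] Run #' "$1"
}

# Example calls (commented out because they need the log files from above):
# count_tex_runs 3.1.11.log
# count_tex_runs 3.1.10.log
```

If the marker assumption holds, the behaviour described above would show up as one run fewer in 3.1.11.log than in 3.1.10.log.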
Okay, so we have gotten to the bottom of it -- thank you for your help.
The issue is that xelatex doesn't issue any errors or warnings about the need to rerun, but actually one does need to rerun. This should be fixable. As a crude measure, we could simply set runs=3 whenever a table of contents is used.
If the LaTeX file contains \frontmatter, \mainmatter or \backmatter, which are often used to reset the page counter, then at least three runs are worthwhile. There may be a TOC in the LaTeX template without any counter reset.
Moreover, there may be no TOC while a counter reset is present. So I don't know how to keep the number of runs to a minimum while still making sure that such bugs don't occur at all...
I looked at the LaTeX compilation code (the runTeXProgram function) and yes, indeed, a few changes need to be made. The only question is how to fix the bug without running an unnecessary extra compilation...
As I see in #9295, the problem is quite similar. One proposed solution (https://github.com/jgm/pandoc/issues/9295#issuecomment-1879590208) is to compare the toc files between runs, but in the case I described this does not help when the TOC spans several pages.
The suggestion made just before that in the same issue (https://github.com/jgm/pandoc/issues/9295#issuecomment-1879587295) is to hardcode at least three compilation runs when a TOC is used (as indicated there, without a TOC, errors will be generated that trigger the required number of compilations; maybe, I'm not very familiar with LaTeX). Otherwise, it probably won't work out in any way...
My assumption was wrong: this only happens when \tableofcontents is present.
You can restore the check that existed before 3.1.11 (53fbce09bee57f60754673485e513d42e2808589):
(3ff206cc4be8642e6c7f02c873707f8c5185321c):
tex2pdf program args tmpDir source = do
  let numruns | takeBaseName program == "latexmk"        = 1
              | "\\tableofcontents" `T.isInfixOf` source = 3  -- to get page numbers
              | otherwise                                = 2  -- 1 run won't give you PDF bookmarks
So maybe we should revert https://github.com/jgm/pandoc/commit/2dd98b967b8615d4d67ec9c62d7a33d16012241b#diff-0f4c4a28aca69bc0cf6b381fb7412ab5a46a4ffd513d0fd86773956e14883e87R442 which is pointless if we just always do 3 runs when there is a table of contents?
Or should we keep this and always do one additional run after the toc has stabilized? (Are there ever cases where > 3 runs are needed?)
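The second option (one additional run once the TOC has stabilized) can be sketched in shell for experimenting outside pandoc. This is not pandoc's implementation, only an illustration under stated assumptions: the engine writes a .toc file next to the input file, "stable" means the checksum of that file did not change between two consecutive runs, and the names run_until_toc_stable and max_runs are hypothetical.

```shell
# Hypothetical sketch: rerun the engine until the .toc file stops changing,
# then do exactly one more run. Prints the total number of runs performed.
# Usage: run_until_toc_stable ENGINE FILE [MAX_RUNS]   (MAX_RUNS defaults to 5)
run_until_toc_stable() {
    engine=$1
    file=$2
    max_runs=${3:-5}
    toc="${file%.*}.toc"      # ASSUMPTION: doc.tex produces doc.toc
    prev_sum=""
    run=0
    while [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        "$engine" "$file" >/dev/null || return 1
        # Fingerprint the TOC file; a missing file counts as "none".
        if [ -f "$toc" ]; then sum=$(cksum < "$toc"); else sum="none"; fi
        if [ "$sum" = "$prev_sum" ]; then
            # The TOC did not change in this run: do one final run, then stop.
            if [ "$run" -lt "$max_runs" ]; then
                run=$((run + 1))
                "$engine" "$file" >/dev/null || return 1
            fi
            break
        fi
        prev_sum=$sum
    done
    echo "$run"
}

# Example call (commented out because it needs xelatex and the generated file):
# run_until_toc_stable xelatex test_3.1.10_with_3.5_template.latex
```

For a document whose .toc content is already final after the first run, this sketch performs three runs in total, which is the count that fixed the page labels in the manual test. A document without any .toc file also gets three runs here, so a real implementation would probably want to special-case that. The sketch cannot answer whether more than three runs are ever needed; max_runs only caps the loop.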
Thank you so much for your work! I will definitely test the fix as soon as the nightly build is available :)
Yep, with the latest nightly the issue is solved — on two test inputs and on one real document the behaviour is correct. Thank you! :heart:
Reproducible steps:
Create a test.md file with the following contents:
Output:
Real page number / Document page number / Displayed page number in PDF viewer
(TOC)
1 / i / i
2 / ii / 1
3 / iii / 2
(Mainmatter)
4 / 1 / 3
5 / 2 / 4
...
Expected output:
Real page number / Document page number / Displayed page number in PDF viewer
(TOC)
1 / i / i
2 / ii / ii
3 / iii / iii
(Mainmatter)
4 / 1 / 1
5 / 2 / 2
...
-V classoption=oneside can be omitted from the compilation command; the issue will still be there, but in another form:
Real page number / Document page number / Displayed page number in PDF viewer
(TOC)
1 / i / i
2 / ii / ii
3 / iii / 1
(Blank page)
4 / _ / 2
(Mainmatter)
5 / 1 / 3
...
Expected output:
Real page number / Document page number / Displayed page number in PDF viewer
(TOC)
1 / i / i
2 / ii / ii
3 / iii / iii
(Blank page)
4 / _ / (not sure what should be here, but it is not in the scope of the issue)
(Mainmatter)
5 / 1 / 1
...
Pandoc version:
OS:
Manjaro Linux
Regression:
This was not an issue in Pandoc 3.1.9: the document page numbers and the displayed ones matched.
Comments:
When upgrading from 3.1.9 to 3.5, I discovered this problem and at first thought it was an error in the Eisvogel template. After a day of debugging, it turned out that Pandoc seemed to treat the TOC as a single page; otherwise I do not know how to interpret the resulting document. Later, I was able to reproduce the problem with the standard template.