jgm / pandoc

Universal markup converter
https://pandoc.org

Markdown to PDF: "header-includes" content corrupted #9953

Open jwoithe opened 4 months ago

jwoithe commented 4 months ago

There seems to be a regression in pandoc 3.2.1 compared to 3.1.11.1 with the handling of header-includes when converting from markdown to PDF. Take a file (test.md) with the following Markdown content:

---
header-includes: |
    \makeatletter
    \let\old@verbatim@font=\verbatim@font
    \def\verbatim@font{%
    \fontsize{10}{12}%
    \old@verbatim@font
    }
    \makeatother
...

To convert this to PDF, I run the following command: `pandoc test.md -o test.pdf`. With pandoc 3.1.11.1 this completes without any issue. However, version 3.2.1 flags an error:

Error producing PDF.
! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.46 v

Inspection of the tex output produced by the --verbose pandoc option shows what's happening. When pandoc 3.1.11.1 is used, the header-includes content appears in the output exactly as specified in the markdown file. However, when pandoc 3.2.1 is used the tex output becomes:

\makeatletter
\let\old@

verbatim@font=\verbatim@font \def\verbatim@font{%
\fontsize{10}{12}%
\old@verbatim@font
} \makeatother

The errant line break after \let\old@ is the cause of the LaTeX error. I also note that \makeatother is no longer on a line by itself.
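A check along the following lines can confirm whether the header-includes lines survive unchanged in the generated LaTeX (which can be obtained, for instance, with `pandoc -s -t latex test.md`). This is only a minimal Python sketch: the function name `survives_verbatim` is hypothetical and not part of pandoc, and the two sample strings are copied from this report.

```python
def survives_verbatim(header_includes: str, latex_output: str) -> bool:
    """Return True if every non-blank header line appears, in order,
    as a complete line of the LaTeX output."""
    wanted = [ln.strip() for ln in header_includes.splitlines() if ln.strip()]
    produced = iter(ln.strip() for ln in latex_output.splitlines())
    # `w in produced` consumes the iterator, so matches must occur in order.
    return all(w in produced for w in wanted)


# header-includes content from test.md
HEADER = r"""\makeatletter
\let\old@verbatim@font=\verbatim@font
\def\verbatim@font{%
\fontsize{10}{12}%
\old@verbatim@font
}
\makeatother"""

# Header portion of the pandoc 3.2.1 output quoted above
CORRUPTED = r"""\makeatletter
\let\old@

verbatim@font=\verbatim@font \def\verbatim@font{%
\fontsize{10}{12}%
\old@verbatim@font
} \makeatother"""

print(survives_verbatim(HEADER, HEADER))     # True
print(survives_verbatim(HEADER, CORRUPTED))  # False
```

Comparing whole lines rather than substrings also catches the second symptom, where `\makeatother` ends up sharing a line with the closing brace.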

Both pandoc versions in use are from binary tarballs downloaded from the pandoc project downloads page. They are running under Linux. LaTeX comes from TeXLive 2023.

Does this behaviour come about due to a regression, or has pandoc 3.2 introduced a change in the way header-includes must be specified?

jgm commented 4 months ago

pandoc -s -t native yields:

Pandoc
  Meta
    { unMeta =
        fromList
          [ ( "header-includes"
            , MetaBlocks
                [ RawBlock (Format "tex") "\\makeatletter\n\\let\\old@"
                , Para
                    [ Str "verbatim@font="
                    , RawInline (Format "tex") "\\verbatim"
                    , Cite
                        [ Citation
                            { citationId = "font"
                            , citationPrefix = []
                            , citationSuffix = []
                            , citationMode = AuthorInText
                            , citationNoteNum = 1
                            , citationHash = 0
                            }
                        ]
                        [ Str "@font" ]
                    , SoftBreak
                    , RawInline
                        (Format "tex")
                        "\\def\\verbatim@font{%\n\\fontsize{10}{12}%\n\\old@verbatim@font\n}"
                    , SoftBreak
                    , RawInline (Format "tex") "\\makeatother"
                    ]
                ]
            )
          ]
    }
  []

And now we can see what is happening.
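Reading the native output: the raw TeX block ends at `\let\old@`, the rest is parsed as an ordinary Markdown paragraph, and `@font` is picked up as a citation. The same symptom can be detected mechanically by walking pandoc's JSON encoding of the document (as produced by `-t json`) and listing any node types under `header-includes` that are not raw TeX. The following is a minimal Python sketch; the sample document is a hand-abridged version of the AST above, and the helper names and the `allowed` set are assumptions made for illustration.

```python
import json

# Hand-written abridgement of the metadata above in pandoc's JSON
# encoding; only the node types matter for this check.
DOC = json.loads(r"""
{"meta": {"header-includes": {"t": "MetaBlocks", "c": [
  {"t": "RawBlock", "c": ["tex", "\\makeatletter\n\\let\\old@"]},
  {"t": "Para", "c": [
    {"t": "Str", "c": "verbatim@font="},
    {"t": "RawInline", "c": ["tex", "\\verbatim"]},
    {"t": "Cite", "c": [[{"citationId": "font"}],
                        [{"t": "Str", "c": "@font"}]]},
    {"t": "SoftBreak"},
    {"t": "RawInline", "c": ["tex", "\\makeatother"]}
  ]}
]}}, "blocks": []}
""")


def node_types(node):
    """Yield the "t" tag of every tagged node at or beneath `node`."""
    if isinstance(node, dict):
        if "t" in node:
            yield node["t"]
        for value in node.values():
            yield from node_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from node_types(item)


def non_raw_types(doc, key="header-includes"):
    """Node types under a metadata field other than raw TeX containers."""
    allowed = {"MetaBlocks", "RawBlock", "RawInline"}  # assumed "healthy" set
    field = doc.get("meta", {}).get(key, {})
    return sorted(set(node_types(field)) - allowed)


print(non_raw_types(DOC))  # ['Cite', 'Para', 'SoftBreak', 'Str']
```

An empty result would suggest the field was kept as raw TeX; anything else, as here, indicates that part of it was reinterpreted as Markdown.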

It's not an intended change.

Note that you can work around this by enclosing the raw LaTeX in

```{=latex}
...
```

(see `raw_attribute` extension).

jgm commented 4 months ago

Possibly relevant item from pandoc 3.1.12.3 changelog:

  * LaTeX reader:

    + Improve tokenization of `@` (#9555). Make tokenization sensitive to
      `\makeatletter`/`\makeatother`. Previously we just always treated
      `@` as a letter.  This led to bad results, e.g. with the sequence `\@`.
      E.g., `a\@ b` would parse as "ab" and `a\@b` as "a".

Not that this really explains what is happening.
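To make the changelog item concrete, here is a toy Python sketch of a tokenizer in which `@` counts as a letter only between `\makeatletter` and `\makeatother`. It is not pandoc's actual (Haskell) implementation, and the function name `control_sequences` is hypothetical; it only illustrates the rule the changelog describes.

```python
def control_sequences(tex: str) -> list[str]:
    """Return the control sequences in `tex`, treating '@' as a letter
    only while \\makeatletter is in effect."""
    names = []
    at_is_letter = False
    i = 0

    def is_letter(ch: str) -> bool:
        return (ch.isascii() and ch.isalpha()) or (at_is_letter and ch == "@")

    while i < len(tex):
        if tex[i] != "\\":
            i += 1
            continue
        j = i + 1
        if j < len(tex) and is_letter(tex[j]):
            # Control word: backslash followed by a run of letters.
            while j < len(tex) and is_letter(tex[j]):
                j += 1
        else:
            # Control symbol: backslash followed by one non-letter.
            j = min(j + 1, len(tex))
        name = tex[i:j]
        names.append(name)
        if name == "\\makeatletter":
            at_is_letter = True
        elif name == "\\makeatother":
            at_is_letter = False
        i = j
    return names


# Inside \makeatletter ... \makeatother the name is a single token.
print(control_sequences(r"\makeatletter\let\old@verbatim@font\makeatother"))
# Outside, the control word stops at the first '@'.
print(control_sequences(r"\let\old@verbatim@font"))
# And \@ is a one-character control symbol.
print(control_sequences(r"a\@ b"))
```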

jwoithe commented 4 months ago

Thanks for the hint about the workaround, which appears to work as you suggest:

---
header-includes: |
    ```{=latex}
    \makeatletter
    :
    \makeatother
    ```
...

If there is anything else I can do to help resolve the unintended change, please let me know.
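For anyone generating such front matter from a script, the workaround can be applied mechanically. Below is a minimal Python sketch, assuming the `raw_attribute` extension is enabled as jgm notes; the function name `front_matter` is hypothetical, and the layout (four-space indent, `...` terminator) simply mirrors the example above.

```python
def front_matter(raw_latex: str) -> str:
    """Build YAML front matter whose header-includes is a raw LaTeX block."""
    fence = "`" * 3  # built at runtime to avoid nesting fences in this example
    body = [fence + "{=latex}", *raw_latex.strip().splitlines(), fence]
    indented = "\n".join("    " + line for line in body)
    return "---\nheader-includes: |\n" + indented + "\n...\n"


print(front_matter("\\makeatletter\n\\makeatother"))
```

The closing fence matters: without it the raw block is not terminated inside the metadata field.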