pandoc / lua-filters

A collection of lua filters for pandoc
MIT License
611 stars, 166 forks

pagebreak filter does not work when converting latex to docx #152

Closed by joelnitta 3 years ago

joelnitta commented 3 years ago

I tried converting the sample md file using the pagebreak filter.

pandoc --to docx --lua-filter=pagebreak.lua -o sample.docx sample.md

The resulting docx file has no page break:

[screenshot]

Tried with both pandoc 2.7.3 and 2.11.3.2.

tarleb commented 3 years ago

This is the expected output. Compare with the sample.md file and scroll down in the created docx: there is a second page (and a third).

Please reopen if there really isn't more content, then also include the software used to open the file.

joelnitta commented 3 years ago

D'oh! You're totally correct, sorry for not actually scrolling through the output. My bad.

I realized the problem is not converting markdown to docx but latex to docx:

Here is an example latex file sample_latex.tex:

\documentclass{article}
\begin{document}

this is the first page

\newpage

and this is the second page

\end{document}

When I run pandoc --to docx --lua-filter=pagebreak.lua -o sample_latex.docx sample_latex.tex,

this is the output in MS Word:

[screenshot]

MS Word v 16.44 on Mac OSX 10.15.7
pandoc v 2.11.3.2

tarleb commented 3 years ago

Ah yes, that makes sense. Pandoc drops unknown TeX commands when reading LaTeX. You can still get the expected result by adding --from=latex+raw_tex.

joelnitta commented 3 years ago

Thanks! That fixed it.

If you don't mind another question, I'm still a little confused though... what is the difference between --from=latex and --from=latex+raw_tex? I don't understand why one would need to "extend" latex since \newpage is already standard latex.

jgm commented 3 years ago

Yes it's standard latex, but because \newpage doesn't correspond to anything in the pandoc AST, pandoc can ignore it or pass it through as raw tex. It only does the latter if you explicitly enable raw_tex, which isn't on by default for latex input.
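
To make the explanation above concrete, here is a toy model in Python of the two behaviours. It is not pandoc's code: the names Para, RawBlock, read_latex_body and pagebreak_filter are made up for illustration, the "latex" format tag is an assumption, and the OpenXML string is only a guess at the sort of raw block a page break filter might emit for docx output. The point is simply that a filter can only rewrite \newpage if the reader kept it in the first place.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Para:
    """Stand-in for a paragraph node in the document tree."""
    text: str


@dataclass
class RawBlock:
    """Stand-in for a raw, format-specific block passed through untouched."""
    format: str
    text: str


Block = Union[Para, RawBlock]

# Commands the toy reader has no tree node for (hypothetical list).
UNMAPPED_COMMANDS = {"\\newpage"}

# Assumed OpenXML for a page break; illustrative only.
DOCX_PAGEBREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'


def read_latex_body(body: str, raw_tex: bool) -> List[Block]:
    """Toy reader: split on blank lines, keep or drop unmapped commands."""
    blocks: List[Block] = []
    for chunk in body.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        if chunk in UNMAPPED_COMMANDS:
            if raw_tex:
                # Extension enabled: pass the command through as raw TeX.
                blocks.append(RawBlock("latex", chunk))
            # Extension disabled: the command is silently dropped.
            continue
        blocks.append(Para(chunk))
    return blocks


def pagebreak_filter(blocks: List[Block]) -> List[Block]:
    """Toy filter: replace a raw \\newpage block with a docx page break."""
    out: List[Block] = []
    for block in blocks:
        is_break = (
            isinstance(block, RawBlock)
            and "tex" in block.format
            and block.text in UNMAPPED_COMMANDS
        )
        out.append(RawBlock("openxml", DOCX_PAGEBREAK) if is_break else block)
    return out


# Body of sample_latex.tex from the report above.
BODY = "this is the first page\n\n\\newpage\n\nand this is the second page"

without_raw_tex = pagebreak_filter(read_latex_body(BODY, raw_tex=False))
with_raw_tex = pagebreak_filter(read_latex_body(BODY, raw_tex=True))
```

In this sketch, without_raw_tex holds only the two paragraphs, so the filter has nothing to act on, while with_raw_tex has a page break block between them. That mirrors why adding --from=latex+raw_tex to the command made the filter work.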

joelnitta commented 3 years ago

I see, thanks for the explanation.