Open stroobandt opened 4 years ago
That did not work. The Pandoc manual happens to be plain wrong about this!
Why? What went wrong? -H has previously worked for me (though I no longer use it).
Through trial and error, I figured out that a latex-headers.yaml file could be used as an input Markdown file, in which case it is parsed as Markdown (not literal) text, but its contents can be marked as LaTeX ({=latex}) using the raw_attribute extension. I describe this more fully in two posts on pandoc-discuss.
P.S. Since pandoc 2.8 (2019-11-22), you can use --defaults to help with the separation of format from content.
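For anyone who wants to try the input-file approach described above, here is a minimal sketch of what such a latex-headers.yaml might look like. The macro \R is purely illustrative and the file names in the usage line are assumptions; the fenced {=latex} block follows the raw_attribute example given in the manual.

````yaml
---
# latex-headers.yaml: passed to pandoc as an ordinary input file, so this
# metadata block is parsed as Markdown, like the rest of the input.
header-includes:
- |
  ```{=latex}
  % Illustrative macro only; replace with your own preamble commands.
  \newcommand{\R}{\mathbb{R}}
  ```
---
````

It would then be listed before the real input, e.g. pandoc latex-headers.yaml paper.md -o paper.pdf (paper.md being a hypothetical document). Pandoc concatenates multiple input files before parsing, which is why the metadata block ends up applying to the whole document.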
When reading the manual, one would think LaTeX header-includes would only affect latex output.
Yeah, I can see how you might be led to believe that... maybe we should add a section where generic variables are listed, in addition to the "Variables for HTML", "Variables for LaTeX", etc. which we already have...
So yeah, if you could explain what was actually "plain wrong" in the manual, we should quickly be able to fix it. Pull requests are always welcome as well.
This goes back to a discussion on a Lua filter PR. While I agree that the manual could be more explicit on this, I strongly disagree with the "plain wrong"; I thought I had explained the underlying issues in the linked discussion.
Maybe you could explain in what way -H didn't work? It should do exactly what is advertised, namely include literal content from the file in the document header. If you were expecting the content to be parsed as Markdown, then it would not work, but that is not what the documentation suggests, is it?
A few notes on things a newbie (such as myself) might find less than clear in the manual:
"parsed as literal string text" vs. "parsed as raw content" vs. "parsed as markdown": For someone who doesn't already know the significance of these differences, just stating how something will be parsed doesn't mean much. (Most people will understand that symbols like *text*, etc., won't work when something is "parsed as a literal string", but not necessarily what that means for LaTeX/HTML code, especially when used in pandoc markdown, which seems to understand LaTeX/HTML.) The same goes for "string scalars in the YAML file will always be parsed as Markdown". There is another possible source of confusion, which is that "literal" in "The pipe character (|) can be used to begin an indented block that will be interpreted literally" seems to mean something slightly different from "literal" in "metadata values specified here are parsed as literal string text, not markdown".
"Raw content" In one part of the manual, it says: "Raw content to include in the document’s header may be specified using header-includes; however, it is important to mark up this content as raw code for a particular output format, using the raw_attribute extension), or it will be interpreted as markdown." There, the following example is provided:
header-includes:
- |
```{=latex}
However, in another part of the manual, the following example is provided:
header-includes: |
\RedeclareSectionCommand[
There are two differences: (1) the use of the raw_attribute extension ({=latex}), and (2) header-includes: | vs. header-includes:\n- |. It isn't clear whether these two differences matter and, if so, how, particularly given the impression provided by the manual that "it is important to mark up this content as raw code".
In one format or in all formats? Initially, I used to believe that if I included some HTML mark-up in a markdown file, I could convert that into whichever format I wanted (such as PDF). But the manual clarifies that raw HTML only gets converted into a few formats, and not all. Similarly, for raw LaTeX, the manual states: "Inline LaTeX is ignored in output formats other than Markdown, LaTeX, Emacs Org mode, and ConTeXt." However, it is then something to be learnt that the raw LaTeX included in headers is included in all formats. (Note: It is clear for those who understand that "inline LaTeX" is different from LaTeX in headers. But people like me might not understand that unless it is pointed out in the section on headers.)
I hope that helps.
I think the underlying issue is that people don't really understand the conceptual difference between variables and metadata fields. This is a natural confusion, because variables get set automatically from metadata fields. header-includes as a metadata field gets parsed as Markdown (before the like-named variable gets set), while header-includes as a variable just gets passed through without any modification at all.
Another issue that trips people up is that metadata fields set in documents behave differently from metadata fields set on the command line or via defaults files. In the former case, they are parsed as Markdown (or whatever the document's format is); in the latter case, they are interpreted as plain text -- which is not the same as simply passing them through verbatim to the output, since the text may need escaping appropriate to the output format.
The model is thus somewhat complicated, and it's easy to get pretty far into using pandoc without understanding it.
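To make the distinction a bit more concrete, here is a hedged sketch of a defaults file (usable since pandoc 2.8 via --defaults) that exercises the different routes side by side. All file names and values are made up for illustration, and the comments only restate the behaviour described above.

```yaml
# defaults.yaml (hypothetical name); run with: pandoc --defaults defaults.yaml
input-files:
  - paper.md

# Same as -H/--include-in-header: the file's contents are included literally
# and end up in the header-includes variable, with no Markdown parsing.
include-in-header:
  - preamble.tex

# Metadata set here is treated as plain text, not Markdown, and is escaped
# as needed for the output format.
metadata:
  subtitle: "Costs & *benefits*"

# Variables are handed to the template without modification.
variables:
  fontsize: 12pt
```

By contrast, a header-includes field written in a YAML block inside paper.md itself would be parsed as Markdown before the like-named variable is set.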
First of all, my sincere apologies for the commotion I caused. This was not my intention.
Here is a reconstruction of what happened:
@tarleb taught me how LaTeX macros in header-includes in a YAML block at the beginning of a document also have an effect on MathJax formulas in html output. That already came as a surprise to me.
In my eternal quest to separate format from content, I wanted to achieve the same by referencing an external .yaml file in my makefile. Hence, I consulted the manual.
Since there is no header-includes entry in the table of contents, I performed a Ctrl+F search on this term. This yielded four hits:
header-includes: Raw content to include in the document’s header may be specified using header-includes; however, it is important to mark up this content as raw code for a particular output format, using the raw_attribute extension), or it will be interpreted as markdown. For example: […]
Variables set automatically
Pandoc sets these variables automatically in response to options or document contents; users can also modify them. These vary depending on the output format, and include the following: […]
header-includes: contents specified by -H/--include-in-header (may have multiple values)
Trying to include the LaTeX macros using -H is what did not work for me in the same way as it did with the YAML at the beginning of the Markdown document.
Now, I have to admit I was very tired when reading that second hit initially. Reading it now, with hindsight and the knowledge from the comments posted above, I can kind of see that either I made a logical error (which I probably did), or there is semantic overloading of the term header-includes, or both.
The variable header-includes being set automatically by -H and being user-modifiable is definitely not the same as setting header-includes through -H. Nonetheless, the latter is what my tired mind was thinking at that moment.
Luckily, there was this external web article, Boilerplating Pandoc for Academic Writing, which helped me on my way where the Pandoc manual did not.
I sincerely hope that this account of my erratic train of thought will contribute towards improving the manual on the subject of header-includes.
After all, this is why I opened this issue.
I think the underlying issue is that people don't really understand the conceptual difference between variables and metadata fields.
agreed, I think the only way to communicate that somewhat understandably is with a table...
@mb21 Your proposal goes a long way in clarifying things and highlighting the differences between the concepts of --variable and --metadata. The filter section does a better job of promoting Lua filters, the lack of which I previously complained about in this Lua filter issue #121.
A couple of remarks, though:
pandoc -t native gives the impression of being some help function, similar to pandoc -h or pandoc -D FORMAT. pandoc -t native FILE would work better here.
However, there still remain a number of treacherous mind traps in the Pandoc manual:
header-includes at the very end
In view of what has been discussed here, the word choice of these two subsubsections in the manual is really unfortunate and adds to the confusion. I would suggest renaming them "Bibliographic variables" and "Bibliographic blocks".
Furthermore, what is the actual definition of header-includes doing there at the end of "Metadata blocks", when the preceding text deals exclusively with bibliographic variables? This invites confusion.
Finally, a lesser issue is that the only information about --metadata remains somewhat hidden in the Reader options subsection, whereas an entire subsection is devoted to Variables. Perhaps --metadata also deserves a brief subsection, to put it on par with variables. That would help in underlining the differences between the two concepts.
Recently, I found the Pandoc manual to be severely lacking and especially confusing on the subject of header-includes.
First of all, I was profoundly surprised to see that LaTeX macros defined in header-includes: (without any further {=latex} specification) also affect MathJax in HTML output. When reading the manual, one would think LaTeX header-includes would only affect latex output. In all, this is a useful feature, but is not as such documented.
Inserting header-includes: inside a YAML metadata block inside the input document is easy enough. However, in my eternal quest to separate format from content, I wanted to achieve exactly the same using a makefile and an external file. I was hoping -H FILE would do that, as is suggested in the manual. That did not work. The Pandoc manual happens to be plain wrong about this!
After spending more time than intended trying out many more things, I was lucky to eventually run into Boilerplating Pandoc for Academic Writing. This article explains how easy it is to load header-includes from an external file by letting it precede the input file.
I also wrote up my findings in this [TeX StackExchange answer](https://tex.stackexchange.com/a/566707/26348).