jgm / doctemplates

Pandoc-compatible templating system
BSD 3-Clause "New" or "Revised" License

Allow translation between formats for partials #17

Open jkrenzer opened 1 year ago

jkrenzer commented 1 year ago


I would like to propose adding the ability to include partials written in one format and have them translated to another target format.

Use Case

In scientific or technical documents it is often necessary to include certain front and back matter, e.g. tables with legal information, change records and general preamble sections, which come partly before and partly after the TOC or other templated sections. With partial translation, these parts could be prepared once in one format and then be included in templates for many other formats. That would remove the need to synchronize many complete templates when changing e.g. front-matter styling. So there could be "one source of truth" to populate many different templates.

Current state

Currently we use chains of multiple pandoc conversion steps to prepare documents that include this header information. This way we can start from a common markdown template for the front and back matter and still convert to any final format. But this introduces some problems: pandoc lacks knowledge of the final target format in the earlier steps, which causes hiccups with e.g. the TOC and even some markup errors such as improperly nested tags in HTML (like <p></nav></p>, which is invalid).

Proposed solution

Include an option or a pipe to render a partial from another format to a given target format, defaulting to the target format of the including template when none is given.

Thanks and have a good time!


jgm commented 1 year ago

Anything that needs to be converted (e.g. your front matter) should be in a source document, not a template. Templates are just for presenting already rendered content. It may be that I don't sufficiently understand your case, though. An example might help.

jkrenzer commented 1 year ago

Hi John!

I agree that templates are for rendering metadata and the enclosing form of the user-provided document. That is what we are doing, as all the data in the template is pulled from YAML. The rationale behind this proposal is that we have to reproduce the same data structure and document faithfully in different formats. This is where a common partial that can be included by templates for different formats would be helpful.


I am working in academia and we have many projects with space agencies. Customarily, all documents have to present human-readable metadata in a given format, fixed by contracts, project descriptions, work breakdown structures etc.

Here is an example of how the first pages of such a document have to be structured, regardless of the output format:

Example document

As we can see, there is basically nothing here that pandoc cannot currently do. Which is great! But currently there are basically two routes we can take to get to this result:

  1. Prepare one template for each format and very carefully keep them all in sync
    • Upsides:
      • Nothing has to change
      • Lean from a user perspective
      • Can be codified in a defaults file and thus very easily and safely operated by anybody on the project
    • Downsides:
      • Maintenance of document template multiplies by formats
      • Maintenance requires knowledge of all the formats and the representation of tables, texts, lists etc.
  2. Precompile the document with a common template to markdown, then compile the intermediate to the final format
    • Upsides:
      • Only knowledge of markdown required to maintain template
      • Impossible for different formats to drift apart
    • Downsides:
      • Complicated build chain requiring scripts or a build system, introducing points of failure
      • Not feasible with just pandoc and defaults files
      • As pandoc lacks knowledge of the final format, structures (e.g. the TOC, footnotes...) generated in an earlier step might fail or work only partially

The latter is what we currently do to generate our documentation.
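
For readers who want to picture the second workflow, here is a minimal sketch of such a two-pass build. It is not the actual build used in this thread: every file name (document.md, frontmatter-template.md, intermediate.md) and the list of formats are hypothetical placeholders, and the PANDOC override variable is an assumption added so the commands can be printed with PANDOC=echo instead of executed.

```shell
#!/bin/sh
# Two-pass build sketch. All file names are hypothetical placeholders.
# PANDOC is an assumed override; PANDOC=echo prints the commands instead of running them.
PANDOC="${PANDOC:-pandoc}"

# Pass 1: bake the front matter into the document through a Markdown template.
bake_markdown() {
    "$PANDOC" document.md --standalone --template frontmatter-template.md \
        --to markdown --output intermediate.md
}

# Pass 2: convert the baked intermediate to one final format.
# Pandoc is only told the final format here, not in pass 1.
convert_final() {
    "$PANDOC" intermediate.md --standalone --to "$1" --output "document.$1"
}

# Run both passes for a hypothetical set of output formats.
build_all() {
    bake_markdown || return 1
    for format in html docx odt; do
        convert_final "$format" || return 1
    done
}
```

Calling build_all runs one markdown-to-markdown pass followed by one conversion per format, which is where the downside listed above comes from: anything generated in the first pass is produced without knowledge of the final format.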

Where does the proposal fit in?

Basically the proposal is to solve most of the downsides of the second workflow by introducing the possibility to translate a partial. This way there will still be templates for each format, but they will only include format-specific information and will call the same partial at the appropriate location to generate the front matter. This way everything could again be handled with defaults files and a simple pandoc invocation.

Why not use a Lua filter for it or something else? Basically this would be a valid solution, which I already considered and worked on for some time. But my gut feeling is that this kind of preparation of a document and the representation of its metadata is so close to what I feel pandoc's templates are essentially conceived for, that templates seem to be the appropriate place for this functionality in the long run.

Alternative ideas

jgm commented 1 year ago

Downsides: Maintenance of document template multiplies by formats. Maintenance requires knowledge of all the formats and the representation of tables, texts, lists etc.

I still don't get it. Your tables, lists, etc. should be stored in markdown documents or YAML metadata and converted by pandoc and then inserted into a template. You shouldn't need to hard-code them in the template itself: just place a variable where they need to go.

jkrenzer commented 1 year ago

Yes, the metadata is read from YAML, that is not the problem. It's more about having a generic common template partial which can be the same for different output formats.

I'll try to illustrate the use case tomorrow to hopefully make it less prone to misunderstandings.

jkrenzer commented 1 year ago

Thanks for the patience. It was a busy week. I did a little sketch and hope that it will help clarify what the current situation for our use case is, how we work around the limitation and what the goal of the proposal is. In case the SVG display in GitHub is wonky or too small to be read comfortably (there's text in the file symbols, yes 😉 ), I attached a PDF file.

Sketch of current situation, a workaround and the proposal

Sketch as PDF

So currently, to avoid maintaining a front-matter template for every format we generate (1 in the sketch), we go route 2: we render our source documents to markdown with a template containing the front-matter data, to bake this information in, and then convert the new markdown document to all the formats we need. So we have only one place to maintain the header tables and preamble information. In our template we make massive use of template variables, for-loops, conditionals etc. to fill the tables and sections with values from YAML metadata or to manipulate rendering of certain parts of the front matter.

Another upside is that, as most people in my institute only know markdown so far, almost everybody can adjust the front matter, and not only the "specialists" who know all the formats. And it is only one document that has to be changed, not three or four.

jgm commented 1 year ago

You could do this quite easily, e.g., with a Makefile that builds frontmatter.x from frontmatter.markdown as part of the build process. Nobody would ever have to touch anything but frontmatter.markdown; the other files would just be temporary intermediaries, automatically generated. Maybe that's what you're doing? Isn't that a perfectly satisfactory solution? In what way would things be improved for you if pandoc behaved differently?
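
As a rough illustration of that suggestion, written as a shell script rather than an actual Makefile, the sketch below renders frontmatter.markdown into a per-format fragment and then places it ahead of the body with pandoc's --include-before-body option. The names body.markdown and document.x, as well as the PANDOC override variable, are assumptions made for this example and do not come from the thread.

```shell
#!/bin/sh
# Sketch of generating frontmatter.x from frontmatter.markdown as a build step.
# body.markdown and document.* are hypothetical names.
# PANDOC is an assumed override; PANDOC=echo prints the commands instead of running them.
PANDOC="${PANDOC:-pandoc}"

# Render the shared front matter as a fragment (no --standalone) in the target format.
render_frontmatter() {
    "$PANDOC" frontmatter.markdown --to "$1" --output "frontmatter.$1"
}

# Build the final document, inserting the generated fragment before the body.
build_document() {
    render_frontmatter "$1" || return 1
    "$PANDOC" body.markdown --standalone --to "$1" \
        --include-before-body "frontmatter.$1" --output "document.$1"
}
```

A call such as build_document html would leave frontmatter.html behind as a temporary intermediary; only frontmatter.markdown would ever need to be edited by hand.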

jkrenzer commented 1 year ago

Currently we are using GNU Make or Meson to automate the front-matter generation and shell piping to connect results of different pandoc runs into the final documents. This mostly works.

What I noticed is that, because the first run is a markdown-to-markdown transformation (so that citeproc and pandoc-crossref can run on all parts of the merged document), pandoc does not know about the final format. So, for example, the table of contents, which is part of the front matter, is generated in markdown format. When the final document is then translated to e.g. HTML, the TOC is only a basic list and not a nav element with a list of elements with special attributes tailored to HTML representation.

We partly compensate for these shortcomings by adding raw blocks for certain formats, but this way we still cannot influence the generation of elements. So we are missing out on some of pandoc's intelligence when we cannot tell it the final format.

After all, I thought that, as the generation of front matter is a common problem faced in many parts of science and industry, it would be handy if pandoc could ease this use case by having a feature to interpret a partial. I know my way around shell scripts, make and all this stuff, but the big majority of my colleagues do not. This complexity makes it hard to argue the case for finally getting rid of Word and of sending commented docx files back and forth by email. That is a second reason why I would dare say we would benefit from a more elegant solution than piped pandoc runs.

jgm commented 1 year ago

All template handling, including partial rendering, is done by a library doctemplates that is a dependency of pandoc. Even if I wanted to do this (and I'm not sure I do, because simplicity is itself a kind of elegance), I couldn't do it without circular module dependencies. (doctemplates would have to import pandoc to create such a pipe.)

Of course, nothing stops you from using a shell command, including pandoc, to populate a template variable. For example,

pandoc --variable foo="`pandoc input.md -t html`" -s

Maybe that could help in your case.
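
A slightly expanded sketch of the same idea, using $(...) instead of backticks because it nests more readably. The file names, the custom.x template name, the variable name frontmatter and the PANDOC override variable are all assumptions for illustration; the template is assumed to reference the value as $frontmatter$.

```shell
#!/bin/sh
# Sketch: populate a template variable with a pre-rendered fragment.
# frontmatter.md, body.md, custom.* and document.* are hypothetical names.
# PANDOC is an assumed override; PANDOC=echo prints the commands instead of running them.
PANDOC="${PANDOC:-pandoc}"

build_with_variable() {
    # Render the shared front matter as a fragment in the target format.
    frontmatter="$("$PANDOC" frontmatter.md --to "$1")" || return 1
    # Hand the fragment to the template as a variable; the (hypothetical)
    # template is assumed to place it wherever it references the variable.
    "$PANDOC" body.md --standalone --to "$1" \
        --template "custom.$1" \
        --variable "frontmatter=$frontmatter" \
        --output "document.$1"
}
```

Because the inner run is given the same target format as the outer run, the fragment arrives in the template already rendered for that format, and no intermediate file is written.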

jkrenzer commented 1 year ago

Of course, nothing stops you from using a shell command, including pandoc, to populate a template variable.

This seems to be a good new idea. I will have a look. Thanks!