htdebeer / pandocomatic

Automate the use of pandoc
https://heerdebeer.org/Software/markdown/pandocomatic/
GNU General Public License v3.0


Pandocomatic—Automate the use of pandoc

Pandocomatic automates the use of pandoc. With pandocomatic you can express common patterns of using pandoc for generating your documents. Applied to a directory, pandocomatic acts as a static site generator. For example, this manual is generated with pandocomatic!

Pandocomatic is free software; pandocomatic is released under the GPLv3. You will find the source code of pandocomatic in its repository on GitHub.

Note. Pandocomatic is built on top of paru, which is a wrapper around pandoc.

Note. As I am a GNU/Linux user, I do not officially support other operating systems such as macOS or Windows. Fixes and patches for those operating systems are welcome.

See pandocomatic’s manual for an extensive description of pandocomatic.

Why pandocomatic?

I use pandoc a lot. I use it to write all my papers, notes, websites, reports, outlines, summaries, and books. Time and again I found myself invoking pandoc like this:

pandoc --from markdown \
  --to html \
  --standalone \
  --csl apa.csl \
  --bibliography my-bib.bib \
  --mathjax \
  --output result.html \
  source.md

Sure, when I write about history, the CSL file and bibliography change. And I do not need the --mathjax option like I do when I am writing about mathematics education. Still, all these invocations are quite similar.

I had already written the program do-pandoc.rb as part of paru, a Ruby wrapper around pandoc. Using do-pandoc.rb I can specify the options to pandoc in a metadata block in the source file itself. With do-pandoc.rb the invocation above is simplified to:

do-pandoc.rb source.md

It saves me from typing out the whole pandoc invocation each time I run pandoc on a source file. However, I still have to set up the same options in each document I write, even though these options differ little from document to document.

Pandocomatic is a tool for reusing these common configurations: you specify them once in a so-called pandocomatic template in a YAML configuration file. For example, place the following file, pandocomatic.yaml, in pandoc's data directory:

templates:
  education-research:
    preprocessors: []
    pandoc:
      from: markdown
      to: html
      standalone: true
      csl: 'apa.csl'
      toc: true
      bibliography: /path/to/bibliography.bib
      mathjax: true
    postprocessors: []

This configuration file defines a single pandocomatic template: education-research. This template specifies that the source files it is applied to are not preprocessed. Next, the source files are converted by invoking pandoc --from markdown --to html --standalone --csl apa.csl --toc --bibliography /path/to/bibliography.bib --mathjax. Finally, the template specifies that pandoc's output is not postprocessed.
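To make the mapping from template to command line concrete, here is a small Ruby sketch that turns the template's pandoc property into pandoc arguments. It is an illustration under simplifying assumptions, not pandocomatic's actual implementation: the helper name pandoc_arguments is hypothetical, and only a few kinds of values are handled (true becomes a bare flag, a list repeats the option, anything else becomes an option with a value).

```ruby
require 'yaml'

# Hypothetical helper (not part of pandocomatic or paru): convert the
# `pandoc` section of a template into pandoc command-line arguments.
#   true        -> bare flag, e.g. --standalone
#   false / nil -> option left out
#   Array       -> option repeated per element, e.g. --filter a --filter b
#   other       -> option with a value, e.g. --to html
def pandoc_arguments(options)
  options.flat_map do |key, value|
    flag = "--#{key}"
    case value
    when true then [flag]
    when false, nil then []
    when Array then value.flat_map { |item| [flag, item.to_s] }
    else [flag, value.to_s]
    end
  end
end

config = YAML.safe_load(<<~YAML)
  templates:
    education-research:
      preprocessors: []
      pandoc:
        from: markdown
        to: html
        standalone: true
        csl: 'apa.csl'
        toc: true
        bibliography: /path/to/bibliography.bib
        mathjax: true
      postprocessors: []
YAML

pandoc = config['templates']['education-research']['pandoc']
command = ['pandoc'] + pandoc_arguments(pandoc)
puts command.join(' ')
```

Running it prints the same pandoc invocation as described in the paragraph above, because Ruby hashes keep the order of the keys in the YAML file.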

I can now create a new document that uses this template by including the following metadata block in my source file, on_teaching_maths.md:

 ---
 title: On teaching mathematics
 author: Huub de Beer
 pandocomatic_:
   use-template: education-research
   pandoc:
     output: on_teaching_mathematics.html
 ...

 Here goes the contents of my new paper ...

To convert this file to on_teaching_mathematics.html I run pandocomatic:

pandocomatic -i on_teaching_maths.md
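Pandocomatic finds the template to use in the pandocomatic_ property of the document's metadata block. As a rough sketch of that first step, assuming the metadata block sits at the very top of the file, the following Ruby reads the block and returns its pandocomatic_ settings. The helper name pandocomatic_metadata is hypothetical; pandoc itself also accepts metadata blocks elsewhere in a document, which this sketch ignores.

```ruby
require 'yaml'

# Hypothetical helper, not pandocomatic's actual code: return the
# `pandocomatic_` property from the YAML metadata block at the very top of
# a markdown source. The block opens with "---" and closes with "---" or
# "...". Metadata blocks further down the document are ignored here.
def pandocomatic_metadata(source)
  lines = source.lines.map(&:chomp)
  return {} unless lines.first == '---'

  length = lines.drop(1).index { |line| line == '---' || line == '...' }
  return {} if length.nil?

  metadata = YAML.safe_load(lines[1, length].join("\n")) || {}
  metadata.fetch('pandocomatic_', {})
end

source = <<~MARKDOWN
  ---
  title: On teaching mathematics
  author: Huub de Beer
  pandocomatic_:
    use-template: education-research
    pandoc:
      output: on_teaching_mathematics.html
  ...

  Here goes the contents of my new paper ...
MARKDOWN

settings = pandocomatic_metadata(source)
puts settings['use-template']      # education-research
puts settings['pandoc']['output']  # on_teaching_mathematics.html
```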

With just two extra lines in a metadata block I can tell pandocomatic what template to use when converting a file. You can also use multiple templates in a document, for example to convert a markdown file to both HTML and PDF. Adding file-specific pandoc options to the conversion process is as easy as adding a pandoc property with those options to the pandocomatic_ metadata property in the source file like I did with the output property in the example above.
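One way to picture how file-specific options combine with a template is a plain hash merge: the template supplies defaults, and the document adds options or overrides them. The Ruby below is a minimal sketch under that assumption; the helper effective_pandoc_options and the html5 override are made up for illustration and are not taken from pandocomatic.

```ruby
# Hypothetical sketch, not pandocomatic's actual code: compute the pandoc
# options for one conversion. The template provides the defaults; options
# from the document's pandocomatic_ block are added, and where both set the
# same option, the document's value wins (Hash#merge keeps the argument's
# value on conflicts).
def effective_pandoc_options(template, document_settings)
  template.fetch('pandoc', {}).merge(document_settings.fetch('pandoc', {}))
end

template = {
  'pandoc' => { 'from' => 'markdown', 'to' => 'html', 'standalone' => true }
}
document_settings = {
  'use-template' => 'education-research',
  'pandoc' => {
    'output' => 'on_teaching_mathematics.html', # added by the document
    'to' => 'html5'                              # hypothetical override
  }
}

options = effective_pandoc_options(template, document_settings)
options.each { |key, value| puts "#{key}: #{value}" }
```

This prints from: markdown, to: html5, standalone: true, and output: on_teaching_mathematics.html, one per line; the template hash itself is left unchanged.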

Once I had written a number of related documents this way, it was a small step to enable pandocomatic to convert directories as well. Just like that, pandocomatic can be used as a static site generator!

Licence

Pandocomatic is free software; pandocomatic is released under the GPLv3. You can find pandocomatic's source code on GitHub.

Installation

Pandocomatic is a Ruby program and can be installed through RubyGems as follows:

gem install pandocomatic

This will install pandocomatic and paru, a Ruby wrapper around pandoc. To use pandocomatic, you also need a working pandoc installation. See pandoc’s installation guide for more information about installing pandoc.

You can also build and install the latest version yourself by running the following commands:

cd /directory/you/downloaded/the/gem/to
docker image build --tag pandocomatic:dev .
docker container run --rm -it --volume $(pwd):/home/pandocomatic-user pandocomatic:dev bundle exec rake build
gem install pkg/pandocomatic-2.0.1.gem

You only have to run the second step once: after you have built the Docker image, you can reuse it as is until the Dockerfile changes.

Examples

Convert a single file

Convert hello.md to hello.html according to the configuration in pandocomatic.yaml:

pandocomatic --config pandocomatic.yaml -o hello.html -i hello.md

Convert a directory

Generate a static site from the source/ directory into website/, using assets/ as the data directory, but only convert files that have been updated since pandocomatic was last run:

pandocomatic --data-dir assets/ -o website/ -i source/ -m
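One plausible way to decide whether a file has been updated since the last run is to compare the modification times of a source file and its generated output. The following Ruby sketch assumes exactly that; needs_conversion? is a hypothetical helper, and pandocomatic's own check may work differently.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: a source file needs (re)converting when its output
# does not exist yet or is older than the source. Using modification times
# for this is an assumption of this sketch.
def needs_conversion?(source, destination)
  !File.exist?(destination) || File.mtime(source) > File.mtime(destination)
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'hello.md')
  output = File.join(dir, 'hello.html')

  File.write(source, "# Hello\n")
  puts needs_conversion?(source, output) # true: no output yet

  File.write(output, "<h1>Hello</h1>\n")
  FileUtils.touch(output, mtime: File.mtime(source) + 60)
  puts needs_conversion?(source, output) # false: output is newer

  FileUtils.touch(source, mtime: File.mtime(output) + 60)
  puts needs_conversion?(source, output) # true: source edited since
end
```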

Generating pandocomatic’s manual and README files

Generate the markdown files for pandocomatic's manual and its GitHub repository README:

git clone https://github.com/htdebeer/pandocomatic.git
cd pandocomatic/documentation
pandocomatic -d data-dir -c config.yaml -i README.md -o ../README.md
pandocomatic -d data-dir -c config.yaml -i manual.md -o ../index.md

Be careful not to overwrite the input file with the output file! I suggest using different names or different directories for both. Looking more closely at the pandocomatic configuration file config.yaml, we see it contains two templates, mddoc and indexdoc:

 templates:
   mddoc:
     pandoc:
       from: markdown
       to: markdown
       standalone: true
       filter:
       - filters/insert_document.rb
       - filters/insert_code_block.rb
       - filters/remove_pandocomatic_metadata.rb
       - filters/insert_pandocomatic_version.rb
   indexdoc:
     extends: mddoc
     postprocessors: ['postprocessors/setup_for_website.rb']

The mddoc template tells pandocomatic to convert a markdown file to a standalone markdown file using four filters: insert_document.rb, insert_code_block.rb, remove_pandocomatic_metadata.rb, and insert_pandocomatic_version.rb. The first two filters allow you to include another markdown file or a source code file (see the README listing below). The third filter removes the pandocomatic metadata block from the file so the settings in it do not interfere when, later on, manual.md is converted to HTML. The fourth, as its name suggests, inserts pandocomatic's version. These filters are located in the filters subdirectory of the specified data directory data-dir. The indexdoc template extends mddoc and additionally runs the postprocessor postprocessors/setup_for_website.rb.
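The indexdoc template in config.yaml has no pandoc property of its own; it gets one through extends: mddoc. A rough Ruby sketch of resolving such an extends chain follows. It assumes that nested mappings are merged recursively, that lists are concatenated, and that the extending template wins for any other value; resolve_template and deep_merge are hypothetical names, and pandocomatic's real merge rules may differ in detail.

```ruby
require 'yaml'

# Hypothetical merge rule: mappings merge recursively, lists are
# concatenated (without duplicates), any other value from `extra` wins.
def deep_merge(base, extra)
  base.merge(extra) do |_key, old, new|
    if old.is_a?(Hash) && new.is_a?(Hash)
      deep_merge(old, new)
    elsif old.is_a?(Array) && new.is_a?(Array)
      (old + new).uniq
    else
      new
    end
  end
end

# Resolve a template's `extends` (a name or a list of names) recursively,
# guarding against cycles such as a extends b, b extends a.
def resolve_template(templates, name, seen = [])
  raise ArgumentError, "cyclic extends: #{(seen + [name]).join(' -> ')}" if seen.include?(name)

  template = templates.fetch(name)
  own = template.reject { |key, _| key == 'extends' }
  inherited = Array(template['extends']).reduce({}) do |merged, parent|
    deep_merge(merged, resolve_template(templates, parent, seen + [name]))
  end
  deep_merge(inherited, own)
end

config = YAML.safe_load(<<~YAML)
  templates:
    mddoc:
      pandoc:
        from: markdown
        to: markdown
        standalone: true
        filter:
        - filters/insert_document.rb
        - filters/insert_code_block.rb
        - filters/remove_pandocomatic_metadata.rb
        - filters/insert_pandocomatic_version.rb
    indexdoc:
      extends: mddoc
      postprocessors: ['postprocessors/setup_for_website.rb']
YAML

indexdoc = resolve_template(config['templates'], 'indexdoc')
puts indexdoc['pandoc']['to']           # markdown
puts indexdoc['postprocessors'].inspect # ["postprocessors/setup_for_website.rb"]
```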

However, the mddoc template converts from and to pandoc's markdown variant, which differs slightly from the markdown variant GitHub uses for README files. Luckily, pandoc also supports writing GitHub's markdown variant. There is no need to create and use a different template for generating the README, though, because you can override any of the template's settings inside a pandocomatic block in a markdown file:

 ---
 pandocomatic_:
   use-template: mddoc
   pandoc:
     to: markdown_github
 ...

 # Pandocomatic—Automate the use of pandoc

 ::paru::insert introduction.md

 ## Why pandocomatic?

 ::paru::insert why_pandocomatic.md

 ## Licence

 ::paru::insert license.md

 ## Installation

 ::paru::insert install.md

 ## Examples

 ::paru::insert usage_examples.md

 ## More information

 See [pandocomatic's
 manual](https://heerdebeer.org/Software/markdown/pandocomatic/) for more
 extensive examples of using pandocomatic. Notably, the manual contains two
 typical use cases of pandocomatic:

 1.  [automating setting up and running pandoc for a series of related papers](https://heerdebeer.org/Software/markdown/pandocomatic/#automating-setting-up-and-running-pandoc-for-a-series-of-related-papers), and 
 2.  [using pandocomatic as a static site
      generator](https://heerdebeer.org/Software/markdown/pandocomatic/#using-pandocomatic-as-a-static-site-generator).

Here you see that the README uses the mddoc template and overrides its to property with markdown_github.
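The ::paru::insert lines in the README listing are expanded by the insert_document.rb filter. Pandoc filters operate on pandoc's document tree rather than on raw text, so the Ruby below is only a plain-text approximation of the idea: a hypothetical insert_documents helper replaces each line consisting solely of ::paru::insert PATH with the contents of PATH.

```ruby
require 'tmpdir'

# Hypothetical, text-level approximation of an insert filter: each line
# consisting solely of "::paru::insert PATH" is replaced by the contents of
# PATH, resolved relative to base_dir. A real pandoc filter, such as
# filters/insert_document.rb, works on pandoc's document tree instead.
def insert_documents(markdown, base_dir: '.')
  markdown.lines.map do |line|
    command, path, *rest = line.split
    next line unless command == '::paru::insert' && path && rest.empty?

    content = File.read(File.join(base_dir, path))
    content.end_with?("\n") ? content : "#{content}\n"
  end.join
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'license.md'), "Pandocomatic is free software.\n")
  readme = "## Licence\n\n::paru::insert license.md\n"
  puts insert_documents(readme, base_dir: dir)
end
```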

Similarly, the input file manual.md specifies an extra filter, number_chapters_and_sections_and_figures.rb, to number the chapters and sections in the manual, which the README does not need. It does so with the following pandocomatic metadata:

pandocomatic_:
  use-template: mddoc
  pandoc:
    filter: 
    - 'filters/number_chapters_and_sections_and_figures.rb'
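To give an idea of what numbering chapters and sections involves, here is a plain-text Ruby sketch that prefixes ATX headings with hierarchical numbers such as 2 and 2.1. It is an approximation with a hypothetical number_headings helper, not the actual filter: a pandoc filter works on the document tree, and this sketch ignores figures and setext headings while skipping fenced code blocks.

```ruby
# Hypothetical, plain-text approximation of numbering chapters and
# sections: ATX headings ("# ...", "## ...") get numbers like "2" and
# "2.1". Lines inside fenced code blocks are left alone; figures and
# setext headings are not handled.
def number_headings(markdown)
  counters = []
  in_fence = false
  markdown.lines.map do |line|
    in_fence = !in_fence if line.start_with?('```', '~~~')
    match = !in_fence && line.match(/\A([#]{1,6})[ \t]+(.*)$/)
    next line unless match

    level = match[1].length
    counters = counters.first(level)            # drop deeper counters
    counters << 0 while counters.length < level # start missing levels at 0
    counters[level - 1] += 1
    "#{match[1]} #{counters.join('.')} #{match[2]}\n"
  end.join
end

manual = <<~MARKDOWN
  # Introduction
  ## Why pandocomatic?
  # Usage
  ## Installation
  ## Examples
MARKDOWN

puts number_headings(manual)
```

This prints the headings as 1 Introduction, 1.1 Why pandocomatic?, 2 Usage, 2.1 Installation, and 2.2 Examples, each keeping its original heading level.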

Pandocomatic allows you to generalize common aspects of running pandoc while still offering the ability to be as specific as needed.

More information

See pandocomatic’s manual for more extensive examples of using pandocomatic.