lionel-rigoux / pandemic

This project has moved to pandemics.gitlab.io
https://pandemics.gitlab.io
BSD 3-Clause "New" or "Revised" License

Support for preprocessors #7

Closed will-hart closed 5 years ago

will-hart commented 5 years ago

I use preprocessors to prepare my documents for publishing by performing tasks like merging documents and variable substitution. I've tried existing filters but for various reasons they don't work as well as preprocessors.

Is there a way to use preprocessors with pandemic?


I can think of a few approaches if this isn't currently possible:

  1. a build script in a custom scaffold
  2. an option in the recipe.<format>.json file which is used by the publish command to pipe the result through to pandoc.

lionel-rigoux commented 5 years ago

This is not currently possible, but is actually on the roadmap. I was however a bit reluctant to implement this function for two reasons:

I had in mind something very similar to what you describe: pre and post "hooks". I think it would be nice to allow scripts to be chained, i.e. to provide a list of paths in the recipe.json and to pipe the content through all the scripts between reading the .md file and calling pandoc. The big question now is which parameters must be relayed to the hooks. I guess the file content is not enough, as a script might need to be context-aware (recipe folder, source and target folders, compiling options). If you have a clear example in mind, it would be nice to start from there. I'll try to have a go at it during the Xmas holidays...

will-hart commented 5 years ago

I can see the attraction of not over complicating things!

If you were going for a full build system would a system of node.js based hooks / plugins be suitable? This would remove the requirement to install anything additional, and the API would define what info the plugins get / when they can run. Plugins could be installed in the same way as scaffolds etc and included in the recipe config like filters are.

There are two other outside pandemic solutions which I think would achieve the same goal:

  1. create a script / batch file installed as part of the scaffold which runs the preprocessors and saves the value to a temporary file used by pandemic.
  2. contribute / build / publish the filters I need and use them in the "official" way :)


The most common use case for me is variable substitution, although it would be nice to run the CLI I use to generate multi-part figures automatically. My writing is relatively data intensive, and the data analysis is done in jupyter notebooks. These spit out SVG figures and a variable file (yaml / json / whatever). I currently use the mustache cli to replace variable placeholders in the markdown with the correct values and pipe the result into pandoc. This means in my manuscript I can write something like

The result is positive ({{my_positive_result.value}}, p < {{my_positive_result.pvalue}})

and be certain the value will be "correct". The variable file might contain

{
  "my_positive_result": {
    "value": "$3.5\\pm0.3$",
    "pvalue": "0.05"
  }
}

Initially I tried pandoc-mustache for this, however it doesn't handle maths very well. I couldn't really get to the bottom of this, but I think it's probably to do with the way pandoc escapes text when building the AST. The filter is unmaintained and I'm not 100% certain it can be fixed in the way I want, which is why I just used the CLI.
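
To make the placeholder substitution described in this comment concrete, here is a minimal sketch in plain Node.js. It is not the mustache library and does not reproduce its escaping rules; the helper names `lookup` and `renderTemplate` are hypothetical and only illustrate how a dotted placeholder could be resolved against the variable file shown above.

```javascript
// Minimal sketch of {{dotted.path}} substitution. This is NOT the mustache
// library; `lookup` and `renderTemplate` are hypothetical helper names.

// Resolve a dotted path such as "a.b" inside a nested object.
function lookup(vars, path) {
  return path
    .split('.')
    .reduce((node, key) => (node != null ? node[key] : undefined), vars);
}

// Replace every {{path}} placeholder; unknown placeholders are left untouched.
function renderTemplate(template, vars) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = lookup(vars, path);
    return value === undefined ? match : String(value);
  });
}

const vars = {
  my_positive_result: { value: '$3.5\\pm0.3$', pvalue: '0.05' },
};
const line =
  'The result is positive ({{my_positive_result.value}}, p < {{my_positive_result.pvalue}})';

console.log(renderTemplate(line, vars));
```

A replacer function is used rather than a replacement string so that the `$` characters in math values are inserted literally instead of being read as special replacement patterns.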

lionel-rigoux commented 5 years ago

I see different ways of including such scripts in pandemic.

I think the best solution would be to go for (a) first. Scripts that are also stand-alone CLIs (like your mustache) could be included as a submodule in the recipe, ensuring clear attribution of code authorship and easy maintenance. Each recipe has its own version, avoiding conflicts or dependency issues. If this project ever gets some momentum and the same script is used over and over again in all recipes in a stable manner, then solution (c) would be the best one, but I don't see this coming anytime soon ;)

I wouldn't put the processor in the scaffold, as this goes against the principle of separation between content and compiler. In your example, the scaffold should however contain a result.json. You can then use this as a default when you call mustache via pandoc (no need to pass the result filename as an argument nor via the YAML header of the manuscript).


I really like your approach with the mustache CLI. This is something I was looking for, but I never found a satisfying solution. We could use this as a test case for the processing hooks. I will work on implementing the hook feature. I guess that the hard part will be to find a proper way to call mustache (or any other script) in a pipe with a clean argument passing. In particular, we need a generic way to make available to the script the path to the recipe, source, and target directories. I think that environment variables would be the most flexible solution.

For example, having the mustache-cli as a submodule in the recipe, the recipe.json would then look like (with more argument passing to mustache obviously):

{
  "pre-hook": [
    "./mustache-cli/bin/cli.js --conf \"$PANDOC_SOURCE_DIR/results.json\""
  ]
}

The next step would be to allow results.json to be defined in the YAML header, but I have no idea how to do that properly so far...

lionel-rigoux commented 5 years ago

I just published a new version that should be able to deal with some minimal preprocessing scripts (prehooks). A minimal example could be to have in the recipe.format.json:

{
 "prehooks": [
   "tail -r"
 ]
}

which should reverse the order of the lines of text. Each command is executed from the root of the recipe folder and is piped to the next hook and eventually to pandoc. There is no env variable yet, so the example in my previous answer does not work yet. However this should not be too complicated and will come with the next release in the upcoming days.
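
The chaining described above (each command receives the previous command's output and the last output goes to pandoc) can be sketched as follows. This is only an illustration under stated assumptions, not pandemic's actual implementation: `runHooks` is a hypothetical helper, and `sort` stands in for a real preprocessor.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: feed `text` through each shell command in turn,
// passing one command's stdout to the next command's stdin.
// `cwd` stands in for the root of the recipe folder.
function runHooks(text, hooks, cwd = process.cwd()) {
  return hooks.reduce((input, command) => {
    const result = spawnSync(command, {
      cwd,
      input,
      shell: true, // let the shell parse the command string
      encoding: 'utf8',
    });
    if (result.error) throw result.error;
    if (result.status !== 0) {
      throw new Error(`Hook "${command}" failed: ${result.stderr}`);
    }
    return result.stdout;
  }, text);
}

// `sort` is only a stand-in hook; the returned text is what would
// eventually be handed to pandoc.
console.log(runHooks('b\na\nc\n', ['sort']));
```

Because the manuscript text is passed through stdin rather than read with an external command, a sketch like this does not depend on tools such as `cat` being available.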

will-hart commented 5 years ago

Looks neat :) Unfortunately this line fails on Windows:

'cat' is not recognized as an internal or external command

From reading the code could it be replaced with fs.readSync or maybe something like shelljs?

lionel-rigoux commented 5 years ago

Good point... I just published a new, simpler version that does not rely on cat. I also pass three variables on to the environment of the preprocessors: PANDOC_RECIPE_PATH, PANDOC_SOURCE_PATH, and PANDOC_TARGET_PATH. Unfortunately I cannot test on Windows, but this seems to work on my side so far.
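
As a rough illustration of passing these three variables to a preprocessor, the sketch below adds them to the child process environment. Only the variable names come from this thread; the helper `runWithPaths`, its arguments, and the example paths are hypothetical, and this is not necessarily how pandemic does it.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run one preprocessor with the three path variables
// added on top of the current environment.
function runWithPaths(command, args, input, paths) {
  const env = {
    ...process.env,
    PANDOC_RECIPE_PATH: paths.recipe,
    PANDOC_SOURCE_PATH: paths.source,
    PANDOC_TARGET_PATH: paths.target,
  };
  const result = spawnSync(command, args, { env, input, encoding: 'utf8' });
  if (result.error) throw result.error;
  return result.stdout;
}

// Stand-in preprocessor: a Node one-liner that prints the source path it
// sees. The paths are made up for the example.
const out = runWithPaths(
  process.execPath,
  ['-e', 'process.stdout.write(process.env.PANDOC_SOURCE_PATH)'],
  '',
  { recipe: '/tmp/recipe', source: '/tmp/src', target: '/tmp/out' }
);

console.log(out);
```

A script that reads `process.env` itself, as the stand-in does here, never needs shell-specific variable syntax.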

will-hart commented 5 years ago

0.3.1 ran ok for me, however ... Windows specific issues again :)

Windows has a different syntax for env vars, so it's impossible to write a cross-platform prehook that relies on environment vars. Usually I'd use cross-env, but I'm not sure that this can be called via spawnSync. I'll have a play and see if I can come up with a Windows-friendly approach.

Happy to test Windows builds :)

will-hart commented 5 years ago

Well I don't think this is a real solution but I cheekily got it working by replacing the processor line with

processor.replace(/\$PANDOC_SOURCE_PATH/g, env.PANDOC_SOURCE_PATH),

lionel-rigoux commented 5 years ago

Fair enough. I implemented your solution. It is not yet published (I am not sure I want to stick with this one), but you should be able to use it if you work with a linked package. Using mustache directly does not seem so easy, though: it does not accept a piped template. (It's an easy hack, but that's not the point. Using https://github.com/lionel-rigoux/mustache.js, mustache \"$PANDOC_SOURCE_PATH/results.json\" - works well.) I guess that in the end you will need to implement an ad hoc script that uses mustache and, ideally, also scans the YAML header for .json files to use (as done in pandoc-mustache). If you do so, env variables will be handled by the script itself, which should be easier to do in a cross-platform manner.

By the way, I also fixed a stupid bug that caused recipes with no explicit format to be compiled with the default recipe instead.

will-hart commented 5 years ago

I agree it's not a very good solution... I've been looking at the source for cross-env and it's possible that their solution could be "borrowed".

Good pickup re the mustache CLI. In my current tests it accidentally works because nothing else is using the piped stdout.

"prehooks": [
  "figures build \"$PANDOC_SOURCE_PATH\\panels.json\"",
  "mustache \"$PANDOC_SOURCE_PATH\\vars.json\" \"$PANDOC_SOURCE_PATH\\manuscript.md\""
]

lionel-rigoux commented 5 years ago

I will check if there is a way to use cross-env directly when calling the preprocessing scripts. Yeah, you should not call the source file directly in the hooks, otherwise they can't be piped properly. The only solution I see is to implement an independent script, as mentioned before. I'll give it a try...

lionel-rigoux commented 5 years ago

Commit 7113b8dd174e9bdb322484c3a163058811f3ec7c should now allow preprocessing hooks to be called with env variables using the $VAR syntax on Windows.
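
For readers wondering what platform-independent $VAR handling might look like, here is a minimal sketch that generalizes the single-variable replace quoted earlier in this thread. It is an assumption-laden illustration, not the content of the commit above: `expandVars` is a hypothetical helper and the example path is made up.

```javascript
// Hypothetical helper: replace $NAME tokens in a hook command with values
// from `env` before the command is handed to the shell, so that the same
// recipe string works regardless of the shell's own variable syntax.
// Names that are not defined in `env` are left as they are.
function expandVars(command, env) {
  return command.replace(/\$([A-Za-z_][A-Za-z0-9_]*)/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(env, name) ? env[name] : match
  );
}

const hook = 'mustache "$PANDOC_SOURCE_PATH/vars.json"';

// Example with a made-up Windows-style source path.
console.log(expandVars(hook, { PANDOC_SOURCE_PATH: 'C:\\docs\\paper' }));
```

The replacer function inserts each value literally, so paths containing `$` or backslashes are not reinterpreted during substitution.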

lionel-rigoux commented 5 years ago

Ok, the latest version of pandemic (v0.4.0) includes a plugin to apply mustache templating to the manuscript. You will need to specify the results files (the views) in the YAML front matter of the manuscript, and add "pandemic-mustache" as a prehook. See https://github.com/lionel-rigoux/pandemic-mustache for an example. Thorough beta testing would be more than welcome ;)

will-hart commented 5 years ago

Sorry, it's been a hectic few days so I haven't had a chance to look at this yet. It sounds fantastic! I'll run some Windows tests on this towards the end of the week.

will-hart commented 5 years ago

Ok, I've tested it and it works really well. I've also written my own hook - https://github.com/will-hart/pandemic-it-figures.

I think this can probably be closed now? 🥇

lionel-rigoux commented 5 years ago

Fantastic! I will have a look at your plugin, this looks really interesting. I think it is time to create a wiki, I'll make a page to list such contributions. It would be nice to kick off a little community... I also created a plugin to convert images: https://github.com/lionel-rigoux/pandemic-image. This only handles bitmap, but my goal is to have a universal converter to always get your figures in the format required for submission.