Write Pandoc filters in .NET, using strongly-typed data structures for the Pandoc AST.
Pandoc is a command-line program and Haskell library for converting documents from and to many different formats. Documents are translated from the input format to an AST (defined in the Text.Pandoc.Definition module), which is then used to create the output format.
Pandoc allows writing filters: programs that read the AST as JSON from standard input, modify it, and write it back out to standard output. Filters can be run using the pipe operator (`|`, which works in both Linux and Windows shells):
pandoc -s input.md -t json | my-filter | pandoc -s -f json -o output.html
or using the Pandoc `--filter` command-line option:
pandoc -s input.md --filter my-filter -o output.html
Much of the JSON-serialized AST comes in the form of objects with a `t` and a `c` property [1]:
{
"t": "Para",
"c": [
]
}
This corresponds to a `Para` object with properties filled with the values at the `c` property.
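As a rough illustration of the raw level, the following sketch reads the `t` and `c` properties of a node using only `System.Text.Json`. It does not use the PandocFilters types, and `Describe` is a hypothetical helper written for this example.

```csharp
using System;
using System.Text.Json;

Console.WriteLine(Describe("{\"t\":\"Para\",\"c\":[]}"));
Console.WriteLine(Describe("{\"t\":\"Space\"}"));

// Hypothetical helper (not part of PandocFilters): returns the node's tag
// together with the JSON kind of its "c" payload.
static string Describe(string json) {
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    var tag = root.GetProperty("t").GetString();
    // Nodes without contents (e.g. Space) have no "c" property at all.
    var payload = root.TryGetProperty("c", out var c)
        ? c.ValueKind.ToString()
        : "none";
    return $"{tag}: {payload}";
}
```

The strongly-typed visitors spare you this kind of manual property access; the sketch only shows what the raw JSON shape looks like from code.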
The library defines types and base classes for both levels:
| Type level | Description | Namespace | Visitor base class |
|---|---|---|---|
| Raw | Objects with a `t` and `c` property | `PandocFilters.Raw` | `RawVisitorBase` |
| Higher-level AST | e.g. `Para` type | `PandocFilters.Ast` | `VisitorBase` |
The library also includes two predefined visitors, `DelegateVisitor` and `RawDelegateVisitor`, which can be extended by adding delegates via the `Add` method instead of defining a new class (see below for a sample).
1. All the types in pandoc-types except for the root Pandoc type and the Citation type.
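1. Install the `PandocFilters` NuGet package.
2. Define a visitor by inheriting from one of the visitor base classes, or create one of the delegate visitors and register delegates with its `Add` methods.
3. Pass the visitor to `Filter.Run`.
4. Run your program as a filter with Pandoc's `--filter` option; or pipe the JSON output from Pandoc into your program, and pipe the output back into Pandoc.

Note that `Filter.Run` takes an arbitrary number of visitors, so you can create multiple visitors and pass them all into `Filter.Run`.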
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using PandocFilters;
using PandocFilters.Types;

var visitor = new RemoveImagePositioning();
Filter.Run(visitor);

// Strips the height and width attributes from every image.
class RemoveImagePositioning : VisitorBase {
    public override Image VisitImage(Image image) =>
        image with {
            Attr = image.Attr with {
                KeyValuePairs =
                    image.Attr.KeyValuePairs
                        .Where(x => x.Item1 != "height" && x.Item1 != "width")
                        .ToImmutableList()
            }
        };
}
Using the delegate visitor:
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using PandocFilters;
using PandocFilters.Types;

var visitor = new DelegateVisitor();

// Register a delegate that strips the height and width attributes from every image.
visitor.Add((Image image) => image with {
    Attr = image.Attr with {
        KeyValuePairs =
            image.Attr.KeyValuePairs
                .Where(x => x.Item1 != "height" && x.Item1 != "width")
                .ToImmutableList()
    }
});

Filter.Run(visitor);
For a real-world usage example with multiple visitors (and the reason I wrote this in the first place), see DlrDocsProcessor.
Use the `with` keyword to clone/initialize the returned instance; otherwise you'll have to pass in all arguments to the constructor.
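The difference between a `with` expression and a full constructor call can be sketched with a small positional record. `Picture` is a hypothetical type invented for this example, standing in for an AST node; it is not one of the library's types.

```csharp
using System;

var original = new Picture("logo.png", "Logo", 100, 50);

// `with` copies every property and overrides only the ones named.
var viaWith = original with { Width = 200 };

// The constructor route has to repeat every argument, changed or not.
var viaCtor = new Picture(original.Url, original.Title, 200, original.Height);

// Records compare by value, so both routes produce equal instances.
Console.WriteLine(viaWith == viaCtor); // True

// Hypothetical record standing in for an AST node type.
record Picture(string Url, string Title, int Width, int Height);
```

The more properties a type has, the more the `with` form saves, since the constructor call grows with every property while the `with` expression names only what changes.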