Open jgm opened 7 years ago
Thanks for the suggestion, I've added it to the list.
Suggestion: subtopics for filters: AST. Also see #3262.
Example:
I recently learned about one of the undocumented features of the AST regarding RawInline/RawBlock. I tried to make some sense of it by running the following tests:
# Raw LaTeX
$ printf "%s\n\n" '\LaTeX' | pandoc -f markdown -t native
[Para [RawInline (Format "tex") "\\LaTeX"]]
$ printf "%s" '[Para [RawInline (Format "tex") "\\LaTeX"]]' | pandoc -t markdown -f native
\LaTeX
$ printf "%s" '[Para [RawInline (Format "latex") "\\LaTeX"]]' | pandoc -t markdown -f native
\LaTeX
$ printf "%s" '[Para [RawInline (Format "beamer") "\\LaTeX"]]' | pandoc -t markdown -f native
$ printf "%s" '[Para [RawInline (Format "beamer") "\\LaTeX"]]' | pandoc -t beamer -f native
\begin{frame}
\end{frame}
# Raw HTML
$ printf "%s" '[RawBlock (Format "html") "<html>"]' | pandoc -t markdown -f native
<html>
$ printf "%s" '[RawBlock (Format "html5") "<html>"]' | pandoc -t markdown -f native
$ printf "%s" '[RawBlock (Format "html5") "<html>"]' | pandoc -t html5 -f native
$ printf "%s" '[RawBlock (Format "markdown") "\\LaTeX"]' | pandoc -t markdown -f native
\LaTeX
$ printf "%s" '[RawBlock (Format "markdown") "\\LaTeX"]' | pandoc -t latex -f native
What we see here:
While pandoc -t tex is invalid, (Format "tex") is valid in the AST and is in fact the default when raw LaTeX is used in markdown ("latex" is valid in both cases).
A format that is valid on the pandoc CLI can be invalid as an AST Format, e.g. beamer and html5: no error is raised, but the raw content silently disappears from the output.
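To check which Format strings a document actually carries before handing it to a writer, one could scan the JSON AST for them. This is a minimal sketch, assuming the JSON encoding `{"t": "RawInline", "c": [format, text]}` (and the same shape for RawBlock); `raw_formats` is a hypothetical helper name, not part of pandoc.

```python
def raw_formats(node):
    """Recursively collect the Format strings of RawInline/RawBlock nodes."""
    found = []
    if isinstance(node, dict):
        if node.get("t") in ("RawInline", "RawBlock"):
            # Assumed JSON encoding: {"t": "RawInline", "c": [format, text]}
            found.append(node["c"][0])
        else:
            # Descend into every value of a non-raw element
            for value in node.values():
                found.extend(raw_formats(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(raw_formats(item))
    return found


# Two of the cases from the tests above, written as JSON-style Python data
blocks = [
    {"t": "Para", "c": [{"t": "RawInline", "c": ["tex", "\\LaTeX"]}]},
    {"t": "RawBlock", "c": ["html5", "<html>"]},
]
print(raw_formats(blocks))
```

Any format in that list that the target writer does not recognize (such as "html5" in the tests above) is a candidate for content that will be dropped.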
Definitely, documentation on pandoc's AST is something I've been wanting for a long time.
Not for using filters, but to be able to create an application that generates a valid pandoc document in JSON AST format.
For example, I've created a feature request for an Asciidoctor to pandoc AST backend:
But haven't managed to find a document that lists all of pandoc's AST supported elements.
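For context, here is roughly what such an application has to emit. This is a minimal sketch, assuming the top-level keys `pandoc-api-version`, `meta` and `blocks` of pandoc's JSON format; the version value below is only an example and must match what the installed pandoc accepts, and `make_document` and `text_to_inlines` are hypothetical helper names.

```python
import json

# Assumption: example value only; it must be compatible with the
# pandoc-types version of the pandoc binary that will read the file.
API_VERSION = [1, 23, 1]


def text_to_inlines(text):
    """Split plain text into Str nodes separated by Space nodes."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i > 0:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def make_document(paragraphs):
    """Build a pandoc JSON document with one Para block per input string."""
    return {
        "pandoc-api-version": API_VERSION,
        "meta": {},
        "blocks": [{"t": "Para", "c": text_to_inlines(p)} for p in paragraphs],
    }


if __name__ == "__main__":
    print(json.dumps(make_document(["Hello pandoc world"])))
```

The printed JSON can be piped into `pandoc -f json -t markdown` to see whether pandoc accepts it.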
@tajmone per definition (pun intended), see https://github.com/jgm/pandoc-types/blob/master/Text/Pandoc/Definition.hs#L95
Re-opening until we resolve some of the TODOs, but we now have a start on this in doc/customizing-pandoc.md.
But haven't managed to find a document that lists all of pandoc's AST supported elements.
When I was looking at the filter docs (related: https://github.com/jgm/pandoc/issues/8750), I started wondering where I'd find the AST elements, and I started looking for the reference. That led me to this issue.
I think I'm going to try to make it a hobby project to put together some AST docs. I can start by looking at the code in src/Text/Pandoc/Definition.hs
Isn't Text.Pandoc.Definition's haddock page a good canonical reference for the AST elements? I'm curious what more you think would be needed in the way of documentation.
One of the downsides of the Haddock page is that it contains a lot of info and can be overwhelming. E.g., I'd guess that it's not immediately obvious to a user unfamiliar with Haskell that the list of instances can be skipped at first reading, but that the constructors are important: instances take up half my screen when I load that page, while constructors are just two lines.
I'm curious what more you think would be needed in the way of documentation.
I totally agree with @tarleb on the current docs being overwhelming to non-Haskell users.
Ideally, I'd love to see pandoc include a JSON (or YAML) file with the full AST specification (node name, type, attributes, etc.). If pandoc could auto-generate this JSON/YAML file (either within the source repository, or directly from the pandoc executable binary) and then provide a CLI command to emit it (e.g. --print-ast-spec) it would make life very easy, since it would be available without having to surf the web for that info.
The reason I think a JSON or YAML file would be better (i.e. rather than a markdown doc, etc.) is that while these formats are both human-friendly enough to be consulted as they are, they can also be easily manipulated to create ad hoc documents by parsing them and rendering them in whatever format one prefers (e.g. via Mustache templates). And, with the JSON/YAML spec being included within the executable, it would be very simple to set up any pandoc-related project to update the AST reference documentation with each new pandoc release by parsing it and re-generating the document via automated scripts.
As an example of how this might work, the PML (Practical Markup Language) tool does this by exporting its document tags as a JSON file via the export_meta_data CLI option. The generated JSON file looks like this:
Here's an example project where I use Mustache templates to turn those JSON files into different AST spec documents (markdown, AsciiDoc and plain text), filtering on specific keys and values to produce each variant:
https://github.com/tajmone/pml-playground/tree/main/mustache
This lets me keep the spec docs on PML nodes/AST up to date whenever PML is updated, in an automated way.
So, something along those lines would work for the pandoc AST too (IMO), allowing end users to represent the final spec document whichever way they prefer, thanks to the JSON/YAML spec being always available (as a single document) via the pandoc binary itself.
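To make the idea concrete, here is a minimal sketch of the consuming side. Everything about the spec layout is an assumption: pandoc does not ship such a file, and the keys `name`, `kind` and `fields` are hypothetical. Plain string formatting stands in for Mustache so the example needs only the standard library.

```python
import json

# Hypothetical spec layout; pandoc does not currently emit anything like this.
# The three entries mirror constructors from Text.Pandoc.Definition.
SPEC_JSON = """
[
  {"name": "Para",   "kind": "Block",  "fields": ["[Inline]"]},
  {"name": "Header", "kind": "Block",  "fields": ["Int", "Attr", "[Inline]"]},
  {"name": "Str",    "kind": "Inline", "fields": ["Text"]}
]
"""


def render_markdown(spec, kind=None):
    """Render spec entries as a markdown list, optionally filtered by kind."""
    lines = []
    for node in spec:
        if kind is not None and node["kind"] != kind:
            continue
        fields = ", ".join(node["fields"]) or "(none)"
        lines.append(f"- `{node['name']}` ({node['kind']}): {fields}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(json.loads(SPEC_JSON), kind="Block"))
```

Swapping `render_markdown` for an AsciiDoc or plain-text renderer would give the other output variants from the same spec data.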
If pandoc could auto-generate this JSON/YAML file (either within the source repository, or directly from the pandoc executable binary) and then provide a CLI command to emit it (e.g. --print-ast-spec) it would make life very easy, since it would be available without having to surf the web for that info.
+1 on autogenerating. What I want to avoid is a manually produced document that could get out of sync. [I guess it should be possible, because everything is a Generic and Typeable instance.]
Maybe for an "evergreen" strategy, it'd be possible to have:
Note: I typed this before I saw the previous two responses in this thread, but @tarleb and @tajmone covered a lot of what I was thinking too. Here's the experience of someone with only rudimentary programming knowledge and zero Haskell.
Isn't Text.Pandoc.Definition's haddock page a good canonical reference for the AST elements? I'm curious what more you think would be needed in the way of documentation.
Oh, interesting. I actually had scanned that but somehow didn't associate what I was looking at with a document structure. I might have understood if I'd opened the Block element, but the first element was Pandoc, which was too abstruse for me. Maybe I expected a tree/JSON representation, maybe with a little diagram of nodes (like how MDN represents the DOM).
So, besides that it may start at too sharp a grade, I can't comment on that doc's usefulness since I haven't used it. :-) I'm going to experiment with using that as a reference to make some Lua filters and see how I do.
I can speak of my experience as a Pandoc user at the lower end of technical proficiency, though. Basically, I'm only interested in transformation at the highest level: I just want a nice set of examples, a reference that I could look at, and maybe some example "lorem ipsum" style docs that I could inspect with pandoc -t native.
I don't know Haskell at all, so I can say that something like this
walk :: (Block -> Block) -> TableFoot -> TableFoot
wasn't understandable to me. I suppose the reference examples assume Haskell knowledge: considering that Pandoc filtering is polyglottal, should that be necessary? Maybe a complementary, language-agnostic doc would make it easier for a wider audience to understand how to manipulate document structures.
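As one illustration of what a language-agnostic explanation might look like, the idea behind that signature ("apply a function to every element of some kind inside a larger structure") can be sketched over the JSON form of the AST. This is a loose analogue, not pandoc's implementation: it assumes elements are encoded as dicts with a `"t"` tag, and `walk` and `shout` here are hypothetical Python helpers.

```python
def walk(node, fn):
    """Rebuild `node`, applying `fn` to every tagged element, children first."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, fn) for key, value in node.items()}
        # Only dicts carrying a "t" tag are treated as AST elements (assumption)
        return fn(rebuilt) if "t" in rebuilt else rebuilt
    return node  # strings, numbers, etc. are left untouched


def shout(elem):
    """Example transformation: upper-case the text of every Str element."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return elem


blocks = [{"t": "Para", "c": [{"t": "Str", "c": "hello"},
                              {"t": "Space"},
                              {"t": "Str", "c": "world"}]}]
print(walk(blocks, shout))
```

Unlike the Haskell version, nothing here restricts `fn` to a particular element type; the function itself has to check the `"t"` tag.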
But, again, once I dig into those docs a bit, the reference will probably make more sense. I think Pandoc casts a pretty wide net (the Getting Started docs explain what pwd is), so I just posted this to document the experience of a reader with "fresh eyes."
Topics: