msprev / panzer

pandoc + styles

Lua filters don't receive pandoc's AST, and don't output anything #38

Closed: josineto closed this issue 6 years ago

josineto commented 6 years ago

Starting with panzer 1.4, lua filters are supported (many thanks to @msprev !!). So I tried some lua filters that I wrote myself, and none of them behaved as expected. In fact, it seems lua filters are receiving JSON, when they should receive the AST instead (according to pandoc's doc on lua filters).

For instance, I have this lua filter (very first example on lua filters in pandoc's doc). It should turn Strong elements into SmallCaps elements:

function Strong(elem)
  return pandoc.SmallCaps(elem.c)
end

Running it directly with pandoc works. Using panzer, nothing happens. Below is an excerpt of my styles.yaml, which does nothing but run that lua filter:

Base:
  all:
    metadata:
      numbersections: true
      lang: "pt-BR"
    commandline:
      standalone: true
    lua-filter:
      - run: myfilter.lua

The filter is correctly found by panzer, and apparently runs, but no lua-filter errors are shown when running panzer. Even print('some debug') doesn't output anything to the console. I tried to handle JSON in the lua filter, but I couldn't figure out how to do that. My first try was the code below, without any result:

function test(key, value, format, meta)
  return pandoc.Str(value)
end

toJSONFilter(test)

Am I missing something?

msprev commented 6 years ago

panzer is not passing JSON to lua filters. The example you give works perfectly for me, so there must be some other issue. Try this simple example:

---
style: None
lua-filter:
    - run: myfilter.lua
...

**hello world**

This produces the expected output in small caps.

Note that print will not work as you suggest as panzer communicates with pandoc over stdin and stdout. These communication channels are 'taken' and cannot be polluted with print messages from filters (the same applies to json filters). If you want to print error messages, you should do it over stderr channels not stdout (via io.stderr:write("This should go to stderr\n")). The documentation on panzer describes how to send json messages over stderr to pretty print messages.

This is actually consistent with pandoc's philosophy. Any filters should assume that stdin and stdout are 'taken' as pandoc is designed to operate on these channels -- consuming data from stdin and producing data to stdout. Diagnostics should go to some other output channel (stderr).

msprev commented 6 years ago

If you want to pretty print, use the following expression in the filter to send a json message over stderr to panzer:

io.stderr:write('{"level": "INFO", "message": "Debug message from filter here"}\n')
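
If the message text can contain quotes or newlines, pasting it straight into that string produces invalid JSON. A minimal sketch of a helper that escapes the text first is given here; the names json_escape, format_log and log_to_panzer are hypothetical and not part of panzer or pandoc, and only the "level"/"message" shape is taken from the line above.

```lua
-- Hypothetical helpers (not part of panzer or pandoc): build the
-- {"level": ..., "message": ...} line shown above, escaping the message
-- so that the result stays valid JSON.

-- Escape a Lua string so it can sit inside a JSON string literal.
local function json_escape(s)
  local replacements = {
    ['\\'] = '\\\\', ['"'] = '\\"',
    ['\n'] = '\\n', ['\r'] = '\\r', ['\t'] = '\\t',
  }
  s = s:gsub('[\\"\n\r\t]', replacements)
  -- Any control characters still left are written as \uXXXX escapes.
  s = s:gsub('%c', function(c)
    return string.format('\\u%04x', c:byte())
  end)
  return s
end

-- Return one JSON log line, terminated by a newline.
local function format_log(level, message)
  return string.format('{"level": "%s", "message": "%s"}\n',
                       level, json_escape(message))
end

-- Write the line to stderr; stdout stays untouched for pandoc's data.
local function log_to_panzer(level, message)
  io.stderr:write(format_log(level, message))
end
```

Inside a filter, log_to_panzer("INFO", "Debug message from filter here") would then replace the hand-written string.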

josineto commented 6 years ago

Lua filters are running normally. I've cleaned up my styles.yaml down to its most essential parts.

But there's a problem: in pandoc's lua filters, there's a FORMAT variable that stores the writer's format (html or docx, for instance). In panzer's lua filters, that variable is always json. So I tried to read the panzer_reserved field in the AST, but the JSON_MESSAGE is a string, not deserialized JSON. With the code below, JSON_MESSAGE is sent to stderr:

function Pandoc (doc)
  local jsonMessage = doc.meta['panzer_reserved']['json_message'][1].c[2]
  io.stderr:write('{"level": "INFO", "message": "' .. jsonMessage .. '"}\n')
end

And the beginning of JSON_MESSAGE as printed in stderr:

[{"metadata": {"lang": {"t": "MetaInlines", "c": [{"t": "Str", "c": "pt-BR"}(...)

For now I can't continue testing, but I'll try dkjson to deserialize that JSON and get the writer format. If there's a simpler method, please let me know.

msprev commented 6 years ago

Thank you for this feedback. I've completely changed the implementation based on it as the FORMAT variable is important. I've pushed a new release now. Changes to note:

On the json message accessible to filters...

Currently, the json message panzer_reserved is stored in the AST as a CodeBlock element. Think of it this way: it is stored as if one typed the message as a codeblock inside the markdown document under a panzer_reserved metadata field, i.e. as if one typed the message out as a string and surrounded it by backticks. The motivation is that there is no easy way of storing arbitrary, non-markdown data inside the pandoc AST (this has been discussed a few times on the pandoc mailing list). The accepted solution is to store such non-markdown data as raw code blocks to prevent the markdown parser from messing it up.

You are right therefore that to access this json message you need to:

  1. extract the content ('c' value) of this CodeBlock as a string and then
  2. deserialise the string to recover the json datastructure.

It's pretty easy to do though and should just be ~2 lines of code.
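
A rough sketch of those two steps follows. It assumes the AST shape used in the snippet earlier in this thread (json_message[1].c[2]) and leaves the JSON decoder as a parameter. The function names panzer_json_string and panzer_message are hypothetical, and the .c accessor may differ between pandoc versions.

```lua
-- Hypothetical helpers, assuming the AST shape from the snippet earlier
-- in this thread: meta.panzer_reserved.json_message[1] is a CodeBlock
-- whose content is {attr, text}.

-- Step 1: pull the raw JSON string out of the CodeBlock.
local function panzer_json_string(doc)
  local reserved = doc.meta and doc.meta['panzer_reserved']
  local blocks = reserved and reserved['json_message']
  local block = blocks and blocks[1]
  -- Index 2 of the CodeBlock content is its text.
  return block and block.c and block.c[2]
end

-- Step 2: deserialise the string with a decoder supplied by the caller,
-- for example the decode function of a JSON library such as dkjson.
local function panzer_message(doc, decode)
  local raw = panzer_json_string(doc)
  if not raw then
    return nil, "no panzer_reserved.json_message found"
  end
  return decode(raw)
end
```

With dkjson installed (an assumption, it does not ship with pandoc), a filter could call panzer_message(doc, require("dkjson").decode) inside its Pandoc function. Going by the output quoted above, the decoded value is a list whose first entry has a metadata key.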

Hope that this helps!

josineto commented 6 years ago

It's totally working: my lua filters are running really fast, and with no modifications compared to running them directly with pandoc. Many many thanks, @msprev !!!

josineto commented 6 years ago

Now I'm refining my workflow, and planning to create a repository with it including some panzer configuration along with my filters. Thanks!

msprev commented 6 years ago

Wonderful! I’m really glad. Let me know if any issues crop up and it would be good to share workflows.