pandoc-extras / pandocpm

Manage the install/update/uninstall of packages
https://pandoc-extras.github.io/pandocpm/
Apache License 2.0

Auto-install, auto-filter, and dependencies #9

Open ickc opened 7 years ago

ickc commented 7 years ago

As we'll see below, "Auto-install, auto-filter, and dependencies" are closely related issues.

Split off from #2:

Relationship between Panzer, Pandocpm, Panflute, Pandocfilters

These are the potential dependencies:

From https://github.com/pandoc-extras/pandocpm/issues/2#issuecomment-262265957

  • pandocpm can host panzer style files in the same way it hosts filters and templates.
  • users of panzer could use pandocpm to ensure their filters are installed.
  • pandocpm makes no difference between panflute and pandocfilters et al, which is a plus.

auto-filter and auto-install

auto-filter: should it fall under panflute or pandocpm?

From https://github.com/pandoc-extras/pandocpm/issues/2#issuecomment-272129860:

An alternative approach to auto-filter: rather than having an auto-filter in panflute that calls pandocpm, the auto-filter could live in pandocpm instead, with pandocpm listing panflute (and possibly pandocfilters) as a dependency. Then pandocpm, used as a filter, can do everything under the hood:

  1. pip install pandocpm
  2. add the filter names in the YAML of the markdown
  3. pandoc -F pandocpm ...

Edit: a way to circumvent the main function problem is to embed the name of the main/action function in the yaml formula.
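A minimal sketch of that lookup, assuming the formula is already parsed into a dict. The keys `module` and `entry` are hypothetical names, not an existing pandocpm format, and the stdlib `json` module merely stands in for a filter module:

```python
import importlib


def resolve_entry(formula):
    """Import the module named in the formula and return its entry function.

    The keys "module" and "entry" are hypothetical; "entry" falls back to
    "main" when the formula does not name a function.
    """
    module = importlib.import_module(formula["module"])
    func = getattr(module, formula.get("entry", "main"))
    if not callable(func):
        raise TypeError(f"{formula['module']}.{func!r} is not callable")
    return func


# Stand-in formula: the stdlib json module plays the role of a filter module.
formula = {"module": "json", "entry": "dumps"}
entry = resolve_entry(formula)
```

With something like this, the caller never has to guess whether a filter exposes `main`, `action`, or another name.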

Additional notes:

This way, pandocfilters will be on a more equal footing with panflute in terms of auto-filter and auto-install, which should make adoption easier.

sergiocorreia commented 7 years ago

As a user, if all of your filters are panflute filters, we can run them much faster, because under the hood autofilter avoids reading from stdin and converting from JSON for every filter (and the same for dumping). E.g.:

Standard workflow, slow

pandoc -F filter1.py -F filter2.py ...

  1. pandoc reads the document, creates an AST, dumps it to stdout as json
  2. filter1 reads stdin, converts the JSON into a Doc() object
  3. filter1 runs the action() function
  4. filter1 dumps the new Doc() into JSON in stdout
  5. filter2 reads..
  6. filter2 runs..
  7. filter2 dumps...

autofilter workflow, fast

pandoc -F autofilter ...

  1. pandoc reads the document, creates an AST, dumps it to stdout as json
  2. panflute reads stdin, converts the JSON into a Doc() object
  3. panflute calls main() in filter1, which just runs action
  4. panflute calls main() in filter2, which just runs action
  5. panflute dumps the new Doc() into JSON in stdout

This means that once you are running at least one filter, running more is fast even in large documents.
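The difference between the two workflows can be sketched by counting JSON parse/dump operations. This is a toy model, not panflute's code: the "document" is a plain dict rather than a pandoc AST, and `run_separately`, `run_chained`, and `Counter` are made-up names for illustration.

```python
import json


class Counter:
    """Counts JSON parse/dump operations so the two workflows can be compared."""

    def __init__(self):
        self.loads = 0
        self.dumps = 0


def run_separately(ast_json, actions, counter):
    """Standard workflow: every filter parses the JSON and serializes it again."""
    for action in actions:
        doc = json.loads(ast_json)
        counter.loads += 1
        doc = action(doc)
        ast_json = json.dumps(doc)
        counter.dumps += 1
    return ast_json


def run_chained(ast_json, actions, counter):
    """Autofilter workflow: parse once, run every action, serialize once."""
    doc = json.loads(ast_json)
    counter.loads += 1
    for action in actions:
        doc = action(doc)
    counter.dumps += 1
    return json.dumps(doc)


def upper(doc):
    # Toy "filter": upper-case the text field.
    doc["text"] = doc["text"].upper()
    return doc


def exclaim(doc):
    # Toy "filter": append an exclamation mark.
    doc["text"] += "!"
    return doc


source = json.dumps({"text": "hello"})
slow, fast = Counter(), Counter()
out_slow = run_separately(source, [upper, exclaim], slow)
out_fast = run_chained(source, [upper, exclaim], fast)
```

Both paths produce the same output; the separate path parses and dumps once per filter, while the chained path does each only once regardless of how many filters run.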

Now, this can't be replicated with pandocfilters because there is no Doc() object. Sure, you could make external calls, but then each call would do exactly the same work as the initial pandoc call, and the only gain would be a faster-to-type command (in which case you can just use panzer).

ickc commented 7 years ago

Oh, so there's no way to have a "panflute-style auto-filter" in the case of pandocfilters, even with the main function (or a lookup of the "main function" as mentioned in #7)?

I am not familiar with the pandocfilters design, so what I'm going to say might not make sense: can't we collect all the functions that need to be passed to toJSONFilters and pass them all as a list?
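A rough sketch of what "passing all functions as a list" could look like, assuming pandocfilters-style actions with the signature `action(key, value, format, meta)`. The `walk` below is a simplified stand-in written for this example, not the library's implementation, and `apply_all` is a hypothetical helper. As far as I can tell, pandocfilters' own `toJSONFilters(actions)` accepts a list in a similar way, though each action still triggers its own walk over the tree.

```python
def walk(node, action, fmt, meta):
    """Simplified tree walk: call action on every {"t": ..., "c": ...} element.

    An action returns None to leave the element unchanged, or a replacement
    element (a dict). This stand-in omits list-valued results.
    """
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and "t" in item:
                res = action(item["t"], item.get("c"), fmt, meta)
                out.append(walk(item if res is None else res, action, fmt, meta))
            else:
                out.append(walk(item, action, fmt, meta))
        return out
    if isinstance(node, dict):
        return {k: walk(v, action, fmt, meta) for k, v in node.items()}
    return node


def apply_all(tree, actions, fmt="", meta=None):
    """Hypothetical helper: apply each action in order to an already-parsed tree."""
    for action in actions:
        tree = walk(tree, action, fmt, meta or {})
    return tree


def shout(key, value, fmt, meta):
    # Upper-case every Str element.
    if key == "Str":
        return {"t": "Str", "c": value.upper()}


def exclaim(key, value, fmt, meta):
    # Append "!" to every Str element.
    if key == "Str":
        return {"t": "Str", "c": value + "!"}


blocks = [{"t": "Para", "c": [{"t": "Str", "c": "hello"},
                              {"t": "Space"},
                              {"t": "Str", "c": "world"}]}]
result = apply_all(blocks, [shout, exclaim])
```

The JSON is parsed once and dumped once here, which saves the serialization round trips, although there is still one tree walk per action.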

If there's no way to make the above work, there might still be an advantage in providing a shortcut for all filters (this won't be as fine-grained as panzer's option), e.g. filters: [filter1, filter2]. That way pandocpm can still auto-install sanitized filters for end users on the first run.

The current panflute-filters key can remain. As far as pandocpm is concerned, anything in panflute-filters will be auto-installed by pandocpm (when pandoc -F pandocpm is used) and then passed to panflute, which does its magic. In this setup both pandocpm and panflute can be used as a filter. pandocpm as a filter recognizes both filters and panflute-filters and auto-installs them; it runs filters itself and passes panflute-filters to panflute. panflute as a filter recognizes only panflute-filters, and only runs them without auto-installing them.
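The split between the two keys could be sketched like this, assuming the document metadata has already been reduced to a plain dict of lists. `plan_filters` and the keys of its result are hypothetical names; nothing like this exists in pandocpm yet.

```python
def plan_filters(meta):
    """Decide what pandocpm-as-a-filter would install, run, and hand off.

    `meta` is assumed to be a plain dict; a real filter would first have to
    extract these lists from the pandoc metadata.
    """
    generic = list(meta.get("filters", []))
    panflute = list(meta.get("panflute-filters", []))
    # Install everything from both keys, keeping order and dropping duplicates.
    to_install = list(dict.fromkeys(generic + panflute))
    return {
        "install": to_install,         # auto-installed by pandocpm
        "run_directly": generic,       # run by pandocpm itself
        "pass_to_panflute": panflute,  # handed over to panflute's autofilter
    }


plan = plan_filters({
    "filters": ["filter1", "filter2"],
    "panflute-filters": ["filter2", "filter3"],
})
```

panflute used as a filter on its own would only look at the `pass_to_panflute` part and skip the install step.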

ickc commented 7 years ago

Auto-install can potentially have a couple of problems:

I think auto-install should install simple packages only, and print an error message directing users to install complex packages themselves.
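One way such a policy might look, purely as a sketch: what counts as "simple" is not defined above, so the criterion used here (no system-level dependencies) and the formula keys `name` and `system_dependencies` are assumptions made for illustration.

```python
def install_decision(formula):
    """Return (ok, message); ok is True when the package may be auto-installed.

    Assumption: a package is "simple" when its formula lists no system-level
    dependencies. The key names are hypothetical.
    """
    name = formula["name"]
    system_deps = formula.get("system_dependencies", [])
    if system_deps:
        return False, (
            f"{name} needs {', '.join(system_deps)}; "
            "please install it manually."
        )
    return True, f"{name} can be auto-installed."


simple = install_decision({"name": "filter1"})
complex_ = install_decision(
    {"name": "filter2", "system_dependencies": ["graphviz"]}
)
```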

sidenote: filter arguments

If so, this will give people more incentive to keep their filters "simple". One problem I'm personally facing is that my pantable is growing in complexity. I asked about filter arguments in https://groups.google.com/forum/m/#!topic/pandoc-discuss/LIAfgkZKUiE, and panzer already allows filter arguments. What's your view on this?

In terms of panflute's autofilter, it seems filter arguments won't work very well. But a feature like auto-filter, with the speed-up it offers, seems too good to give up.

ickc commented 7 years ago

@jgm had considered deprecating pandocfilters in favor of panflute (on pandoc-discuss), but when I asked him again he didn't reply. I assume he will continue to maintain it, probably because a lot of users still rely on it. (Porting has incredible friction; just look at the py2 to py3 transition.)

I wonder whether it would be worth the effort to convince him to let me maintain pandocfilters, so that the necessary changes can be made to support some kind of auto-filter better, perhaps even following panflute's design so that pandocfilters also works as a filter. What do you think?