msprev / panzer

pandoc + styles
BSD 3-Clause "New" or "Revised" License

Why execute filters with subprocess? #21

Closed fredcallaway closed 8 years ago

fredcallaway commented 8 years ago

Perhaps I'm missing something here, but it seems strange to me to have a Python module that uses subprocess to execute other Python files, passing information through stdin and stdout. Is there a reason you can't replace each filter's if __name__ == "__main__" script with a main function and then just call each filter's main function?

msprev commented 8 years ago

This is because a filter can be any pandoc filter. A filter can be written in any language. I just happened to write mine in Python. You can see examples of filters in other languages here: http://pandoc.org/scripting.html

panzer adds features to filters that pandoc does not have (filters can take multiple command line arguments, filters get passed some extra metadata). However, panzer supports all vanilla pandoc filters.
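To make the language independence concrete, here is a minimal sketch of how a caller could run an arbitrary filter executable. It assumes only the standard pandoc filter contract (JSON on stdin, transformed JSON on stdout). The name `run_filter` and the stand-in identity filter are hypothetical and are not taken from panzer's code.

```python
import json
import subprocess
import sys


def run_filter(command, ast):
    """Pipe a JSON-serialisable AST through an external filter command.

    `command` is an argv list. The filter may be written in any language,
    as long as it reads JSON on stdin and writes JSON on stdout.
    """
    completed = subprocess.run(
        command,
        input=json.dumps(ast).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return json.loads(completed.stdout.decode("utf-8"))


# Hypothetical stand-in filter: a one-line program that echoes its input.
# A real filter could equally be a Haskell binary or a shell script.
IDENTITY_FILTER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]

if __name__ == "__main__":
    doc = {"blocks": [], "meta": {}}  # toy document, not a complete pandoc AST
    print(run_filter(IDENTITY_FILTER, doc))
```

Because the caller only sees an argv list and two byte streams, it does not need to know what language the filter was written in.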

fredcallaway commented 8 years ago

Right, this makes sense. However, I still think this is an easy optimization. You say that subprocess is a main reason for the execution time, and it seems that many (if not most) filters are written in Python. It would be fairly straightforward, I think, to identify Python files and run them through Python itself.
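A rough sketch of what that dispatch might look like, under one loud assumption: each Python filter exposes a `main(ast)` function that takes and returns the document as a Python object. That convention is hypothetical, as are the names `load_filter_main` and `apply_filter`; ordinary pandoc filters read stdin instead and would take the subprocess path.

```python
import importlib.util
import json
import subprocess
import tempfile
from pathlib import Path


def load_filter_main(path):
    """Import a Python filter file and return its `main` function, if any."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, str(path))
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code; an `if __name__ == "__main__"` block
    # is skipped because the module is not named "__main__".
    spec.loader.exec_module(module)
    return getattr(module, "main", None)


def apply_filter(path, ast):
    """Call Python filters in-process; fall back to a subprocess otherwise."""
    if str(path).endswith(".py"):
        main = load_filter_main(path)
        if main is not None:
            return main(ast)  # assumed convention: main(ast) -> ast
    completed = subprocess.run(
        [str(path)],
        input=json.dumps(ast).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(completed.stdout.decode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_filter.py"
        # Toy filter following the assumed main(ast) convention.
        demo.write_text("def main(ast):\n    ast['seen'] = True\n    return ast\n")
        print(apply_filter(demo, {"blocks": []}))
```

Note that this already has two code paths, one for in-process calls and one for everything else.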

Thanks for this sweet tool by the way :+1:

msprev commented 8 years ago

Thank you very much for the nice comment!

I could see that it is possible to have 2 separate architectures to support filters: one via subprocess and one via Python function calls. I'm a bit reluctant to jump on this as the maintenance cost of 2 separate pathways in the code here would outweigh a relatively small performance benefit for filters.

I need to use subprocess a fair bit in the code -- to call out to pandoc (on 2 reading runs + 1 writing run), to run pre/postflight scripts, and to run filters. The calls that take the lion's share of the time are the subprocess calls out to the pandoc executable. To a first approximation, everything else in the code takes a trivial amount of time. Interestingly, that also applies to calls to Python subprocesses (I guess because the Python runtime is already up and going, so running them is fast). Optimising how Python subprocesses are run probably isn't going to yield much gain. What would really speed things up is to rewrite panzer in Haskell and import pandoc as a library. It would then be almost as fast as vanilla pandoc. I'd love to do this, but it's a big task.
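For anyone who wants to check where the time goes on their own machine, here is a small timing sketch. It is not panzer's code: `time_call`, `spawn_python` and `in_process` are hypothetical helpers, and the empty Python program is only a stand-in for a real filter.

```python
import subprocess
import sys
import time


def time_call(func, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def spawn_python():
    # Start a fresh interpreter that does nothing: a stand-in for a Python filter.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def in_process():
    # Stand-in for calling a filter's function directly, with no new process.
    pass


if __name__ == "__main__":
    print(f"subprocess: {time_call(spawn_python):.4f} s")
    print(f"in-process: {time_call(in_process):.6f} s")
```

To compare against pandoc itself, one could time `subprocess.run(["pandoc", "--version"], check=True)` in the same way, assuming pandoc is installed and on the PATH.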

fredcallaway commented 8 years ago

Ah, I understand. Thanks for the explanation.

Personally, I think the convenience of Python is worth the 3-4 seconds per build.

Cheers, Fred

On Feb 29, 2016, at 3:57 AM, Mark Sprevak wrote: (quoted copy of the previous comment)

msprev commented 8 years ago

Thanks, me too. I still use vanilla pandoc for instant HTML previews of Markdown documents (https://github.com/msprev/vim-xmark). But for anything else (including any document that involves non-trivial configuration settings), I just push it through panzer using one of my styles. The extra time isn't a big issue for me.