Closed sergiocorreia closed 7 years ago
A few useful filters should be more easily available:
Allow panflute to be run as a filter, where it calls the list of filters listed in the metadata.
Is there any interest in turning this repo into a centralized panflute filters gallery? I'm building an extended version of panflute csv2table based on yours in ickc/pandoc-table-csv-test/panflute-csv2table.ipynb. I have almost finished it (I still need to think about the exact metadata keys to use, cleanup, etc.) and am thinking about how to distribute it.
On pandoc-discuss we discussed the need for a centralized pandoc filters library that is also easy to install. I'm thinking maybe we can start from panflute? So, say, everyone makes pull requests of their scripts into panflute (with some minimum requirements, say, naming scheme, version numbering, etc.), and then they will be bundled with panflute, with the said metadata controlling which filters are used. All people would need to do is add, say, --filter=panflute in the pandoc args.
I am considering porting my pandoc-amsthm to panflute too. And I need a variant of pandoc-includes (the one on panflute seems great). I considered writing Haskell filters, but it is a pain to make sure colleagues can install them. pip is much easier (because Python is almost ubiquitous), but @jgm specifically said his pandocfilters isn't a centralized repository. So you are my last hope to streamline the use of pandoc filters. No pressure though. 😄
I agree that having a centralized repository will be the best path forward. Would this fit an organizational structure better?
This would be a cool thing to have. Now, how would this work exactly?
About the role of panflute: maybe we can list the filters used as metadata, and then have panflute auto-install them from this repo?
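A minimal sketch of the metadata lookup this idea would need, assuming the filter names live under a panflute-filters key (the key that shows up later in this thread) and that pandoc hands over its JSON AST in the pandoc >= 1.18 object layout. The function names are hypothetical and none of this is panflute's actual API; it only reads the list, it does not install or run anything.

```python
def stringify(inlines):
    """Flatten pandoc inline nodes to plain text (handles Str and Space only)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def requested_filters(doc, key="panflute-filters"):
    """Return the filter names listed under `key` in the document metadata.

    `key` is an assumed name; the thread has not settled on one.
    """
    entry = doc.get("meta", {}).get(key)
    if entry is None:
        return []
    if entry["t"] == "MetaList":
        # YAML list form: one MetaInlines item per filter name
        return [stringify(item["c"]) for item in entry["c"]
                if item["t"] == "MetaInlines"]
    if entry["t"] == "MetaInlines":
        # Single-line form: "onefilter, another"
        text = stringify(entry["c"])
        return [name.strip() for name in text.split(",") if name.strip()]
    return []


# Hand-written stand-in for the JSON pandoc would pipe to the filter
example_doc = {
    "pandoc-api-version": [1, 17, 0, 4],
    "meta": {
        "panflute-filters": {
            "t": "MetaInlines",
            "c": [{"t": "Str", "c": "onefilter,"},
                  {"t": "Space"},
                  {"t": "Str", "c": "another"}],
        }
    },
    "blocks": [],
}
print(requested_filters(example_doc))
```

Both the comma-separated form and a proper YAML list are accepted here, since it is still open which one the metadata should use.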
I have been thinking about setting up a GitHub organization for pandoc. It would actually be nice to have pandoc/panflute etc. all fall under one umbrella organization. I didn't ask @jgm, but I think he probably wouldn't want to do that.
[Sidenote about the organization: sadly, the name pandoc has already been taken by someone who has one repo with no active development, and its contents are pirated Chinese fiction in pandoc markdown. I already filed a complaint (that the content violates copyright and hence the GitHub terms and conditions), but GitHub refuses to take it down and requires the copyright owners to file it themselves. However, while I "know" the copyright owners (some of the best Chinese fiction writers), they don't know me.]
Anyway, I suggest that if an organization is set up, its name should be more generic and allow the inclusion of projects other than panflute. This will become the "centralized gallery" I've been talking about. Possible names are:
lapandoc: a word play on LaTeX from TeX, but la probably stands for Lamport, the creator of LaTeX.
pandocx: x stands for extra, but people might think pan-docx rather than pan-doc-x.
pandoc-extras: boring but clear.
I agree, wherever that repo is (say, if a GitHub organization is used), panflute being able to auto-install filters behind the scenes would be excellent. In the latest version of pandoc, it means just putting them in data-dir/filters, which seems more secure. But in earlier versions of pandoc, it means panflute needs to either put those filters somewhere already in the PATH, or add the directory it installs to onto PATH. Either way, it is insecure. I guess if this feature is implemented, we should say it is for pandoc >= 1.18 only.
I kind of did the CSV thing in ickc/pandoc-filters/pandoc-filters.csv for currently available filters (but far from finished).
I think the list of all (panflute-)filters should live in the same repo that contains those filters (say, panflute-filters), for easier organization. We can ask whoever makes a pull request to the repo to also add their entry to the list (with a link to the documentation, perhaps).
However, there can be another separate repo that contains references to filters not in panflute-filters (maybe just transfer mine to the organization).
I think if we could auto-generate a website gallery from it, it would be great for filter discovery. I have some vague ideas about it, but don't know the best way to do it. (gh-pages has more limitations but integrates seamlessly with GitHub. Travis is needed for tests anyway, but requires more setup to customize a website build. And then there is the question of which one to use: jekyll, yst, makefile+pandoc, etc.)
Just to mention another bonus of having a centralized repo for panflute filters: the naming scheme for filters can be shorter. Currently, people name their filters pandoc-includes, pandoc-csv2tables, pandoc-placetables, pandoc-amsthm, etc. because they are submitted to cabal/pip, etc. and the prepended pandoc is for identification among the sea of packages. If the panflute filters live in one repo, the prepended string won't be necessary, which allows cleaner, shorter names.
I like your proposal, but my main concern is that complexity can explode. I think that there are several interlinked issues that we will benefit from treating separately:
pandoc-extras org (something like pandoc-extras/panflute-filters)
About step 4, do you know how to use setup.py to include executable files? (I think it's called entry points). It would be cool if we allow panflute to be a filter, so if you do pandoc -F panflute .. then panflute checks the metadata and downloads and runs the required filters.
I think it is something like
entry_points={
    'console_scripts': [
        'panflute = panflute:main',
    ],
},
(And you need to provide a __main__.) If you want CLI options, getopt would work.
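For what it's worth, a rough sketch of what the callable behind a panflute = panflute:main entry point could look like with getopt. The -v/--verbose and -d/--data-dir options are made up for illustration and are not existing panflute flags. The one thing taken from pandoc itself is that it passes the output format as the first argument when it invokes a filter.

```python
import getopt
import sys


def parse_args(argv):
    """Parse hypothetical options; returns (options dict, positional args)."""
    options = {"verbose": False, "data_dir": None}
    opts, positional = getopt.getopt(argv, "vd:", ["verbose", "data-dir="])
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            options["verbose"] = True
        elif flag in ("-d", "--data-dir"):
            options["data_dir"] = value
    return options, positional


def main(argv=None):
    """Callable that a 'panflute = panflute:main' entry point would invoke."""
    if argv is None:
        argv = sys.argv[1:]
    options, positional = parse_args(argv)
    # When pandoc runs a filter, positional[0] is the output format
    output_format = positional[0] if positional else None
    if options["verbose"]:
        print("format:", output_format, "options:", options, file=sys.stderr)
    # Reading the JSON AST from stdin and running filters would go here
    return 0
```

Letting main accept argv explicitly keeps it callable from tests without touching sys.argv.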
One of the complexities involved, and one that needs balancing, is security. Let's say panflute chooses the safest approach: it only copies filters to $DATA-DIR/filters and supports this feature (auto-downloading filters) for pandoc >= 1.18 only. Even in this case, there might be security implications, since a user might have previously added $DATA-DIR/filters to their PATH (when they were working with an earlier version of pandoc). So anything copied to that folder would be in the PATH and executable (probably, depending on how the user set things up) without sudo. panflute would then open a point of attack for installing arbitrary code.
And even if $DATA-DIR/filters is not in the PATH, panflute running the filters automatically still means it's an opening for attack.
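One cheap guard for the first scenario would be to check whether the install directory is already on PATH before copying anything into it. A small sketch under that assumption; is_on_path is a hypothetical helper, and it compares normalized absolute paths only, so a symlinked PATH entry would slip through.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.abspath(directory))
    for entry in path_value.split(os.pathsep):
        # Skip empty entries; normalize so trailing separators do not matter
        if entry and os.path.normcase(os.path.abspath(entry)) == target:
            return True
    return False
```

panflute could then warn, or refuse to auto-install, when the check returns True for the filters directory.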
[Sidenote: I'm considering writing a filter that can execute code in the markdown source, say, through exec or ! in IPython. This also has security implications. And hypothetically, if such a filter were submitted as a pull request to the said centralized repository, I'm not sure whether it should be accepted, for security reasons.]
That's the reason behind having the filters hosted in the same centralized repository. This way, the core-developers can verify the code is not malicious, and any change to the code requires a separate pull request for sanitization.
[Another sidenote: I think the closest thing to our idea is \usepackage in LaTeX. Arbitrary \usepackage can be specified in the document, so the packages are centralized in CTAN for sanitization and distribution.]
If we really do not want centralized hosting, then we might need to learn from how, say, brew handles it. For each additional unknown repository, you need to brew tap into it (manually). brew will then also calculate the SHA-256 sum to check that the source hasn't been modified (meaning that if the source is modified, a separate pull request is required to update the SHA-256 sum, hence in principle it is sanitized). This approach, however, will take away the "seamless" part of our (at least my) dream.
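The checksum part of that approach is straightforward with the standard library. A minimal sketch, assuming the reviewed SHA-256 digest of each filter is pinned somewhere trusted (for example in the central repo); verify_filter and the sample source are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify_filter(source, expected_sha256):
    """Return True only if `source` matches the pinned digest."""
    actual = sha256_hex(source)
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(actual, expected_sha256.lower())


# Digest recorded when the filter was reviewed (computed here for the demo)
reviewed_source = b"print('hello from a filter')\n"
pinned = sha256_hex(reviewed_source)

print(verify_filter(reviewed_source, pinned))
print(verify_filter(reviewed_source + b"# tampered\n", pinned))
```

Any change to the filter source then fails verification until someone reviews it and updates the pinned digest, which is the pull-request gate described above.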
But I understand the concern about complexity. For example, we can define rules for submitting a filter (including a clear standard for specifying the author). Every issue submitted has to name the author, and the author deals with the bug (this is how Travis CI supports third-party/community-maintained languages). In addition, tests and docs might also be required.
There's potentially a problem of resistance to adoption, and it might consume too much time (who knows how much busier we will become). But I think the security issue is more important. panflute would be given too much power (by downloading arbitrary executable code, either directly or indirectly), and hence we should guard the filters it can download more carefully.
On the other hand, the added barrier might mean a higher quality of filters submitted, and fewer pull requests to deal with. Given that the pandoc community is relatively small and (probably) not many people are writing pandoc filters (although I'm sure one of the goals of panflute is to change this!), we probably won't be too busy. (One data point to back up this argument: after a decade, the list in Pandoc Filters · jgm/pandoc Wiki is not very long. I'm sure only the people motivated enough to put a link to their filter in the pandoc wiki will be motivated to try a centralized filter repo.)
By the way, I don't think having their filters submitted to a centralized repo means they can't have their own repo. Just like CTAN, some of the sources can live elsewhere (say, on GitHub). They can even write a script to prepare their code to be submitted to our centralized repo.
Closing this as all the ideas are now either in separate issues or have been implemented.
Also see: https://github.com/sergiocorreia/panflute/projects/1
.toJSONFilter() and .toJSONFilters() method names are not Pythonic at all (and hard to understand unless you were a previous user of pandocfilters or know the internals of Pandoc). Maybe change them to something like .run_filter() and .run_filters() (but keep the old names as wrappers for compat!)
panflute-filters: onefilter, another).
pandoc somedoc.md -F panflute, and panflute itself can be used as a filter that calls the filters listed in the metadata. This fixes problem 2.
TLDR:
.toJSONFilter() while keeping the old name as wrapper for compatibility
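A toy sketch of the rename-with-wrappers idea. Only the four names come from the proposal above. The signatures are simplified stand-ins that take the document as a plain argument and should not be read as panflute's real ones.

```python
def run_filters(actions, doc):
    """Apply each action to the document in order and return the result."""
    for action in actions:
        doc = action(doc)
    return doc


def run_filter(action, doc):
    """Convenience form of run_filters for a single action."""
    return run_filters([action], doc)


# Old names kept as thin wrappers so existing filters keep working
def toJSONFilters(actions, doc):
    return run_filters(actions, doc)


def toJSONFilter(action, doc):
    return run_filter(action, doc)


# Strings stand in for a document here, purely for illustration
print(run_filter(str.upper, "hello"))
print(toJSONFilters([str.strip, str.upper], "  hello  "))
```

Because the wrappers only delegate, behavior stays in one place, and the old names could be deprecated later without touching callers now.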