pandoc-extras / pandocpm

Manage the install/update/uninstall of packages
https://pandoc-extras.github.io/pandocpm/
Apache License 2.0

How this works #2

Closed sergiocorreia closed 7 years ago

sergiocorreia commented 7 years ago

Filters

First, each filter needs a specific structure, as seen here and here

The key thing is the main() hook:

Every panflute script needs to end up with this:

def main(doc=None):
    return pf.run_filter(action, doc=doc)

if __name__ == '__main__':
    main()

Or a variant of this, but always i) with a main() function, ii) that receives an optional argument doc, which is passed to pf.run_filter (or any of run_filters, toJSONFilter, toJSONFilters), and iii) that returns the output of the call.

YAML for filters

Optionally, filters have an accompanying YAML file, as here: https://github.com/sergiocorreia/panflute-filters/blob/master/filters/debug.yaml

The metadata shown in the example is currently overkill, but ideally it should be used to construct a gallery of filters, search for specific filters, update them when a new version appears, etc.

Index of filters

It's a simple YAML file that points to the yaml (or .py) files: https://github.com/pandoc-extras/packages/blob/master/filters.yaml

Everything is easy to extend to things besides filters (in this case, just have a separate yaml file)

panflute autofilters

You can have metadata in the form of panflute-filters: somefilter or panflute-filters: [filter1, filter2]. Additionally, you can have panflute-verbose: true and panflute-path: somepath entries.

Panflute will search in the current dir, or datapath, or the path indicated in the metadata, or $PATH, for the filter, and if found, run it.
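The lookup described above could be sketched roughly as follows. This is not panflute's actual code: the function names are hypothetical, the metadata is assumed to have already been converted to a plain dict, and trying both `name` and `name.py` in each directory is an assumption.

```python
import os
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def requested_filters(meta: Dict[str, Any]) -> List[str]:
    """Accept both `panflute-filters: name` and `panflute-filters: [a, b]`."""
    raw = meta.get("panflute-filters", [])
    return [raw] if isinstance(raw, str) else list(raw)


def default_search_dirs(datadir: Optional[str], meta_path: Optional[str]) -> List[Path]:
    """Current dir, then datadir, then the metadata path, then $PATH."""
    dirs = [Path.cwd()]
    if datadir:
        dirs.append(Path(datadir))
    if meta_path:
        dirs.append(Path(meta_path))
    dirs.extend(Path(p) for p in os.environ.get("PATH", "").split(os.pathsep) if p)
    return dirs


def find_filter(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first match for `name` or `name.py` (assumed), or None."""
    for directory in search_dirs:
        for candidate in (Path(directory) / name, Path(directory) / f"{name}.py"):
            if candidate.is_file():
                return candidate
    return None
```

The first directory that contains the filter wins, so a copy in the current dir shadows one found later on $PATH.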

Note: this is currently not integrated with pandocpm, so no auto-installs will occur

Downloading with pandocpm

After installation, type pandocpm --help

As an example, these are some common patterns:

pandocpm install filter debug
pandocpm install filter debug --verbose
pandocpm install filter debug --replace
pandocpm uninstall filter debug

You can also set specific folders to install, or alternative indexes

Pending work

Edit: the checklist bubble is removed and migrated to #3.

ickc commented 7 years ago

I finally have some time to test it. It works great. I probably will spend more time on this later this week.

A few random notes:

ickc commented 7 years ago

Another question is whether it should point to a fixed version (or a particular commit) or always to the latest version. The former approach allows matching a SHA256 sum and pointing to a "stable" version that is guaranteed to work. But then, if any major change in pandoc requires all individual filters to be changed, a lot of manual updating is needed (though this shouldn't happen very often, if at all; e.g. the last change only required pandocfilters itself to be updated, not the filters written with pandocfilters).
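To make the fixed-version option concrete, a checksum comparison might look like the sketch below. The function names are made up for illustration, and where the expected digest would be stored (presumably next to the pinned URL) is an assumption, not something pandocpm does today.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the downloaded bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if the bytes do not match the recorded digest."""
    actual = sha256_hex(data)
    # Digests are compared case-insensitively, since hex may be upper or lower case.
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
```

The check only makes sense when the URL points at content that never changes; with a "latest" URL the recorded digest would go stale on every upstream commit.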

sergiocorreia commented 7 years ago

Great to hear that it works on your side. About your points:

ickc commented 7 years ago

I revisited homebrew and homebrew-cask. I think a solution to our problem is already there:

These are what I learnt from homebrew (which has become the package manager for macOS). They have extensive manuals and contribution guidelines. I might read them in more detail later to see what we can learn and borrow. (They definitely need to worry about and process a lot more than we do, and they rely heavily on git and GitHub throughout.)

By the way, they have something called a "tap", essentially a git repository hosting formulas. They have a mechanism to "tap" into a repository unknown to brew. For us, it would effectively let the package manager trust formulas that pandocpm does not trust by default. I don't know if it will ever be a problem for us, though, because probably the only reason someone needs to create a custom tap is that homebrew didn't accept their formula in a pull request (not stable, deprecated, etc.).

ickc commented 7 years ago

I remember you mentioned panzer before. It seems to have a mechanism to specify the filters used in YAML already. How deep do you think the integration between panzer, panflute (which can also specify filters) and pandocpm (which installs filters) can be?

sergiocorreia commented 7 years ago

I think there should be clearly delimited boundaries between the three. Integration has advantages but the huge disadvantage is complexity (we don't have the manpower required to deal with that).

Now, how can the three interact?

sergiocorreia commented 7 years ago

Having yamls instead of index files sounds interesting, I'll give that a shot.

We can also ask pandocpm to use other repos, which would work equivalently.

I'm not sure about the complexity of using brew as either back or front end. I would have to read more about it, but my guess is that it probably has a lot of Mac-specific stuff and might not even work on some Linux distros and, of course, Windows.

sergiocorreia commented 7 years ago

Note: out of all the package managers (gems, pip, php, node-npm, bower, cpan, brew), the spec of the ruby gems seems most useful: http://guides.rubygems.org/specification-reference/

Thus, the packages repo would have the following structure

/packages/filters/myfilter.yaml
/packages/filters/anotherfilter.yaml
/packages/templates/sometemplate.yaml
/packages/csl/somecsl.yaml
/packages/style/somestyle.yaml

And each .yaml file could have these fields:

version: 1.0.0
license: MIT
summary: 'Some filter'
description: 'long description goes here'
author: xyz # or authors as a list
files: [xyz.py, abc.py] # This would just copy the files to the $datadir/filters folder
url: the url where the files are located
installer: pip # or cabal, etc.; this would run "pip install xyz" instead of copying files
homepage: 'https://github/someone/somerepo'

You could also use other yaml fields for whatever reason
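As a rough illustration of how the files and installer fields might be interpreted, here is a dry-run sketch that only describes the steps. The function name is hypothetical, the formula is assumed to be already parsed into a dict, and treating url as a base location to which each file name is appended is my assumption about the spec.

```python
from typing import Any, Dict, List


def plan_install(name: str, formula: Dict[str, Any], datadir: str) -> List[str]:
    """Describe, without executing anything, how a formula would be installed."""
    installer = formula.get("installer")
    if installer:
        # An installer entry takes precedence over copying files.
        return [f"{installer} install {name}"]

    files = formula.get("files") or []
    if not files:
        raise ValueError(f"formula for {name!r} has neither 'installer' nor 'files'")
    if "url" not in formula:
        raise ValueError(f"formula for {name!r} lists files but no 'url'")

    base = str(formula["url"]).rstrip("/")  # assumed to be a base location
    return [f"copy {base}/{f} -> {datadir}/filters/{f}" for f in files]
```

Keeping the planning step separate from the download step would also make a --verbose or dry-run mode cheap to add.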

ickc commented 7 years ago

I'm not sure about the complexity of using brew as either back or front end.

I'm not suggesting using brew. I was merely talking about studying what it does and borrowing ideas. Definitely, if I write a formula in brew, it would work on Mac and Linux. But I've never heard of people porting it to Windows, and since it relies a lot on Unix commands, I think it probably can't be done.

Another thing is that, at least for Python filters, maybe a script can be written to parse the setup.py and convert it to yaml. (I think brew has something like that, and there are regular commits on formulas by "robots".)

ickc commented 7 years ago

Do we agree to centralize the YAML formula?

i.e. "YAML for filters" will be in "pandoc-extras/packages", and there will be no "Index of filters".

sergiocorreia commented 7 years ago

Would that involve having to host the filters/templates/etc on the packages repo? (I think linking to packages is easier and more likely to work than hosting the packages or even using git submodules)

ickc commented 7 years ago

Earlier on, I suggested using SHA-256 checksum. But now I think there can be a much simpler approach for us:

  1. Centralizing all formulas (except when specified explicitly).

    • ensures the integrity of the formulas

    • side-benefit: a filter author wanting to use pandocpm doesn't have to manage the formula in 2 places (1 in their own repo, another in the master index in our repo)

  2. The urls in the formula need to meet a specific requirement: rather than pointing to a generic url whose content might change over time, one needs to point specifically to a "static target".
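One possible way to enforce the "static target" requirement is sketched below. It assumes the files are served from raw.githubusercontent.com, where the path is owner/repo/ref/file, and it only accepts a full 40-character commit hash as the ref. Whether tags should also count as static is left open, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# A full git commit hash: 40 lowercase hex characters.
_COMMIT = re.compile(r"[0-9a-f]{40}")


def is_pinned_raw_url(url: str) -> bool:
    """True if the url is a raw GitHub link pinned to a specific commit."""
    parts = urlparse(url)
    if parts.netloc != "raw.githubusercontent.com":
        return False
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / ref / at least one path segment.
    return len(segments) >= 4 and _COMMIT.fullmatch(segments[2]) is not None
```

A url pointing at a branch such as master would be rejected, since its content can change without the formula changing.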

Reference on SHA-1 Vulnerability

In fact, in 2012 noted security researcher Bruce Schneier reported the calculations of Intel researcher Jesse Walker, who found that the estimated cost of performing a SHA-1 collision attack will be within the range of organized crime by 2018 and for a university project by 2021. Walker’s estimate suggested then that a SHA-1 collision would cost $2 million in 2012, $700,000 in 2015, $173,000 in 2018 and $43,000 in 2021. From Understanding SHA-1 Vulnerabilities — Is SSL No Longer Secure? - Entrust, Inc..

I guess it wouldn't be too worrying for our applications.

ickc commented 7 years ago

Would that involve having to host the filters/templates/etc on the packages repo? (I think linking to packages is easier and more likely to work than hosting the packages or even using git submodules)

No. Only the YAML formulas will be centrally hosted. It is very similar to how homebrew-cask hosts formulas in this respect.

Sidenote: centralized repo for simple packages

However, I'm considering providing repositories for centralized packages (totally optional), because I see from the pandocfilters/pandoc-templates pull requests that there seems to be a need in this area. Sometimes, for very simple filters/templates, a dedicated repository seems overkill, while a single-file "repository" like a gist might not be up to the job. E.g. one wants to have at least 3 files: the package, the markdown source, and the native output generated from them (for tests).

ickc commented 7 years ago

Auto-filter

By the way, because of the proposed security features, I think once they are implemented, panflute can be allowed to run pandocpm automatically, to make it just work.

Another related question is, do you think it is possible to implement autofilters for pandocfilters? Apart from the need to rewrite filters to have a main function, are there other problems?

An alternative approach to auto-filter would be, rather than having an auto-filter in panflute and panflute calling pandocpm, maybe the auto-filter can be in pandocpm instead, where pandocpm lists panflute (and possibly pandocfilters) as a dependency. Then pandocpm as a filter can do everything under the hood:

  1. pip install pandocpm
  2. add the filter names in the YAML of the markdown
  3. pandoc -F pandocpm ...

Edit: a way to circumvent the main function problem is to embed the name of the main/action function in the yaml formula.
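A minimal sketch of that idea: load the filter file and look up the function by the name recorded in the formula, defaulting to main. The function name load_entry_point and the idea of a formula key holding the entry-point name are hypothetical; only the standard library importlib calls are real.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_entry_point(path: str, function_name: str = "main") -> Callable:
    """Import a filter from a file path and return its named function."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code; an `if __name__ == '__main__'` guard
    # is not triggered because the module name is the file stem.
    spec.loader.exec_module(module)

    func = getattr(module, function_name, None)
    if not callable(func):
        raise AttributeError(f"{path} has no callable named {function_name!r}")
    return func
```

With this, a filter without a main() could still be run by recording, say, action as its entry point, although the caller would then have to know which signature to expect.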

sergiocorreia commented 7 years ago

(Sorry for the late reply, it's been quite busy at the office lately)

Hosting the recipes seems interesting, as well as the test thing. One concern I have with the tests is that it's a lot of work, and might require extra dependencies (a lot of filters rely on external sources). This means authors might not want to write them.

Another related question is, do you think it is possible to implement autofilters for pandocfilters? Apart from the need to rewrite filters to have a main function, are there other problems?

autofilters just calls a python function, so it does not even know if it's calling a panflute or pandocfilter-based filter. So AFAIK just adding the main() function (with the correct arguments and return) should work. A problem though is that pandocfilters don't return anything (they just rely on toJSONFilter that writes to stdout).

ickc commented 7 years ago

Ah, I might not have been very clear. The 2 different kinds of centralizing are completely unrelated. Let's forget the centralized simple filters and tests for a moment (which is a separate project and still relies on the architecture below):

As far as the ability to install packages through pandocpm is concerned, the only centralization I'm proposing is of the formulas. I.e. the formula is the only yaml one needs to write, and it will not sit beside the package but in our centralized formula repository. (I.e. either no index, or an index generated from the individual formulas. Either way, the package author does not need to touch the index.)

sergiocorreia commented 7 years ago

What would be the smallest/simplest formula that we could require for now? (to get started)

ickc commented 7 years ago

What would be the smallest/simplest formula that we could require for now? (to get started)

I haven't thought it through yet, and I haven't read through all your code yet. From what I've read so far,

concerning the index:

After all these, it means the index is just a list of the names of packages. i.e. we might as well get rid of the index, or the index can be auto-generated, and then the index can simply be a plain text list of names, rather than in yaml.
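Auto-generating such an index could be as small as the sketch below, assuming one formula file per package named like filters/myfilter.yaml. The function names and the output file are hypothetical.

```python
from pathlib import Path
from typing import List


def build_index(formula_dir: str) -> List[str]:
    """Package names are the formula file names without the .yaml suffix."""
    return sorted(p.stem for p in Path(formula_dir).glob("*.yaml"))


def write_index(formula_dir: str, index_path: str) -> List[str]:
    """Write one package name per line as plain text and return the names."""
    names = build_index(formula_dir)
    Path(index_path).write_text("".join(f"{n}\n" for n in names), encoding="utf-8")
    return names
```

Since the index is derived from the directory listing, package authors would never need to edit it by hand.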

concerning the individual formula, in addition to the existing

ickc commented 7 years ago

Edit: forget what I said. I see that your earlier proposal on the formula spec already included the license key.

ickc commented 7 years ago

This issue is split into #3, #5, #6, #7, #8, #9, since it has become overly long and touches on very different topics, which makes it hard to follow.