pandoc / lua-filters

A collection of lua filters for pandoc
MIT License
602 stars, 165 forks

Change this repository into a collection of links? #207

Status: Open. jgm opened this issue 2 years ago.

jgm commented 2 years ago

One drawback of the current structure is that people submit code here but then don't monitor the repository, and issues are neglected. Perhaps it would be better to make this simply a collection of links to lua filters that are maintained in independent repositories?

cagix commented 2 years ago

I kind of like this idea. Maybe this repo could serve as a kind of collection of "official" scripts from the Pandoc creators and all other filters could be linked in the README (sorted by topic)? That would reduce the maintenance to checking the links every year. In addition, the "official" code provided could serve as a live demo / live documentation of the Lua API.

tarleb commented 2 years ago

I'm very much in favor of that; it would save me a lot of time. It takes a significant amount of effort, on almost each new pandoc release, to adjust the tests and filters to the changes. It's tiresome, and pinging all authors and waiting for them to change the code would take just as long. I'd be glad to get out of that obligation.

We could still do occasional automatic "releases", which pack the filters into a single archive. This shouldn't be too hard if the individual repos use a common structure.
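Here is a minimal POSIX shell sketch of such a release job. It assumes a hypothetical list file with one "name url" pair per line. The file format and the function name bundle_filters are illustrations, not an existing tool. The function shallow-clones each listed repository and packs the working trees, including each repository's own LICENSE file, into a single tarball.

```shell
#!/bin/sh
# Sketch only: bundle_filters and the list-file format are hypothetical.
# Each non-comment line of LIST_FILE reads: <name> <git-url>
# Usage: bundle_filters LIST_FILE OUTPUT_TARBALL
bundle_filters() {
  list="$1"
  out="$2"
  staging=$(mktemp -d) || return 1
  while read -r name url; do
    # Skip blank lines and comment lines.
    case "$name" in ''|'#'*) continue ;; esac
    # History is not needed in a release archive, so a shallow clone is enough.
    if ! git clone --quiet --depth 1 "$url" "$staging/$name" </dev/null; then
      rm -rf "$staging"
      return 1
    fi
    # Drop Git metadata but keep everything else, including LICENSE files.
    rm -rf "$staging/$name/.git"
  done < "$list"
  # The archive path is resolved from the caller's directory; -C only
  # changes where the archived files are taken from.
  tar -czf "$out" -C "$staging" .
  status=$?
  rm -rf "$staging"
  return "$status"
}
```

For example: bundle_filters filters.txt lua-filters-bundle.tar.gz. Because each repository gets its own subdirectory in the archive, its licence file stays next to its code. That may ease the licensing concern raised in the next comment.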

cagix commented 2 years ago

We could still do occasional automatic "releases", which pack the filters into a single archive.

That sounds interesting, but might it be difficult given the (then presumably) different licences in different repos?

This shouldn't be too hard if the individual repos use a common structure.

Would maintaining a template repo help with this?

tarleb commented 2 years ago

Pinging everyone who contributed a filter so far: what do you think of this idea? What would you need to make this as painless as possible for you?

@jdutant @tolot27 @blake-riley @not-my-profile @svenevs @b3 @jkr @cole-miller @sokotim @korintje @gtuckerkellogg @stroobandt @frederik-elwert @odkr

cagix commented 2 years ago

Since each filter belongs to a subfolder, it should be easy to split your repository into several individual repositories using git filter-branch and retain the individual history :)
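Here is a rough sketch of that split. The helper below (split_filter is a made-up name) clones the repository and runs git filter-branch with --subdirectory-filter. The chosen subfolder becomes the root of the new repository, and only commits that touch it are kept. Current Git documentation discourages filter-branch in favour of git filter-repo, which is the tool a later comment in this thread ends up using.

```shell
#!/bin/sh
# Sketch: make FILTER_DIR the root of a new repository that keeps only the
# history touching that folder. split_filter is a hypothetical helper name.
# Usage: split_filter SOURCE_REPO FILTER_DIR TARGET_DIR
split_filter() {
  src="$1"
  dir="$2"
  target="$3"
  # Rewrite a fresh, independent clone, never the original repository.
  git clone --quiet --no-local "$src" "$target" || return 1
  # The new repository will get its own remote later.
  git -C "$target" remote remove origin || return 1
  # filter-branch pauses for a few seconds after printing a deprecation
  # warning unless this variable is set.
  FILTER_BRANCH_SQUELCH_WARNING=1 git -C "$target" filter-branch \
    --subdirectory-filter "$dir" -- --all >/dev/null 2>&1
}
```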

not-my-profile commented 2 years ago

I am not sure about this. There are advantages to having a common repository. Having them all here guarantees that other people can contribute improvements even when the original author has ceased maintenance.

It takes a significant amount of effort, on almost each new pandoc release, to adjust the tests and filters to the changes. It's tiresome, and pinging all authors and waiting for them to change the code would take just as long. I'd be glad to get out of that obligation.

I think an easy fix for that would be to have a latest_pandoc_supported variable for each filter. When a new pandoc version is released, that variable could be automatically bumped for each filter whose tests pass with the new version. The same script could automatically update a table in the README of this repository that lists all scripts known to work with the latest version. Even pinging authors when their filter is no longer compatible with the latest version of pandoc could easily be automated.

Especially if you still want to release filter bundles occasionally, having all filters in a single repository should make that easier. Otherwise you just get new potential problems to deal with (e.g. some repository went offline, some repository suddenly has an unexpected file structure, etc.).
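Here is a rough POSIX shell sketch of the latest_pandoc_supported automation proposed above. It assumes a layout that does not exist in the repository today: each filter directory would hold a run-tests.sh script and a one-line file called latest_pandoc_supported. The function names bump_supported and support_table are made up for illustration.

```shell
#!/bin/sh
# Sketch under a hypothetical layout: every filter directory contains
# run-tests.sh and a one-line file named latest_pandoc_supported.

# Bump latest_pandoc_supported for every filter whose tests pass.
# Usage (from the repository root): bump_supported VERSION
bump_supported() {
  version="$1"
  for dir in */; do
    dir="${dir%/}"
    [ -f "$dir/latest_pandoc_supported" ] || continue
    if (cd "$dir" && sh ./run-tests.sh >/dev/null 2>&1); then
      printf '%s\n' "$version" > "$dir/latest_pandoc_supported"
    else
      # A real job could open an issue or ping the filter's author here.
      echo "tests failed: $dir" >&2
    fi
  done
}

# Print a Markdown table of filters and their latest supported version.
support_table() {
  printf '| Filter | Latest pandoc supported |\n'
  printf '|--------|-------------------------|\n'
  for f in */latest_pandoc_supported; do
    [ -f "$f" ] || continue
    printf '| %s | %s |\n' "${f%%/*}" "$(cat "$f")"
  done
}
```

A CI job could run something like bump_supported "$(pandoc --version | head -n 1 | cut -d ' ' -f 2)" and then support_table > support-table.md. It could use the "tests failed" messages as the trigger for pinging authors.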

bpj commented 2 years ago

I am in favor, but what about having this repository contain submodules/subtrees/subrepos linking to contributors' repositories so that people can still pull this repository and get all filters? I suppose a cronjob or action could be set up to update the links every day/week/month.
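Here is a minimal shell sketch of the submodule idea. It assumes a catalogue repository that keeps each filter as a Git submodule under a filters/ directory. The directory name and the helpers add_filter and update_filters are hypothetical. git submodule update --remote moves each submodule to the latest commit of its remote branch; that is the part a cron job or CI action would run periodically.

```shell
#!/bin/sh
# Sketch: a catalogue repository that records each filter as a submodule.
# add_filter, update_filters and the filters/ directory are hypothetical.

# Usage (inside the catalogue repo): add_filter NAME URL
add_filter() {
  git submodule --quiet add "$2" "filters/$1"
}

# Periodic job: move every submodule to the newest commit of its remote
# branch, then record the new submodule pointers in the catalogue.
update_filters() {
  git submodule --quiet update --init --remote || return 1
  git add filters
  # Commit only if at least one submodule pointer actually moved.
  git diff --cached --quiet || git commit --quiet -m "Update filter submodules"
}
```

Users could then get everything with git clone --recurse-submodules, while issues and pull requests would still go to each filter's own repository.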


korintje commented 2 years ago

I agree with this idea. The filters' roles are highly independent of one another, so we cannot expect large synergy effects from collecting them in one repository. Instead, I think it is better to focus on keeping each filter's documentation accessible, readable, and consistent. Listing the filters on the official pandoc web page or on GitHub Pages would be nice. As far as I know, one ideal example is crates.io, the package registry for the Rust language, which is known for well-formatted and easily accessible documentation.

jgm commented 2 years ago

There are advantages to having a common repository. Having them all here guarantees that other people can contribute improvements even when the original author has ceased maintenance.

In that case you could always fork the original repository, make your changes, and submit a PR here for an updated link to the fork.

I am in favor, but what about having this repository contain submodules/subtrees/subrepos linking to contributors' repositories so that people can still pull this repository and get all filters?

I'm not sure it's all that valuable to be able to get all the filters in one repository. Generally you only need some of them; why not just clone those separately?

tolot27 commented 2 years ago

I like the idea of submodules, and switching to them should be easy because we already have subdirectories. Submodules can be checked out individually from their origin. Having this repository as the main repository has the advantage that checks (e.g. when a new pandoc version is released) can be maintained in a central place.

svenevs commented 2 years ago

I don't have any preference either way, if it makes things easier for maintainers then I'm all for it :heart: I'm pretty sure my filter is feature complete, but I'm sorry if I've missed any issues related to it.

alerque commented 2 years ago

I'm really not convinced this would be an advantageous move. Having some complex filters that see a lot of development in their own repos is perhaps a good thing (and we have a history of suggesting that). But small one-off filters tend to be submitted, used by their authors a few times, and then left behind when the submitter moves on, and I think a large chunk of them would fall below the minimum threshold that makes a viable standalone FOSS project. Having a team of maintainers at least reviewing submissions here adds some normalization and consistency that makes filters in this repo much more attractive than random ones from people's Gists. And for maintenance, avoiding the bottleneck of a single maintainer who got a filter working for themselves and is never motivated to make it more generally useful seems like a benefit for most simpler filters.

jgm commented 2 years ago

@alerque I think the filters could still be reviewed -- at any rate, we wouldn't need to include links to filters that didn't look good. The aim would be to change where bug reports and enhancement or support requests go. They should go to the author of the filter, not to the pandoc maintainers.

stroobandt commented 2 years ago

As a first-time, one-off contributor, I have to admit that the current process together with the suggestions and help provided by @tarleb rendered my contribution more worthwhile and universally applicable. That would not have happened without the "editorial work" of @tarleb. The current process could be considered as a very valuable peer-review, where the value eventually goes to the end user.

Another admission of mine is that my extended family and I usually use my filter only with the version of pandoc that comes packaged with the latest Ubuntu LTS release and its upgrades. The reason is that too many of my users, on too many machines, require a stable system environment for work and study.

This certainly does not mean that I would not maintain my filter. However, if the user community at large fails to prod me, I would typically notice a version compatibility problem with my filter only when a new version of pandoc eventually lands in the Ubuntu LTS repositories.

I hope this straightforwardness helps with reaching a consensus about how to proceed with this great, curated collection of filters.

b3 commented 2 years ago

I do not have any smart, definitive answer to this good question. I added a thumbs-up to comments with ideas that I like.

It is a fact that I didn't follow issues here for my small filters (thanks to @jgm, I will now try to check them).

It is also a fact that, as @stroobandt states, @tarleb's work rendered my small contributions more worthwhile and universally applicable.

IMHO, a common framework (at least for tests and descriptions, for instance) still needs to be kept.

Being able to fetch all code at once is also a nice facility (which helps me get inspired), but it can still be offered if this repo is changed to a simple list of links.

Sorry for not being able to help more concretely.

alerque commented 2 years ago

The aim would be to change where bug reports and enhancement or support requests go. They should go to the author of the filter, not to the pandoc maintainers.

To some extent, we can get the best of both worlds. If the code stays here and we add contributors to a GitHub team with limited access to this repo, we can use .gitattributes to specify GitHub users as code owners for the filters they contribute. That way they would not only get asked to be involved in code review if somebody touched their code, but they could be assigned to related issues and such.

My experience is that people are even more likely to stay involved and take some ownership over their code if it has the publicity of being in an official repository rather than being in their own ad-hoc repos. Anybody that is going to keep on top of issue reports on their own repo is also likely to stay involved with it if they have some ownership in a bigger project.

benabel commented 2 years ago

I really like this repository, and it is a great source of inspiration when writing filters. It is very useful to have all these filters in one place. Of course, I agree that plugin creators should maintain their plugins (if they have the time to). Plugins whose authors can't maintain them could be placed in an unmaintained folder or repo. A table in the README listing each filter's name, description, and supported formats would also be useful.

jgm commented 2 years ago

we can use .gitattributes to specify GitHub users as code owners for the filters they contribute.

Can you elaborate? What would the syntax be? That would certainly be an improvement, as now there's no way to figure out who contributed which filter other than looking at git history.

alerque commented 2 years ago

Here is an example CODEOWNERS file that uses gitignore-style path patterns (much like .gitattributes) to assign code in a repository to different people. The @... names can be individual accounts or teams (or a mix of both) that can have multiple members. This will automatically request their review on any PR touching those code paths. It also opens the door to other GitHub tooling, such as requiring approval from the code owners before a PR can be merged.

What it doesn't do is triage bug reports and assign them to those owners. That would still need to be done manually, only PRs are automatically assigned.
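The example file linked in that comment is not reproduced here. As an illustration only, a CODEOWNERS file could be written like the sketch below. The paths column-div/ and tables-vrules/ are the two filters @chrisaga names later in this thread, @pandoc/filter-maintainers is a made-up team name, and write_codeowners is a hypothetical helper. GitHub looks for the file in .github/ (among other locations). When several patterns match a path, the last matching line takes precedence.

```shell
#!/bin/sh
# Sketch: write an illustrative .github/CODEOWNERS file.
# @pandoc/filter-maintainers is a hypothetical team name.
write_codeowners() {
  mkdir -p .github || return 1
  cat > .github/CODEOWNERS <<'EOF'
# Each line: a path pattern followed by one or more owners
# (@user or @org/team). For a given file, the last matching line wins.
*                @pandoc/filter-maintainers
/column-div/     @chrisaga
/tables-vrules/  @chrisaga
EOF
}
```

Run write_codeowners from the repository root, then commit the file. As far as I know, GitHub only requests reviews from owners who have write access to the repository.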

jgm commented 2 years ago

I've added .github/CODEOWNERS, but I don't know the github handles of the contributors. Maybe people can update this themselves with PRs?

ickc commented 2 years ago

As a side note, I think this is really about having a package index and a package manager.

People like this repo and pandocfilters (the Python one) because they act as both. It is a centralized location that, once cloned, someone else maintains for you, which should guarantee that it works with a recent pandoc.

The problem with this repo is that it isn't going to scale well to many filters, and the maintenance work falls on the maintainers.

Years ago some of us proposed a package manager, and there was a prototype. But there were a few problems. First, we mixed the two related concepts in one solution, and second, it wasn't official.

In short, I think the right direction would be an official package index (like CTAN). This is similar to the "links" concept above, but more formal: maybe a YAML file following a certain spec. The pandoc community would officially advertise it as the pandoc package index, which authors should submit to and users should use to discover filters.

Then third parties can build an ecosystem around it, e.g. a package manager (similar to a third-party filter framework), or a website (like the third-party, Mac App Store-like website for Homebrew).

tarleb commented 2 years ago

Allow me to think out loud for a moment; this gets a bit fundamental and includes some of the good points others already made above.

What I like about this repo:

What I dislike:

In conclusion, I'd still rather turn this repository into a collection of links. My proposal:

  1. Create a template repository for Lua filters. This way we can still encourage a certain standard layout, but filter authors have the freedom to do whatever they feel is right.
  2. Add an issue template for adding new links: this should include a checkbox to select if the author wishes for a detailed code review of their filter. We could go as far as to encourage community review by sending an automated mail to pandoc-discuss whenever such an issue is opened.
  3. Slowly move filters to separate repos, but explore ways to create collections of all filters listed here and adhering to certain conventions.

Edit: Forgot to make my main point: it seems unreasonable to expect people to maintain code that they no longer control; the sense of ownership is much stronger if authors can retain full control over their code.

jgm commented 2 years ago

I think this is a good plan!

tarleb commented 2 years ago

I started work on a template repository (https://github.com/tarleb/lua-filter-template). It's not quite done yet, but feedback is welcome, especially if it takes the form of a PR ;)

The template contains code to create a documentation page (https://ztkrt.de), but I'm not happy with having the HTML and CSS in the main branch. If anyone has some ideas on how this could be avoided, then please let me know.

bpj commented 2 years ago

BTW, I wonder whether my filters written in MoonScript, which is compiled into pure Lua code, would be acceptable if/when this repository turns into a catalog?

(FWIW, I also have a helper library for Lua/MoonScript filters written in MoonScript, which perhaps could be linked to. I have already begun writing a README including instructions on how to set up the LUA_PATH environment variable, based on an excellent [blog post][] which now sadly can only be found on the Wayback Machine.)

[blog post][]: https://web.archive.org/web/20210127030419/http://www.thijsschreijer.nl/blog/?p=1025
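Here is a minimal sketch of the LUA_PATH setup. It illustrates the general mechanism and does not summarize that blog post. It assumes the helper modules live in a hypothetical directory, $HOME/.local/share/lua. Lua initializes package.path from LUA_PATH (a version-specific variable such as LUA_PATH_5_4 takes precedence if it is set), and a double semicolon stands for the default path. Pandoc's embedded Lua is assumed to behave the same way, which is worth checking with your pandoc version. Because moonc writes the compiled .lua files next to the .moon sources by default, the ?.lua pattern should find them.

```shell
#!/bin/sh
# Sketch: make a shared Lua/MoonScript helper library visible to `require`
# in Lua filters. The directory below is a hypothetical location.
LUA_LIB_DIR="$HOME/.local/share/lua"
# Lua replaces ";;" with its default search path, so the standard
# locations are still searched after the custom ones.
LUA_PATH="$LUA_LIB_DIR/?.lua;$LUA_LIB_DIR/?/init.lua;;"
export LUA_PATH
```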


tarleb commented 2 years ago

I'd think that'd be perfectly fine, esp. if the transpiled Lua script can be downloaded somewhere.

Off topic: my long-term goal is to write filters with teal.

cagix commented 2 years ago

The template contains code to create a documentation page, but I'm not happy with having the HTML and CSS in the main branch. If anyone has some ideas on how this could be avoided, then please let me know.

Hmmm, if the goal is to provide nicely rendered documentation, you could write the documentation in Markdown and use a simple workflow that uses Jekyll or Hugo, deploying the result to GitHub Pages.

Alternatively, you could use a workflow where Jupyter notebooks are generated from the Markdown with Pandoc and published on GitHub Pages (.ipynb files are rendered as a preview directly by GitHub).

bpj commented 2 years ago

Albert Krewinkel wrote on Fri, 21 Jan 2022 at 10:29:

I'd think that'd be perfectly fine, esp. if the transpiled Lua script can be downloaded somewhere.

Of course they are, in the very same directory, where the moonc compiler puts them by default.

Off topic: my long-term goal is to write filters with teal

https://github.com/teal-language/tl.


alerque commented 2 years ago

Sorry I haven't had time to jump in and help with this yet. A template repo is a great idea. Including some CI to test pandoc interactions would be good to include there too.

Before this gets too far, though, I just wanted to throw in the idea that if filters are going to be independent, it might actually be useful to package them as rocks for LuaRocks. The LuaRocks infrastructure can be used for this (plugins for some Lua app, as opposed to standalone packages) and has a concept of manifests to organize them. This would bring in free tooling for versioning, distribution/packaging, dependency management (both on other pandoc filters and on other rocks), etc.

odkr commented 2 years ago

Off topic: my long-term goal is to write filters with teal.

Okay, this is off-topic; but Teal seems intriguing. Why a "long-term" goal? Is there a reason not to use it just yet?

tarleb commented 2 years ago

I've opened an issue for teal support on the hslua repo, let's move our OT discussion there. ;)

Leveraging luarocks has crossed my mind too; it seems orthogonal to using a template repo. In fact, if you want to add a sample rock definition there, the PR would be welcome.

The issue tracker of the new repo is probably a good place for additional suggestions.

alerque commented 2 years ago

Leveraging luarocks has crossed my mind too; it seems orthogonal to using a template repo.

Yes it is.

By the way, I've talked about this with the LuaRocks folks in the past (at various times, in reference to SILE packages and vim plugins), and they are universally supportive of the idea and willing to make any upstream adaptations that are necessary. To date, though, it doesn't seem that any really are: the use pattern is already supported.

chrisaga commented 2 years ago

Thanks @tarleb for pointing me to this conversation on my second contribution to this repository. As a new contributor, I can say that:

  1. I got inspiration from the collection of filters I found here. That would have been harder with separate repositories.
  2. The review of the first filter I submitted (column-div, which is currently in draft mode due to a little bug I still have to chase) helped me get it to a code quality and functional level I wasn't aiming for (even if I wasn't happy with the review at first ;-) )
  3. I am not very comfortable with the CI test I tried to reproduce from what I found in this repository. I missed an official template. I see this template exists now. That's a good thing even if you finally decide to keep everything in one common repository.
  4. Regarding the common repository: I am a contributor to another similar project, lua-scripts, a repository of Lua plugins for darktable, the photo processing tool. I contributed French localization for multiple scripts. I even worked on some scripts I don't really use myself, only because they belong to the same repository.
  5. In that repository, we have two main categories: official (scripts which are officially supported and reviewed by the team) and contrib (scripts which are supported by external contributors).

That said, I am totally OK to take my filters back to my own repository if you decide so. I would add that I am totally with @alerque on the need to package the filters so they would not be bunches of code floating around in github that users must catch one by one.

nandac commented 2 years ago

@tarleb I would like to contribute a plugin. After reading this discussion, I am confused about the way forward. What should I do?

tarleb commented 2 years ago

That's great to hear @nandac. The best way is probably to create a new personal repo based on the template. See there for more instructions. Please let us know in case you run into issues with the template -- it is still experimental.

Once you have set up the filter repo, please open an issue and ask for it to be included. We still need a place to add links, so it may take us a little longer.

nandac commented 2 years ago

Thanks, @tarleb I have set up a repo in my personal space using the template.

bpj commented 2 years ago

@tarleb I wonder how I might convert my filter repos, both published and unpublished ones to use the template. Perhaps create a repository with the unmodified template — just renaming README.md to README-template.md so as to not clobber any existing README.md —, add that repo as a remote and merge it in?


alerque commented 2 years ago

@bpj Template repositories are meant to be used as a base to clone from (and GitHub has a function for doing this), and you want things named the way they will finally be named, not something that will need to be shuffled around.

Converting existing repos is a bit trickier. Merging as you describe is technically possible with some next level Git ninja commands to join histories with no common root, but it also brings with it a pile of issues that most people would struggle to deal with later (e.g. git blame needing special handling).

I suggest just using a tree diff on existing repositories and manually massage them to be as alike or different as you feel like without doing any merge foo. A how-to on this could be useful to add to the template, but I would focus usage on getting new projects going.
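Here is a minimal sketch of such a tree diff. It assumes a fresh checkout of the template next to an existing filter repository; compare_with_template is a hypothetical helper name. It only reports which files differ or exist on one side only, and leaves the actual edits to be made by hand.

```shell
#!/bin/sh
# Sketch: list differences between the template and an existing filter
# repository without touching either history.
# Usage: compare_with_template TEMPLATE_DIR EXISTING_REPO_DIR
compare_with_template() {
  # -r: recurse into subdirectories; -q: only name the files that differ
  # or exist on one side. Git metadata is excluded from the comparison.
  diff -rq --exclude=.git "$1" "$2"
}
```

For example, compare_with_template ../lua-filter-template . lists the files to look at. Dropping -q from the function shows the full line-by-line differences.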


On a different topic, I'll be looking into some subtree splits to help people with filters here already get them split out with full history for use in stand alone repositories. Once the dust settles a little bit on what we are recommending for stand alone repos we can look at migrating current ones to that model.

mfhepp commented 2 years ago

Joining this discussion quite late: I think there is one huge argument in favor of a central repository for the most common Lua filters for Pandoc, and that is security. Since Pandoc typically runs with full user privileges, a Lua script can do really nasty stuff (steal information, load malicious content, ...). While this central repository does not make it impossible to inject malicious code into the most popular filters, it at least provides

A mere collection of links will cause more fragmentation and hence make it less likely and slower to spot and mitigate security risks.

Also, there are lots of commonalities among filters; with a central repository, it will be easier to modularize and reuse code.

I have a really bad feeling watching a growing community of non-developers running arbitrary code from some private GitHub repositories found by googling for some Pandoc/LaTeX problem.

Search for "supply chain attacks" to get a glimpse. This is even more of an issue given that Lua is a bit of a niche language, which makes it harder for many users to understand what a piece of Lua code is doing on their machine.

jgm commented 2 years ago

From the pandoc manual:


A note on security

If you use pandoc to convert user-contributed content in a web application, here are some things to keep in mind:

  1. Although pandoc itself will not create or modify any files other than those you explicitly ask it to create (with the exception of temporary files used in producing PDFs), a filter or custom writer could in principle do anything on your file system. Please audit filters and custom writers very carefully before using them.

You are right, of course, that people can get into big trouble by running filters they download. And having a central repository would help with that. The problem is that it takes a lot of human-power to review the code, integrate pull requests, etc. We just don't have enough of that.

ickc commented 2 years ago

What is described is related to a web of trust: basically, what you are saying is that since you trust pandoc, you also trust other things maintained by the same or related developers.

Another related concept here is a package manager. It does not solve the trust issue by itself, but you are now trusting the maintainer of a package index rather than the developer. (Of course, trusting both, i.e. selecting packages only from authors you trust, is better.) Also note that a package index can be dangerous when there is no “maintainer” who has to approve submissions, as is the case with PyPI.

Put this way, the problem above is that a “monolithic package index” like this one puts too much burden on the maintainers, who are doing both jobs. A “proper package index” splits the burden between individual maintainers managing their own packages and the package index maintainer(s) who maintain the quality.

ickc commented 2 years ago

Just to elaborate a bit more, there’s also more incentive for the developer to maintain their script as they typically are the biggest user of that script. The problem then is to have a package index that people will have incentive to use, including official blessing and simplicity (and adequate level of trust.)

But to name a con of having a centralized package index like this: it makes a release with breaking AST changes slightly easier. (But then the blame should be put on end users who upgrade without considering “pinning” their version. Again, a problem arises when this isn't thought of in terms of packaging.)

By the way, I’m not complaining, as there’s no perfect solution. Look at LaTeX, for example: while it has a package index, packaging is a mess because versions cannot easily be controlled, so packages can break mysteriously, and authors are conditioned to release only backward-compatible changes, which leads to a worse experience (bad behavior should be discontinued).

gtuckerkellogg commented 2 years ago

I'm largely agnostic; I contributed a filter which my PhD students have used, but I'll own up to neglecting it recently if issues have come up. I'll maintain my agnosticism, but I'll be happy to rectify my neglect of the issue no matter how it's decided.

chrisaga commented 2 years ago

@alerque

On a different topic, I'll be looking into some subtree splits to help people with filters here already get them split out with full history for use in stand alone repositories. Once the dust settles a little bit on what we are recommending for stand alone repos we can look at migrating current ones to that model.

I am not a git expert, but I can share what I figured out while moving two Lua filters I had proposed as my contribution to lua-filters into their own repository.

1) Create the repository on GitHub from @tarleb's lua-filter-template. NB: the default branch is named main (inherited from the template).

2) Make sure everything is clean in the lua-filters fork I am working on, and make a fresh clone (named after my new GitHub repo):

git clone lua-filters hk-pandoc-filters

3) Install git-filter-repo, since the git-filter-branch man page advises using it instead.

4) Remove the remote link, remove everything but my two filters, and add the newly created GitHub repository as a remote.

cd hk-pandoc-filters
git remote remove origin
git filter-repo --path column-div/ --path tables-vrules/
git remote add -f origin git@github.com:chrisaga/hk-pandoc-filters.git

5) At this point I have my two Lua filters in the master branch and the new files from @tarleb's template in the main branch. The two branches are not related (no common ancestor) but I still can merge them with the appropriate option.

git checkout main
git merge --allow-unrelated-histories master

6) Do some cleaning with git mv and git rm and push everything to Github

git commit -a
git push

7) Check everything is OK on Github and remove the now useless branch

git branch --delete master

tarleb commented 2 years ago

I have created a new organization pandoc-ext and have started to migrate filters there. Each filter will be placed in a separate repository, as this makes it easier to use the filters with RStudio's Quarto. I will only transfer those filters that I intend to maintain.

The main impediment right now is my template repository, which still needs more work.

cagix commented 2 years ago

Step 4 could also be done more easily through git subtree: In hk-pandoc-filters you can perform a git subtree push --prefix=<yourfolder> <cloneofluafiltertemplate> <branch>, where <branch> should be different from main or master.

ickc commented 1 year ago

@tarleb, if you want to, I can invite you as maintainer of https://github.com/pandoc-extras which is intended for any "pandoc extras" kind of stuffs.

tarleb commented 1 year ago

Thank you @ickc, and sorry for the late reply, I had forgotten about this. I went with the pandoc-ext name to mirror the quarto-ext org. Now we have two such orgs, but I think that's ok.

I've sent you an invite to become a maintainer at the new org.

cagix commented 1 year ago

so, now we have both, https://github.com/pandoc-extras and https://github.com/pandoc-ext? does that mean some kind of split in the "pandoc extras"? and also there is https://github.com/pandoc/lua-filters ...

tarleb commented 1 year ago

I don't see it as a split, it's two separate orgs with slightly different goals.

As for this repo, it should probably be archived at some point.