Using github actions from a github app

njsmith commented 4 years ago

Eventually, we want snekomatic to be able to do things like:

check out a source repository and run pip-compile to update deps
check out a branch and run auto-formatters
run setup.py and upload the resulting binaries

There are two difficult things about running these kinds of code from inside a github app:

You really want to be able to see the logs, ideally through some nice web interface, so you can debug when stuff goes wrong.
Our github app basically has admin privileges on the whole organization, so we don't want to be running random code like setup.py in there; it's a potential security problem. (Generally our threat model is: we are quick to trust people to commit stuff to our repos, because it's easy to revert if they do something nasty; but we don't want to let people escalate that to get longer-term privileges or avoid logging/auditing.) So setup.py at least needs some kind of sandboxing.

We could try to implement this stuff ourselves, but I'm thinking: what if we let Github Actions do the work for us? They've already solved both of these problems.

I considered a bunch of different strategies, and I think the one that makes the most sense is:

We write a script in this repository that knows how to do the various actions we need ("clone repo and run pip-compile", "run setup.py", etc.)

We add an action.yml and Dockerfile to this repository, that will let our script run as a github action

We add a .github/workflows/blah.yml file to configure that script to run in response to a repository_dispatch event. In particular, something like:

on: [repository_dispatch]

jobs:
  unprivileged:
      runs-on: ubuntu-latest
      name: "unprivileged-${{ github.event.client_payload.id }}"

       steps:
       - uses: python-trio/snekomatic@master
       - uses: action/upload-artifact@v1
         with:
           name: artifacts
           path: artifacts/

  privileged:
      runs-on: ubuntu-latest
      name: "privileged-${{ github.event.client_payload.id }}"
      needs: unprivileged

       steps:
       - uses: action/download-artifact@v1
         with:
           name: artifacts
       - uses: python-trio/snekomatic@master
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
           pypi_token: ${{ secrets.PYPI_TOKEN }}

There's a lot going on here. Here's how it would work:

When the snekomatic heroku app wants to run one of these operations, it uses the github api to send a repository_dispatch event to the python-trio/snekomatic repo. This event can contain an arbitrary JSON payload, which we use to encode info about what operation we want to run. E.g.:

{
  "operation": "pre-merge-formatting",
  "id": "FEMHkBB9cXrw1g",
  "repo": "python-trio/trio",
  "pr": 1234
}

The Github actions machinery sees this event, and spawns a copy of our "workflow". This workflow has two "jobs", which are run sequentially (because of the needs: key on the second job).

Each job gets its own virtual machine. The first job doesn't get passed any secrets, so it can run arbitrary untrusted code, and github will take care of sandboxing it. The only way it can affect the outside world is by dropping files into the artifacts/ directory, which will be zipped up and passed to the second job. The second job then can take those files, and do whatever it wants with them.

For example, the first job might clone the given repo, check out the given revision, do python setup.py sdist bdist_wheel, then put the resulting sdist and wheel into the artifacts/ directory. The second job can then take the sdist and wheel files, and upload them to PyPI. If someone puts some nasty code in setup.py, it could create an arbitrarily broken sdist/wheel... but that's about all it could do.

The script that runs inside the jobs can see the information about the repository_dispatch event that triggered it. This includes the arbitrary json payload, which it can use to decide what to do. But before it does that, it should first check that the sender field in the event shows it coming from trio-bot. Any app with read-only access to the repo can create a repository_dispatch event, and who knows what kind of shenanigans that could enable, so better to make sure the message really comes from trio-bot up front.

When the jobs start, Github sends out a check_run event to any webhook listeners, and this includes both the "name" of the job, plus a bunch of metadata about the job. By using clever template-y magic, we interpolate the id field from the repository_dispatch payload into the job name. This way, our Github app can listen for the check_run event, and match it up with the repository_dispatch it sent. This way it can do stuff like post back to the original user who triggered the operation "OK, that's started, if you want to watch the progress then click here: [URL]", and if it's running multiple operations at the same time it can keep track of which one is which, and eventually get the exit status to figure out whether it succeeded or failed.

So... yeah. It's kind of elaborate, but I think it all does work, gives solid, easy-to-reason-about security guarantees, and is way easier than building our own sandboxing mechanism.

webknjaz commented 4 years ago

run setup.py and upload the resulting binaries

FYI I've created a curated Action for publishing dists a while back: https://github.com/pypa/gh-action-pypi-publish. Tutorial: https://packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

njsmith commented 4 years ago

@webknjaz Yeah, that action is great for normal workflows. Unfortunately there are a bunch of weird limitations that the hack described above is working around – and one of them is that you can't select which repository_dispatch events trigger a workflow, either the workflow runs on every repository_dispatch or it runs on none of them. Which I think means we want a single workflow that runs our code, and then our code will look at the repository_dispatch event and decide what to do. But that means that for this hack, we can't take advantage of pre-made actions, because there's no way for one task step to add/remove other steps...

Fortunately it's pretty easy to run twine manually.

webknjaz commented 4 years ago

Can't it be controlled with if: clauses?

webknjaz commented 4 years ago

Also, I assume that the first (unprivileged) job that uses snekomatic action will do the validation and will fail if the sender isn't the bot. So the next job that explicitly depends on the first one will not even appear in the workflow. And if the check is ok, the second job doesn't need to validate the sender because it's still the same.

webknjaz commented 4 years ago

Oh, and have you considered relying on the deployment event instead? I think, this one requires an app to request access to this API from the user.

webknjaz commented 4 years ago

This suggests that repository_dispatch must supply type: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch

So it may work with the types: filtering: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#onevent_nametypes

njsmith commented 4 years ago

This suggests that repository_dispatch must supply type: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch So it may work with the types: filtering: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#onevent_nametypes

Yes, that would make sense, but the docs explicitly claim that repository_dispatch doesn't support the types: keyword: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch

It's possible the docs are wrong! Figuring out what does and doesn't work in these hosted CI systems always makes me feel like a fantastical scholar searching through forgotten archives and ancient inscriptions for hints to forbidden knowledge.

Can't it be controlled with if: clauses?

Maybe? I saw something that claimed that if: had awkward ergonomics so I didn't dig into it very far... if: might also be useful to validate the sender before doing anything else.

webknjaz commented 4 years ago

the docs explicitly claim that repository_dispatch doesn't support the types: keyword:

It may be because it doesn't have any strictly hardcoded types (because it's a custom thing) but after that table the same doc says:

To trigger the custom repository_dispatch webhook event, you must send a POST request to a GitHub API endpoint and provide an event_type name to describe the activity type.

This makes me think that the representation of the information may be misleading.

if: might also be useful to validate the sender before doing anything else.

It'd probably be something like if: github.actor == 'trio-bot[bot]' or if: github.event.sender.login == 'trio-bot' && github.event.sender.type == 'Bot'.

Ref: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/contexts-and-expression-syntax-for-github-actions#contexts

webknjaz commented 2 years ago

Yes, that would make sense, but the docs explicitly claim that repository_dispatch doesn't support the types: keyword

FTR the docs now showcase the use of the types filtering: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#repository_dispatch

python-trio / snekomatic

Using github actions from a github app #68