Open njsmith opened 4 years ago
FYI, I created a curated Action for publishing dists a while back: https://github.com/pypa/gh-action-pypi-publish. Tutorial: https://packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/
@webknjaz Yeah, that action is great for normal workflows. Unfortunately there are a bunch of weird limitations that the hack described above is working around, and one of them is that you can't select which `repository_dispatch` events trigger a workflow: either the workflow runs on every `repository_dispatch` event, or it runs on none of them. Which I think means we want a single workflow that runs our code, and then our code will look at the `repository_dispatch` event and decide what to do. But that means that for this hack, we can't take advantage of pre-made actions, because there's no way for one task step to add/remove other steps...

Fortunately it's pretty easy to run twine manually.
Can't it be controlled with `if:` clauses?
Also, I assume that the first (unprivileged) job that uses the snekomatic action will do the validation and fail if the sender isn't the bot. Then the next job, which explicitly depends on the first one, won't even appear in the workflow. And if the check passes, the second job doesn't need to validate the sender, because it's still the same event.
Oh, and have you considered relying on the `deployment` event instead? I think that one requires an app to request access to this API from the user.
This suggests that `repository_dispatch` must supply `type`: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch

So it may work with the `types:` filtering: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#onevent_nametypes
Yes, that would make sense, but the docs explicitly claim that `repository_dispatch` doesn't support the `types:` keyword: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch
It's possible the docs are wrong! Figuring out what does and doesn't work in these hosted CI systems always makes me feel like a fantastical scholar searching through forgotten archives and ancient inscriptions for hints to forbidden knowledge.
> Can't it be controlled with `if:` clauses?
Maybe? I saw something that claimed that `if:` had awkward ergonomics, so I didn't dig into it very far... `if:` might also be useful to validate the sender before doing anything else.
> the docs explicitly claim that `repository_dispatch` doesn't support the `types:` keyword
It may be because it doesn't have any strictly hardcoded types (because it's a custom thing), but after that table the same doc says:

> To trigger the custom `repository_dispatch` webhook event, you must send a POST request to a GitHub API endpoint and provide an `event_type` name to describe the activity type.

This makes me think that the representation of the information may be misleading.
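For reference, triggering that webhook is just a POST to the repo's `/dispatches` endpoint. A minimal sketch in Python using only the stdlib; the `event_type` name and `client_payload` fields here are made up for illustration, not the actual snekomatic protocol:

```python
import json
import urllib.request

# Hypothetical payload; event_type and client_payload contents are illustrative.
payload = {
    "event_type": "build-and-publish",
    "client_payload": {"repo": "python-trio/trio", "revision": "deadbeef"},
}

req = urllib.request.Request(
    "https://api.github.com/repos/python-trio/snekomatic/dispatches",
    data=json.dumps(payload).encode(),
    headers={
        "Accept": "application/vnd.github.v3+json",
        "Authorization": "token <YOUR_TOKEN>",  # needs repo scope
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it (requires a valid token)
print(req.get_method(), req.full_url)
```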
> `if:` might also be useful to validate the sender before doing anything else.

It'd probably be something like `if: github.actor == 'trio-bot[bot]'` or `if: github.event.sender.login == 'trio-bot' && github.event.sender.type == 'Bot'`.
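In workflow syntax that check would sit at the job level, roughly like this sketch (the job name and runner are placeholders):

```yaml
jobs:
  publish:
    runs-on: ubuntu-latest
    # Skip the whole job unless the dispatch really came from our bot
    if: github.event.sender.login == 'trio-bot' && github.event.sender.type == 'Bot'
    steps:
      - run: echo "sender validated"
```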
> Yes, that would make sense, but the docs explicitly claim that `repository_dispatch` doesn't support the `types:` keyword
FTR the docs now showcase the use of the `types` filtering: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#repository_dispatch
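With that, the filter presumably looks like any other activity-type filter, matching against the `event_type` sent in the dispatch request (the event type name below is illustrative):

```yaml
on:
  repository_dispatch:
    types: [build-and-publish]
```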
Eventually, we want snekomatic to be able to do things like:

- run `pip-compile` to update deps
- run `setup.py` and upload the resulting binaries

There are two difficult things about running these kinds of code from inside a github app:

- You really want to be able to see the logs, ideally through some nice web interface, so you can debug when stuff goes wrong.
- Our github app basically has admin privileges on the whole organization, so we don't want to be running random code like `setup.py` in there; it's a potential security problem. (Generally our threat model is: we are quick to trust people to commit stuff to our repos, because it's easy to revert if they do something nasty; but we don't want to let people escalate that to get longer-term privileges or avoid logging/auditing.) So `setup.py` at least needs some kind of sandboxing.

We could try to implement this stuff ourselves, but I'm thinking: what if we let Github Actions do the work for us? They've already solved both of these problems.
I considered a bunch of different strategies, and I think the one that makes the most sense is:

- We write a script in this repository that knows how to do the various actions we need ("clone repo and run `pip-compile`", "run setup.py", etc.)
- We add an `action.yml` and `Dockerfile` to this repository, that will let our script run as a github action
- We add a `.github/workflows/blah.yml` file to configure that script to run in response to a `repository_dispatch` event.

There's a lot going on here. Here's how it would work:
- When the snekomatic heroku app wants to run one of these operations, it uses the github api to send a `repository_dispatch` event to the `python-trio/snekomatic` repo. This event can contain an arbitrary JSON payload, which we use to encode info about what operation we want to run.
- The Github actions machinery sees this event, and spawns a copy of our "workflow". This workflow has two "jobs", which are run sequentially (because of the `needs:` key on the second job).
- Each job gets its own virtual machine. The first job doesn't get passed any secrets, so it can run arbitrary untrusted code, and github will take care of sandboxing it. The only way it can affect the outside world is by dropping files into the `artifacts/` directory, which will be zipped up and passed to the second job. The second job can then take those files, and do whatever it wants with them. For example, the first job might clone the given repo, check out the given revision, do `python setup.py sdist bdist_wheel`, then put the resulting sdist and wheel into the `artifacts/` directory. The second job can then take the sdist and wheel files, and upload them to PyPI. If someone puts some nasty code in `setup.py`, it could create an arbitrarily broken sdist/wheel... but that's about all it could do.
- The script that runs inside the jobs can see the information about the `repository_dispatch` event that triggered it. This includes the arbitrary json payload, which it can use to decide what to do. But before it does that, it should first check that the `sender` field in the event shows it coming from `trio-bot`. Any app with read-only access to the repo can create a `repository_dispatch` event, and who knows what kind of shenanigans that could enable, so better to make sure the message really comes from `trio-bot` up front.
- When the jobs start, Github sends out a `check_run` event to any webhook listeners, and this includes both the "name" of the job, plus a bunch of metadata about the job. By using clever template-y magic, we interpolate the `id` field from the `repository_dispatch` payload into the job `name`. This way, our Github app can listen for the `check_run` event, and match it up with the `repository_dispatch` it sent. Then it can do stuff like post back to the original user who triggered the operation "OK, that's started, if you want to watch the progress then click here: [URL]", and if it's running multiple operations at the same time it can keep track of which one is which, and eventually get the exit status to figure out whether it succeeded or failed.

So... yeah. It's kind of elaborate, but I think it all does work, gives solid, easy-to-reason-about security guarantees, and is way easier than building our own sandboxing mechanism.
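Putting the pieces above together, the workflow file might look roughly like this sketch. The job names, the `client_payload.id` field, the artifact-passing steps, and the secret name are all assumptions for illustration, not the actual config:

```yaml
on: repository_dispatch

jobs:
  untrusted:
    # Job name embeds the dispatch payload's id so our app can match the
    # resulting check_run event back to the operation it requested
    name: untrusted-${{ github.event.client_payload.id }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Runs our script via the action.yml/Dockerfile in this repo;
      # no secrets are passed, so untrusted code is contained here
      - uses: ./
      - uses: actions/upload-artifact@v2
        with:
          name: artifacts
          path: artifacts/

  trusted:
    name: trusted-${{ github.event.client_payload.id }}
    needs: untrusted   # runs only after the untrusted job succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/download-artifact@v2
        with:
          name: artifacts
          path: artifacts/
      # Only this job ever sees the PyPI credentials
      - uses: ./
        env:
          TWINE_PASSWORD: ${{ secrets.pypi_token }}
```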