jku closed this 2 months ago
- It's a bit annoying how large the patch makes publish.yml
- it seems difficult to avoid duplicating the setup code for the two stages
- the second stage cannot be put in a reusable workflow (this is a GH limitation relating to environments): this is why I removed the separate reusable workflow and put everything in publish.yml
I suppose a way to keep publish.yml changes to a minimum is:
- deploy-gcs is an internal composite action that accepts an argument for either a "timestamp" or "full" deploy
- gcloud storage crimes are hidden in deploy-gcs

EDIT: I've implemented this approach in https://github.com/jku/tuf-on-ci-sigstore-test/blob/main/.github/workflows/publish.yml and I'm not convinced: it's still a bit complicated and now we have to manage the composite action calling and the input handling
The motivation behind different release environments for production was occasionally we would mess up metadata and only catch it once we manually called cosign initialize. The addition of smoke tests against both cosign and another representative sigstore client should catch these issues before being pushed to production by testing against the PR (or is it the main branch?), correct?
Rebased on main (that now includes the GCS tests even if currently broken).
> The motivation behind different release environments for production was occasionally we would mess up metadata and only catch it once we manually called cosign initialize. The addition of smoke tests against both cosign and another representative sigstore client should catch these issues before being pushed to production by testing against the PR (or is it the main branch?), correct?
All testing happens against the Pages-published repository, but otherwise correct. The flow is somehow difficult to draw in a chart, but I tried (with merges included). The main point is that tests run against a published repository, so "a publish step" needs to happen before testing. The flow looks like this (once the hopefully minor issues with GCS tests are ironed out):
graph TD;
merge{"signing event<br/>(merges to 'main')"}-->online-sign;
online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
online-sign-->publish-pages[publish to Pages];
publish-pages-->test-pages[test Pages with clients];
test-pages--this is the critical point where we can add manual review-->publish-gcs[publish to GCS];
publish-gcs-->test-gcs[test GCS with clients];
The takeaways on branches and deployments are:
- merges to the main and publish branches, as well as the Pages deployment, are done before tests

So the question is this: will maintainers do additional manual testing using the Pages-published repository before publishing to GCS if we give them the chance? I think the answer is potentially yes considering how rare actual big changes are.
After this PR the flow looks like this (rhombus used to signify human interaction):
graph TD;
merge{"signing event<br/>(merges to 'main')"}-->online-sign;
online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
online-sign-->publish-pages[publish to Pages];
publish-pages-->test-pages[test Pages with clients];
test-pages--if only timestamp changes-->publish-gcs-light[publish timestamp to GCS];
test-pages--if additional metadata or targets changes-->deployment{Deployment review<br/>or delay};
deployment-->publish-gcs-full[publish full repository to GCS];
publish-gcs-light-->test-gcs[test GCS with clients];
publish-gcs-full-->test-gcs[test GCS with clients];
I'll close this: let's reopen if we want manual deployment reviews
This fixes #54 by using release environments for GCS deployment. It is a draft for a few reasons:
- sigstore/github-sync does not support environments, so we would have to add that first

Description of changes
GCS publish now has two stages:
- deploy-to-gcs-light uploads timestamp only
- deploy-to-gcs-full is executed if there are other changes: this uploads all changes

deploy-to-gcs-full is gated behind a GitHub release environment that can define required reviews or a release delay.
It's maybe noteworthy that the decision of which stage is needed is based on the actual changes happening to the bucket: the event that triggers this publish does not matter. So in practice a simple online-sign could result in deploy-to-gcs-full if, e.g., a previous publish failed and more than the timestamp is actually being uploaded.

Publish now has a concurrency group: since release reviews and release delays can mean a release does not happen before a new one is started, it makes sense to cancel in-progress publishes: the newest one should be used.
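
To illustrate the stage selection described above (the choice depends on what actually changes in the bucket, not on the triggering event), here is a minimal Python sketch. It is not the code in publish.yml: the function name `choose_gcs_stage` and the constant `TIMESTAMP_FILE` are hypothetical, and it assumes the timestamp metadata is a single file named `timestamp.json` and that the list of changed paths has already been computed elsewhere.

```python
from typing import Iterable, Optional

# Assumption: the timestamp role's metadata is a single file with this name.
TIMESTAMP_FILE = "timestamp.json"


def choose_gcs_stage(changed_paths: Iterable[str]) -> Optional[str]:
    """Pick the GCS deploy stage for a set of paths that differ from the bucket.

    Only the pending changes matter; the event that triggered the publish
    is deliberately not an input.
    """
    changed = set(changed_paths)
    if not changed:
        # Nothing differs from the bucket, so there is nothing to deploy.
        return None
    if changed == {TIMESTAMP_FILE}:
        # Only the timestamp changed: the ungated light stage is enough.
        return "deploy-to-gcs-light"
    # Anything else (other metadata or targets) goes through the full stage,
    # which is the one gated behind the release environment.
    return "deploy-to-gcs-full"
```

Under these assumptions, an online-sign that follows a failed publish would pass in more than `timestamp.json` and so select deploy-to-gcs-full, which matches the case described above.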