sigstore / root-signing-staging

Staging TUF repository for Sigstore trust root
https://tuf-repo-cdn.sigstage.dev/
Apache License 2.0

Publish: Split GCS publish to two #68

Closed: jku closed this 2 months ago

jku commented 4 months ago

This fixes #54 by using release environments for the GCS deployment. It is a draft for a few reasons:

Description of changes

GCS publish now has two stages:

deploy-to-gcs-full is gated behind a GitHub release environment that can define

It may be worth noting that the choice of stage is based on the actual changes being made to the bucket: the event that triggers the publish does not matter. So in practice a simple online-sign could result in deploy-to-gcs-full if, for example, the previous publish failed and more than just the timestamp is being uploaded.
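The decision described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in publish.yml: the function name, the `Stage` enum, the light-stage label, and the digest-map inputs are all hypothetical, and it assumes the timestamp metadata lives at `timestamp.json` in the bucket root.

```python
from enum import Enum
from typing import Mapping


class Stage(Enum):
    NONE = "nothing to publish"
    LIGHT = "timestamp-only publish"  # hypothetical label for the light stage
    FULL = "deploy-to-gcs-full"


def choose_stage(local: Mapping[str, str], bucket: Mapping[str, str]) -> Stage:
    """Pick a publish stage from the real difference between repo and bucket.

    Both arguments map file paths to content digests (an assumed shape).
    Only new or modified files count; files that exist solely in the
    bucket are ignored in this sketch.
    """
    changed = {path for path, digest in local.items() if bucket.get(path) != digest}
    if not changed:
        return Stage.NONE
    if changed == {"timestamp.json"}:
        return Stage.LIGHT
    return Stage.FULL


# State of the bucket after the last successful publish (made-up digests).
bucket = {"timestamp.json": "t1", "snapshot.json": "s1"}

# Routine online signing: only the timestamp differs.
routine = choose_stage({"timestamp.json": "t2", "snapshot.json": "s1"}, bucket)

# A previous publish failed, so the snapshot is also still pending:
# the same triggering event now needs the gated full stage.
catch_up = choose_stage({"timestamp.json": "t3", "snapshot.json": "s2"}, bucket)
```

Note that the triggering event never appears as an input: two runs started by the same kind of event can land in different stages depending only on what the bucket currently holds.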

Publish now has a concurrency group: since release reviews and release delays can mean that a release has not happened before a new one is started, it makes sense to cancel in-progress publishes. The newest one should be used.

jku commented 4 months ago
  • It's a bit annoying how large the patch makes publish.yml
    • it seems difficult to avoid duplicating the setup code for the two stages
    • the second stage cannot be put in a reusable workflow (this is a GH limitation relating to environments): this is why I removed the separate reusable workflow and put everything in publish.yml

I suppose a way to keep publish.yml changes to a minimum is:

haydentherapper commented 4 months ago

The motivation behind different release environments for production was that occasionally we would mess up metadata and only catch it once we manually called cosign initialize. The addition of smoke tests against both cosign and another representative sigstore client should catch these issues before they are pushed to production, by testing against the PR (or is it the main branch?), correct?

jku commented 4 months ago

Rebased on main (which now includes the GCS tests, even though they are currently broken).

The motivation behind different release environments for production was that occasionally we would mess up metadata and only catch it once we manually called cosign initialize. The addition of smoke tests against both cosign and another representative sigstore client should catch these issues before they are pushed to production, by testing against the PR (or is it the main branch?), correct?

All testing happens against the Pages-published repository, but otherwise correct. The flow is somewhat difficult to draw in a chart, but I tried (with merges included). The main point is that

So "a publish step" needs to happen before testing. The flow looks like this (once the hopefully minor issues with GCS tests are ironed out):

graph TD;
    merge{"signing event<br/>(merges to 'main')"}-->online-sign;
    online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
    online-sign-->publish-pages[publish to Pages];
    publish-pages-->test-pages[test Pages with clients];
    test-pages--this is the critical point where we can add manual review-->publish-gcs[publish to GCS];
    publish-gcs-->test-gcs[test GCS with clients];

The takeaways on branches and deployments are

So the question is this: will maintainers do additional manual testing using the Pages-published repository before publishing to GCS if we give them the chance? I think the answer is potentially yes, considering how rare actual big changes are.

jku commented 4 months ago

After this PR the flow looks like this (rhombus used to signify human interaction):

graph TD;
    merge{"signing event<br/>(merges to 'main')"}-->online-sign;
    online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
    online-sign-->publish-pages[publish to Pages];
    publish-pages-->test-pages[test Pages with clients];
    test-pages--if only timestamp changes-->publish-gcs-light[publish timestamp to GCS];
    test-pages--if additional metadata or targets changes-->deployment{Deployment review<br/>or delay};
    deployment-->publish-gcs-full[publish full repository to GCS];
    publish-gcs-light-->test-gcs[test GCS with clients];
    publish-gcs-full-->test-gcs[test GCS with clients];

jku commented 2 months ago

I'll close this: let's reopen if we want manual deployment reviews.