RFC: [META] Improve our release process

ScopeyNZ commented 5 years ago

Problem

Recently releases that we have been putting together for core have not been very easy. There's a couple of things that we come across that makes it hard:

When security patches are involved the release process becomes a lot harder

Security patches means that RCs cannot be public - stable release must be the point of disclosure. Quoting @unclecheese:

This complicates things. We have to release from the security fork and that involves setting up a satis server and arm wrestling with cow a bit. It doesn't set well with me when releases are so non-deterministic, so to speak, and require a lot of thinking and different paths to take depending on varying conditions. It seems very error prone, and hitting the big red button lacks the confidence and predictability it should have.

It becomes difficult to isolate critical fixes from general fixes on a bugfix branch

This was made abundantly clear on the recent 4.3.0 release. The RC was prepared and then tested - finding a few critical issues. Although you could argue that testing should be done before an RC it doesn't detract the fact that critical issues can be found after an RC is created. Over the ~4 weeks it takes for an RC to go through a security audit (if applicable) and further user testing we fix these issues but we also still accept contributions and other minor fixes into the 4.3 branch.

The issue arises when we decide what exactly we want to apply to the actual 4.3.0 stable release and what can wait for 4.3.1. Again, you could argue that all bugfixes applied between RC and stable release should go into the stable but that will mean all these fixes go un-audited in any security audit (unless there's a follow up audit). In the end it's practical to have the ability to "pick and choose" so to speak.

Current state

This is from what I know that might not be 100% - I'm not incredibly intimate with this process yet. The release process is a long (4-6 weeks!) process that involves one (or more) RC(s) that are usually security audited when bundled with a CWP release

Currently when it's time to begin the release process we will:

Release an RC1:
- We use cow to make a plan and generate a changelog.
- In core, all modules are tagged with an RC1 suffix.
- In CWP (ie. all other supported modules) we will release at this point unless we expect security patches or critical hotfixes (rare)
Perform sanity checks
If there are security issues to be audited we pull those in manually to a zip into package up for auditing
Ship the RC for auditing
When critical bugs arise we fix them on the minor branch (ie. 4.3 for 4.3.0-rc1)
When non critical bugs arise it's the same process - target the minor branch
When security issues are raised:
- We prep a patch in a private fork of the affected repo
- The patch is discussed, reviewed and approved but cannot be merged
- The PR sits open until stable release time
When it's time to release we must retrospectively identify issues that should be included into stable (that were merged into minor release branches)
We create a "release branch" from the commit that RC1 was tagged at
We cherry pick fixes into that release branch
We merge in security fixes from the private repo
We release with cow using that release branch
We have to cherry pick or merge back the security fix after the release into the patch branches

Proposed state

@sminnee @unclecheese @chillu and I took a moment to sit down and work through this in a whiteboard session. Originally the core contributors were discussing having a private "releases" fork of all our modules to assist with releases. The initial concern is we'll be overcomplicating the process because a third repo is involved. Our discussion concluded that we should not create new forks on a new repo but instead use the existing forks on our current private repo.

"It's time to start a release":

Create release branches on our private repo for the modules we will be tagging with RC1 (eg. 4.3.0-release) - preferably have a tool to automate this (cow)
Create a Satis configuration to point to this repo with the packages that have been branched. Satis can be configured to only provide some of the packages from a repo. - This could also be part of the previous step if automated
Tag an RC1 tag on the private repo, sanity check, and push to origin (release publicly) always.
When critical bugs arise those are fixed in the public minor branch (ie. 4.3 for 4.3.0-rc1) - same as before
- A release manager may then cherry pick this into the private release branch if this needs to be released in the stable release
Minor bugfixes are the same as before - they target the minor release branch.
If there are security PRs in flight we can re-target them to our release branch as they already exist on our private repo. These PRs are merged when ready
Release an RC2 if applicable. This and subsequent RCs should be made public so long as there are no security patches included.
Release an RC before auditing and ship that to the auditor
Continue with bugfixes - we can now get a solid list of changes that are not audited with something like git log 4.3.0-rc2..HEAD because we're providing a tag (and nothing additional) to the auditor
Responses from the auditor are actioned and merged into the release branch
Tag stable from the release branch and push the stable tag to origin
Merge the release branch back into the minor branch
Delete the release branch - publish any previously private RCs (if applicable)

How is this different

Some major advantages from my point of view:

We always have a known state of the release - just look at the release branches
We are removing effort at stable release time and spreading it through the whole process (= less stress at stable release time)
We're relying on having a solid release process because we're tagging RCs more often
We can safely "roll-back" a tag because they will be created on a private repo and then pushed to origin

And disadvantages:

Cow updates
Satis is always part of our release process

Please let me know @chillu, @unclecheese, and @sminnee if there's something you understood differently from our chat - or anyone else if there's something I've gotten confused about.

It would be great if @dhensby, @robbieaverill, and @tractorcow had any feedback, suggestions, or advice as you are all quite familiar with the release process!

robbieaverill commented 5 years ago

Release an RC2 if applicable. This and subsequent RCs should be made public so long as there are no security patches included.

Yeah - we don't include security fixes in RC releases anyway, only stable.

I think this approach is probably fine, provided cow can automate all of the annoying extra bits we're adding (patch release branching, satis configuration changes, potentially even listing commits to cherry-pick from upstream into the release branch etc).

Another option in my mind is that core releases are tagged as stable when we think they're stable, we include a stable release in the CWP audit releases (rc1) and then any security issues arising are fixed, revalidated and released as a patch release. This aligns closely with what we've been trying to stick to for a while, which is "CWP takes whichever is the current stable core release" and any issues found during its audit are handled just as any other vendor's private audits would be - reported, fixed, and released as patch releases. Benefits of this would be that the release process we should be following for CWP does not change at all, no changes to cow are required and we get those low impact bug fixes in for example the 4.3 branch out faster. Drawbacks are that we're potentially releasing new stable minor versions that might be implemented by users immediately, only to release a potentially critical patch release a couple of weeks later.

chillu commented 5 years ago

So, spanner in the works: Github wants money for private repos. They're free for up to 3 collaborators, which won't work for us. We'd be going from 20 -> 125 repos, and $50 -> $200 per month. That’s quite a lot for some release management consistency.

ScopeyNZ commented 5 years ago

This does make repo management difficult. We don't need repos for every module - in CWP we usually just release modules at RC stage. The difference is that we will still need private repos for any module that may have a security issue. This means manually sorting out private repos in those cases.

we don't include security fixes in RC releases anyway, only stable

To be clear this flow suggests changing this. RCs (after RC1) may contain security fixes. Whether they do or not will indicate whether they're made public.

Another option in my mind is that core releases are tagged as stable when we think they're stable, we include a stable release in the CWP audit releases (rc1) and then any security issues arising are fixed, revalidated and released as a patch release.

I am still also a fan of this approach, but it hasn't seemed to gain much traction. It'll be nice to outline why this is but I'm not sure.

robbieaverill commented 5 years ago

I am a fan of the idea of making security and regular, silverstripe-installer and cwp-installer releases follow the same release process every time - at a high level

chillu commented 5 years ago

Another option in my mind is that core releases are tagged as stable when we think they're stable, we include a stable release in the CWP audit releases (rc1) and then any security issues arising are fixed, revalidated and released as a patch release.

I don't think we've explicitly discarded this option in our whiteboarding session. As you say, it means we're more likely to perform two patch releases in quick succession if there's pending security issues. But more patch releases is a good thing, and helps get non-security bugfixes out faster.

Public RCs are most useful on minor releases, not patch releases. They allow more disruptive changes to bed in before we call that release line stable, with the assumption that there's early adopters who are willing to help us test in RC stage, but aren't willing to run dev release branches. I think that subset is small enough to be ignored, most early adopters are happy running dev release branches. The situation changes once we prepare a new major release (5.0.0), but let's not worry about that right now.

Private RCs are just a workaround for us, because we need to declare what we've sent to the auditor. In case that audited set contains undisclosed security issues, we can't point to the public tag. We could point to a set of SHAs on private release branches, but that's hard to handle (e.g. through composer install).

Since we're locking versions across core modules, we can't just push a public cms@4.3.1 because it doensn't contain an undisclosed security issue. It relies on a private framework@4.3.1 with an undisclosed security issue. In this situation, a community member runningn composer install would get cms@4.3.1 and framework@4.3.0. Technically, changes in patch releases shouldn't rely on modified or added APIs in other modules, so this should work. But practically speaking, that's asking for trouble. We need to discuss how much we want to decouple version locking between modules, but I'm wary of making that a precondition for this release management change.

Can someone clarify in which of these proposed cases we need more than 20 repos on github-security? That's the main point of contention for me at the moment (💰). Could we cover this by making the first step in the release process "identify which repos have pending security issues, then point satis at those repos only"? Or would that be too much inconsistency @unclecheese? That could also save us a lot of cow automation, because typically you have pending security issues on one or two repos, which we could handled with a well documented manual process. Any process we need to run for ~15 core repos (or ~90 supported modules) needs to be automated.

unclecheese commented 5 years ago

Having done a couple of these new "private" releases, my perception is that for core releases, the private repos need to be all or nothing. We can't just whitelist a few affected repos to be behind the curtain for a a given release.

The reason for that is that core releases tag everything, regardless of whether it contains new commits since the previous release. This is done to keep semantic versioning parity across the whole release. If we were to only put some modules in private, we'd end up with public tags for asset-admin@1.3.4 when silverstripe/installer@4.3.4 doesn't even exist.

Two ways around that:

We release more like CWP and stop tightly coupling the version numbers
We stop caring if there's a new release floating around publicly that isn't yet tied to a stable recipe.

Assuming those things remain unchanged, we need to have > 10 repos in the private forks, which pushes us into a higher priced plan with Github.

Other options:

We use the security repos on an as-needed basis (i.e. has a pending security PR), and create a new workspace for silverstripe-releases that is all inclusive. Could beat the USD $200 fee by having only one contributor (cow).
We move the private repos to something other than Github, like Gitlab, or Bitbutcket, which may have cheaper plans.

I can say that it's really refreshing to work with an entirely private release, and not have to worry about what repo pairs with which remote. It feels much more automation friendly and less error prone.

I wonder how this could be baked into cow, as well, so we don't have to manually remind it to publish to a custom repo.

$ cow release:create 4.3.2 silverstripe/installer
$ cow release:stage https://security-repo.git http://my-satis.local
$ cow release:publish <-- assumes stage
$ cow release:publish --public

But that's probably a separate PR/RFC for cow.

chillu commented 5 years ago

If we were to only put some modules in private, we'd end up with public tags for asset-admin@1.3.4 when silverstripe/installer@4.3.4 doesn't even exist.

That's not entirely accurate. We'd end up with public tags for asset-admin@1.3.4-rc1 and private tags for silverstripe/installer@4.3.4-rc1. For the final release, both of them are public. This mix means that anyone pinning to @rc dependencies would run a mixed set of patch releases - which should really work. We could reduce the potential impact of this by mandating that any cross-module bugfixes need to go into the next minor release branch. I think people either pin to @stable or @dev anyway, so I'm not too worried about that edge case. If we accept that, would it make it easier?

Could beat the USD $200 fee by having only one contributor (cow).

I'm not a fan of systems which break our audit trail. cow creates commits (e.g. merge commits) and tags with the user's Github account. If we make this a generic, shared account, you can't tell who's performed those actions. And shared accounts with this level of access are too much of a single point of failure. We mandate MFA, which would get difficult with a shared account (although we could pass around a Yubikey).

We move the private repos to something other than Github, like Gitlab, or Bitbutcket, which may have cheaper plans.

I'm veto'ing that, we already have enough trouble with user management on the various Github organisations (and potential tooling around that).

@robbieaverill @ScopeyNZ If we went for the $100/month plan and 50 repos, we'd cover core, and you'd still have a few repos spare in case security issues in supported modules come up. Does that complicate your cwp release process? Assuming that CWP pre-releases often need to include unreleased core versions with security fixes, this means you need to work with Satis as part of the release process as well. It sounds like that's acceptable? I think increasing our release management costs by $50USDx12 per year is a bit more palatable than an increase of $150USDx12 per year.

robbieaverill commented 5 years ago

We stop caring if there's a new release floating around publicly that isn't yet tied to a stable recipe.

There's no problem with doing this as long as they're not on recipes - they wouldn't be installable without all of their child dependencies having public tags. Modules like assets for example could be tagged individually and it wouldn't really matter.

We release more like CWP and stop tightly coupling the version numbers

+1 for this too. There's really no reason to do this other than historic policy and it being easy to work out which version goes with which version. It's the recipes that matter, not the module versions.

In some cases someone might be running assets 1.2.3 against framework 4.2.2, which may have a fix for a bug in assets and not yet have a corresponding framework patch - that's really the only downside I can think of though.

@robbieaverill @ScopeyNZ If we went for the $100/month plan and 50 repos, we'd cover core, and you'd still have a few repos spare in case security issues in supported modules come up. Does that complicate your cwp release process?

We typically don't need that many private repositories for CWP releases since (as @unclecheese noted) we don't tag everything every time we do a release, only what needs to be released. There are some parts which usually do get tagged together though: kitchen sink, installer, cwp-recipe-cms, cwp-recipe-core, cwp and cwp-core. If any of these recipes/modules have dependencies (like framework) that require a security release then we'd need a private repository for it.

If we're going to go down the road suggested in this issue and make every release process the same as a security release, I think we'd probably need around 6-8 private repositories for our side. Most of our recipes and modules get tagged individually whenever a worthy bugfix gets merged anyway.

Assuming that CWP pre-releases often need to include unreleased core versions with security fixes, this means you need to work with Satis as part of the release process as well. It sounds like that's acceptable?

Yep, no problem as long as it's standardised.

robbieaverill commented 5 years ago

@chillu Since we're locking versions across core modules, we can't just push a public cms@4.3.1 because it doensn't contain an undisclosed security issue. It relies on a private framework@4.3.1 with an undisclosed security issue.

This is not entirely true, the modules never have tight dependency constraints, only recipes do

unclecheese commented 5 years ago

Thinking about this more, with a bit of a post-mortem on patch releases. The satis server approach showed a number of limitations and pitfalls:

It relies on a lot of manual syncing. First, you have to sync upstream with security, which is a tedious process, especially when the security branches may have merged PRs that you don't want to overwrite.
You have to rebuild the satis server frequently, depending on many mistakes you make along the way. This is a long-running task, and it isn't always obvious that you have to do so.
Installations from the satis server fail silently with a fallback on packagist, which means that your cow processes are running with the public repos as their upstream, unbeknownst to you. This resulted in the publication of tags on the public repos that had to be immediately deleted.

My solution for the next set of releases was a much more manual approach:

Edit the remotes of each cloned repo and set them to silverstripe-security.
Force-push the branch to ensure the private repo was up to date
Cherry-pick the commits off any security PRs.

This worked much better, but again, there are massive windows for error at every step along the way, and I benefitted tremendously from some one-off automation scripts.

Chatted to both @ScopeyNZ and had the following thoughts:

A disposable silverstripe-releases workspace

tl;dr As much as I think this would benefit us, I don't think this is going to work out.

I know I keep coming back to this. It just feels really bullet-proof to me. It really comes down to the issue of managing state. Syncing is a pain, and the easiest antidote to syncing is an architecture that allows for short-lived, disposable state. This would be a space with many repositories and zero pull requests and issues. A cow command would strip them bare on release:stage (TBD), and rebuild them, pushing up the appropriate branches and tags. Security issues would live on the security repo, which would host all of the discussions and PRs to keep an audit trail, and commits would be cherry-picked across.

Given that only a few repos at a time are affected by security patches, the security workspace could be dialled back to its old price point. The releases workspace, however, would need more money thrown at it, and for that reason, it seems like a non-starter. Guy and I discussed the possibility of hoisting up a git repo on an internal server, but that seems antithetical to the spirit of open source and inviting collaboration.

Later chatted to @robbieaverill and had the following thoughts:

Remove the rigidity around early tagging and version-syncing

As discussed above, unlike the CWP recipes, the core recipes tag each module regardless of whether there are changes or not. This is basically an artefact of a time when the CMS was much more tightly coupled. We're past that now, and if you look at packages like silverstripe-graphql, silverstripe-config, and versioned-admin, we're already breaking the boundaries of those conventions, and it maybe time to just officially move on.

The idea would be:

If a module has new public tag ahead of a recipe that uses it, that's okay. It's a pre-release tag. While it shouldn't be considered a "staged" tag like in the security repos, it does mean that the costs of stuffing it up are much less, because it only affects users on @dev releases. If anything goes horribly wrong, we could just issue another tag, and we wouldn't have to worry about getting a whole new recipe together.
Only recipes and repos with security patches coming are released in darkness for internal preview. This would limit the number of private repos significantly, and reduce our Github subscription cost.
If a release is an incremental patch only (e.g. previous tag + 1 commit), it doesn't matter if other modules have newer tags. We can still pin the recipe to earlier tags, and allow the newer tags to be available in the next recipe release.
If a module has no new commits since the last recipe release, don't tag it!
New cow command would let you opt into privacy: cow release:stage silverstripe/framework my-security-repo

This process is much more similar to the CWP release process and would bring some consistency across the teams, and it would save us some money to boot.

ScopeyNZ commented 5 years ago

Remove the rigidity around early tagging and version-syncing

I think it probably makes sense to at least keep framework, cms and admin in lockstep for minor versions - although for admin obviously three majors behind. But any other repo I think it's a great solution.

Maybe we could work cow into a state where we flag a security release. cow release:security silverstripe/framework --org silverstripe-security. This would change the remote of that, and it's dependency tree, to silverstripe-security (or whatever). So for that command, silverstripe/framework, silverstripe/recipe-core, and silverstripe/installer would have updated remotes. cow would need to go and update forks at that point too, but you already have an appropriate base branch to do so. Basically pseudocode of cow:

repos = getDependancies('silverstripe/framework')
foreach repos as repo:
  cd repo
  git pull
  git remote add security 'silverstripe/security'
  git push security --delete {current-branch}
  git push -u security {current-branch}
  git remote remove origin // Safety

At this point you can still release:publish and you'd get public tags of everything except the repos flagged as security. I think that's fine though...

maxime-rainville commented 5 years ago

If we're going to break the lockstep in our releases, testing the lowest dependency set probably becomes more important.

https://github.com/silverstripe/silverstripe-framework/issues/8990

chillu commented 4 years ago

Separated out some billing concerns into an internal discussion at https://github.com/silverstripeltd/product-issues/issues/122

chillu commented 4 years ago

Since we last discussed this, I think a few things changed for the better:

The Github pricing model is based on users, with unlimited private repos (see announcement in January). My earlier comment referred to the legacy plans only.
Github has introduced Security Advisories with free private forks per advisory - I don't think that'll work for us though
We now have a clear policy for which security issues need to be disclosed through a quarterly recipe release (or a hotfix recipe release): Anything with CVSS 9.0 or higher. Anything else can be released separately (either as an individual module, or a core recipe release), to be bundled into a quarterly recipe release later (and audited at that stage).
We have broken up lockstep releases in core on a patch level, but retained it on a minor level. This should reduce release complexity here.
Github

maxime-rainville commented 4 years ago

FYI We've been doing some related work related to this issue on a private issue on the CMS Squad board. We've put out some info about how we'll approach releases on the forum.

chillu commented 4 years ago

Can anyone break down specific next steps to close this RFC? @dnsl48 @maxime-rainville Are we OK now in terms of how we handle security releases on private repos? Is is documented, understood and possibly automated enough that we can perform these releases while minimising human error? Do we have sufficient Github private repo allocation without driving up costs? I'm assuming we've come up with a process that does not rely on every recipe repo running on a private fork for a security release candidate, otherwise I would've heard about it because we would've run out of Github private repos ;)

maxime-rainville commented 4 years ago

What I ended up doing for the 4.4.5/4.5.1 release was merged the security patches locally. That was suggested as simpler than setting up a satis server and appropriate when doing a simple patch release.

To be honest, I don't think we have this process fully nailed down. The next release will have a lot more security patches and will probably be a minor. So it's going to be a lot more involved.

dnsl48 commented 4 years ago

Yes, I agree, we don't handle security releases well enough. It's too human-error prone and I'm hoping to get some more work done around Satis workflow so we can polish the process. Two last security releases we didn't do satis thing. Currently I'm considering to include Satis into our dockerized Cow environment to automate installation part and make it smoother and easier to work with. However, that needs a bit more thinking yet.

chillu commented 4 years ago

OK @brynwhyman this might be an issue to clarify before the next minor release, how it relates to the team's backlog. We've closed out the "improve release process" epic, but this was actually the first big issue which triggered this epic in the first place back in the days heh.

brynwhyman commented 4 years ago

Yeah, I think opting to include more work in this area to the late 'Product Security Tooling and Practices' epic may have kicked this can too far down the road.

Still, there's been a lot of new process change that's been documented but not necessarily linked back to this issue. We'll regroup to pull together a succinct summary in this issue and see what actions are thought to be remaining - there's got to be some in-between actions before getting Satis in the dockerised Cow environment.

dnsl48 commented 4 years ago

I think we should consider that before the next Security release, not a minor. I don't think that needs to be done separately from the release epic itself though. Also, my main assumption is that performing the whole Satis routine would take about the same time as to do it in a reusable way (through adding to Cow). Otherwise, we could perhaps keep releasing without Satis (as we currently do) and with somewhat more manual and rigid approach. I guess we can review this RFC once more when it comes closer to making the next security release.

chillu commented 4 years ago

@brynwhyman @Cheddam Are you planning to include the three current security issues in the upcoming 4.7.0 release? This might be a good opportunity to revisit the security process. Which I'm also keen to sort out because we need to clarify billing with the github.com/silverstripe-security org (see https://github.com/silverstripeltd/product-issues/issues/122)

silverstripe / silverstripe-framework