Morganamilo commented 3 months ago

Patch release Design

This issue outlines the current and future plans for the patch release system.

Patch releases are done by merging code changes into one of the old release branches, bumping the crate versions, then publishing them. The publish step is the same parity-publish apply as the main release. The only new step here is version bumping.

The version bumps are performed with the following command:

parity-publish plan --patch <crates>

The only thing this command does is increment the patch version of the specified crates.

This approach is somewhat limited and tedious as it relies on the backporter to correctly specify what crates have been changed and that none of these changes are breaking.

Detecting changed crates

Instead of relying on the backporter to provide a list of changes, parity publish can detect this itself using the following command:

parity-publish changed -qd $COMMIT | parity-publish plan --patch -

This pipes a list of crates changed since $COMMIT into the plan patcher eliminating the need for a manually reasoned list but does not improve on much else.

PR Doc

With the introduction of PR Doc, the list of what crates changed and semver kind can be programmatically specified. The command to use this information is as follows:

parity-publish prdoc prdoc/pr_$NUMBER.prdoc -qd | parity-publish plan --patch -

This command only semi integrates PR Doc support. It pipes the list mentioned crates into the plan patcher, ignoring the kind of change specified and treating them all as patch releases.

This wasn't too useful at first as PR Docs were often not filled out at all, or the crate information would be incorrect. Now that we have CI that ensures PR Docs contain correct information it's the primary method to patch releases.

However this method still has limitations.

The prdoc number must be known to run this command
Every kind of change becomes patch level (this isn't too limiting as patches should only be patch level anyway)
This works one PR at a time

These limitations essentially amount to a lack of automatiability and more time spent performing the patches.

Full PR Doc Integration

The plan for Full PR Doc integration is to remove the distinction from initial release and patch releases.

Currently the plan command is only used to generate new Plan.toml files. However it can also work on existing plan files, updating them.

When a new release is generated via PR Doc we use the command:

parity-publish plan --new --prdoc prdoc/

This generates a new plan using all the prdocs in the prdoc/ directory. We then move the prdocs from prdoc/ to prdoc/vX.Y.Z. Note that --new isn't strictly needed here. All it does is force parity publish to create the Plan.toml from scratch instead of updating an existing one. But if there is no existing Plan.toml, --new does not make a difference.

This effectively mergers the initial release and patch release processes into one. The only difference being does a Plan.toml file already exist? In practise this difference makes the choice: "do we take the current newest version of the crate and increment from there?", or "do we increment from the version locked into the plan file?".

In this system, prdoc/ acts as a staging area for both kinds of releases and lets us perform multiple patches atomicly and with respect to the total dependency order.

This system also allows for full automation of both process, both using the same script.

Support for breaking changes

There is no technical reason the current process does not support breaking changes. The only reason is because the intent to make a breaking change can not be expressed via the parity-publish plan --patch command. Moving to the new PR Doc system lifts this restriction but begs the question "Do we want to allow breaking changes?".

The issue with breaking changes in the current system is that they get released as patches. This is obviously bad and I won't go into detail why here. With the new system this is technically fine but remains bad at a UX and support level.

The semantics of minor/patch only updates

Lets define the polkadot release v1.5.0, containing the crates a, b and c. Keep in mind a may depend on b, b may depend on c and so on. The exact relationships are not important.

Minor and patch updates are always forward compatible and cargo understands this, so if we were to release a-1.1.0, this version supersedes and completely replaces the old version. This means when we update a we do not need to tell anything that depends on it to switch version it just happens automatically. This applies to both, crates inside polkadot-sdk, and any external projects that use a.

For the users this means they pick a defined polkadot version and get automatic 0 thought updates until that version is defined EOL at which point they choose to explicitly update to a new polkadot version. This is a manual process, both in the sense that they must explicitly tell cargo "use these new major crate versions", as well us updating their existing code to account for the polkadot code has change. The former is automatable but the latter must be done by hand.

The semantics of allowing major updates

Before we could reason that a-1.x.y always belongs to a set of polkadot releases. Now we may have:

polkadot v1.5.0 release with a-1.0.0
polkadot v1.6.0 release with a-2.0.0
polkadot v1.5.0 makes a breaking change to a thus releasing a-3.0.0

While there's nothing wrong with this on a technical level it removes the clarity that a-1.x.y will always be the version a user users until they manually update. Now versions may change unexpectedly and end up higher than the versions in newer releases.

The real big issue is that major updates are manual. To stay up to date on the same polkadot release, users must not only manually update the dependencies in the Cargo.toml (manual here means the user must explicitly choose to do this. Not that a script couldn't do it). They must also update their code to match the new changes.

How often should users do this update? Should users strive to upgrade as soon as we push something new leading to an endless game of cat and mouse? Or should users choose to update on a defined schedule possibly missing out on bug fixes and security patches in the meantime?

This issue is exacerbated by the fact that a major update requires all the reverse dependencies also be major upgraded recursively.

We're effectively just following our major release structure but made it 2D. Our major releases have a second level to them where they also get sub major releases. We could flatten this back out and just run our normal major releases with theoretically no difference.

The theoretically is doing some heavy lifting here to be fair. While it would remove any semantic difference, the real world difference would be volume. The amount of difference between each 'sub major release' would be less than each main release. The question here is how different? How do we define how often breaking changes go to each release? Do we have criteria/policy to decide this?

Compromise

In an ideal world we would adhere to this strict formula of major release and minor patches. However polkadot is a large and complex project where semver is not thought about at the offset. In theory polkadot could strive to confirm with a top down effort and sweeping changes to its philosophy and design.

But even if that were to happen it would be a long road so what do we now?

My proposal is a compromise of the two. We allow breaking changes where needed but heavily try to minimise this. This can only be achieved with actual definitions and policy.

What is the scope of a polkadot release?

Is this for security and bug fixes or feature updates? If we allow features how do we define what features are suitable for backport?

We define crates as core, non core

We should designate certain mature low level crates as core and they should commit to stability.

This allows us to tackle the commit to stability piece by piece. Staring from the mature fundamentals. Keep in mind major changes propagate to dependants recursively casting a much wider net.

ggwpez commented 3 months ago

The real big issue is that major updates are manual. To stay up to date on the same polkadot release, users must not only manually update the dependencies in the Cargo.toml (manual here means the user must explicitly choose to do this. Not that a script couldn't do it). They must also update their code to match the new changes.

I think we should not allow major updates of crates within the same SDK release cycle. It just breaks everything that is nice about SemVer and Cargo.

Using the umbrella crate would make this easier either way, since the user does only have to select the SDK version and no individual crates versions.

The amount of difference between each 'sub major release' would be less than each main release. The question here is how different? How do we define how often breaking changes go to each release? Do we have criteria/policy to decide this?

I dont quite follow this. These SDK release numbers like 1.5 and 1.6 are completely random and have no SemVer implications. As argued here, we should name the releases stableYYMMDD and maintain that for 3 months with only non-breaking patch/minors.

We should designate certain mature low level crates as core and they should commit to stability.

I think this could be done with pre-1.0 versions?

Morganamilo commented 3 months ago

I think we should not allow major updates of crates within the same SDK release cycle. It just breaks everything that is nice about SemVer and Cargo.

I agree with the principle but just saying no breaking changes wont work. This is the reason for the compromise, at least for now. If we get to a point where we can remove breaking changes we can then disallow them.

I dont quite follow this. These SDK release numbers like 1.5 and 1.6 are completely random and have no SemVer implications.

I'm just arguing that by allowing unchecked breaking changes then there's not really a tangible difference between main releases and backports.

ggwpez commented 3 months ago

I agree with the principle but just saying no breaking changes wont work. This is the reason for the compromise, at least for now. If we get to a point where we can remove breaking changes we can then disallow them.

I will personally try to discourage any attempted breaking change to a stable release that i see. Otherwise the builders will be unhappy again 🫠

I'm just arguing that by allowing unchecked breaking changes then there's not really a tangible difference between main releases and backports.

👍

paritytech / parity-publish