Summary

Create a chapter in the book, which sets out the rules that must be followed to ensure that ongoing development is done in a backwards-compatible fashion from any given "stable" baseline release. These rules should enable the project to make a release that we can confidently deploy into long-term supported environments such as enterprise Linux distributions.

Background

Parsec is a long-running service that will be configured, deployed, upgraded and maintained over a long period of time. The environment will also have client applications that are calling it, and persistent state that is collected on storage. Components will be upgraded over time. For Parsec to be supportable in enterprise systems, it must not be brittle against upgrades. If Parsec is upgraded and restarted, client applications should continue working, and all persistent state from prior versions should remain valid and usable.

What Is Needed?

Stability rules are primarily needed for the wire protocol and the service, since it is the Parsec service that will be packaged and deployed with Linux distributions. Client libraries will typically not be packaged as binaries, and will rather be consumed as source code dependencies, and their versioning systems will tend to be language-specific, with versioning rules (such as semantic versioning) being applied on a per-library basis. One grey area here is the parsec-tool, which, like the service, might be packaged as a binary, and might be upgraded separately from the service, while also being consumed by shell scripts on the system (which we don't want to break).

Ultimately, all of the rules will need to be enforced by CI tests so that PRs can automatically be checked, which would likely involve installing a baseline version, running a test suite, and then upgrading to the new version and running a regression suite of some kind. However, for now the scope is limited to capturing and documenting the rules. We can't enforce the rules until we have stated what they are.

Fortunately, much of Parsec has already been designed with backwards compatibility in mind, but some of these design choices need to be made explicit.

In no particular order, here's a non-exhaustive list of things that we may need to include. We can use comments on this issue to refine these and gather more ideas.

Wire protocol versioning. The wire protocol has a versioning mechanism that must be strictly honoured. Request/response header fields must not be modified or re-ordered in any way. The last breaking change to this was in May 2020, when we reworked it to align the request/response formats and make the protocol more amenable to shared memory transports. While it is possible to change the protocol, we must use the versioning mechanism, and we must negotiate with the client.
API versioning. The API must strictly follow the open/closed principle. Contracts for published APIs must not be modified in a backwards-incompatible fashion. New opcodes may be added, but opcodes must never be removed. A provider does not need to support every opcode, however providers should not retract support for opcodes that were implemented in previous versions.
Persistent state. Any persistent state (such as key properties stored on disk) must remain valid. If any changes of format are introduced, the service must be able to read the old format and upgrade to the new format on the fly. If this is not possible, then the new format must be enabled through some kind of opt-in mechanism. For example, it could be a new type of key identity manager that needs to be selected in the service config.
Configuration. Once a service has been deployed and configured, its configuration file (config.toml) must remain valid. If we add new features to the service, and those features require configuration entries, then either the configuration entries must be optional (with suitable in-code defaults or feature de-activation), or there must be an upgrade process that ensures new config entries are present without overwriting config that was already applied previously.
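
To make the "read the old format and upgrade on the fly" rule for persistent state concrete, here is a minimal sketch in Rust. Everything in it is hypothetical: the record layout (a leading version byte, a little-endian key ID, and a flags byte added in version 2) and the names KeyRecord, read_record and write_record are invented for illustration and do not describe how Parsec stores anything today.

```rust
/// Hypothetical in-memory form of a stored record (current layout, version 2).
#[derive(Debug, PartialEq)]
pub struct KeyRecord {
    pub key_id: u32,
    pub flags: u8,
}

#[derive(Debug, PartialEq)]
pub enum RecordError {
    Truncated,
    UnknownVersion(u8),
}

/// Reads a record written in either layout.
/// Version 1: [1, key_id (4 bytes, little-endian)]
/// Version 2: [2, key_id (4 bytes, little-endian), flags]
pub fn read_record(bytes: &[u8]) -> Result<KeyRecord, RecordError> {
    let (&version, rest) = bytes.split_first().ok_or(RecordError::Truncated)?;
    match (version, rest) {
        // Old layout: the field added in version 2 gets a default value.
        (1, [a, b, c, d, ..]) => Ok(KeyRecord {
            key_id: u32::from_le_bytes([*a, *b, *c, *d]),
            flags: 0,
        }),
        (2, [a, b, c, d, flags, ..]) => Ok(KeyRecord {
            key_id: u32::from_le_bytes([*a, *b, *c, *d]),
            flags: *flags,
        }),
        (1, _) | (2, _) => Err(RecordError::Truncated),
        (other, _) => Err(RecordError::UnknownVersion(other)),
    }
}

/// Always writes the current layout, so an old record is upgraded
/// the next time it is stored.
pub fn write_record(record: &KeyRecord) -> Vec<u8> {
    let mut out = vec![2u8];
    out.extend_from_slice(&record.key_id.to_le_bytes());
    out.push(record.flags);
    out
}
```

The design choice shown is that the reader accepts every layout ever shipped while the writer only emits the newest one; a format that cannot be handled this way would fall under the opt-in rule instead.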

Client Libraries

As mentioned above, client libraries are subject to somewhat different rules because they typically would not be deployed as binary packages, hence API compatibility is more important than ABI. For the case of the Rust and Go clients, this is largely covered by semantic versioning, and it becomes an issue for the individual clients and the per-language ecosystems in which they operate. The parsec-book would not be expected to cover all of these, except perhaps with a few general notes.

One open question for the client side is to what extent they should be calling ListOpcodes to determine support for particular operations. As and when Parsec reaches some "first stable release" milestone, is that the point where we decide that the opcode set is fixed, and clients should never assume the existence of any newer opcodes without first querying dynamically?
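
As one possible answer to that question, a client could cache the result of ListOpcodes per provider and check it before using any operation that is newer than the baseline. The sketch below only models that check: ProviderView, Unsupported and the two opcode constants are hypothetical names and values, and the opcode list is passed in rather than fetched from a real service.

```rust
use std::collections::HashSet;

// Illustrative opcode values only; they are not taken from the Parsec opcode table.
pub const BASELINE_OP: u32 = 0x0002;
pub const NEWER_OP: u32 = 0x00FF;

/// Returned when the provider did not advertise the requested opcode.
#[derive(Debug, PartialEq)]
pub struct Unsupported(pub u32);

/// Hypothetical client-side view of what one provider supports.
pub struct ProviderView {
    supported: HashSet<u32>,
}

impl ProviderView {
    /// `opcodes` stands in for the reply to a ListOpcodes request.
    pub fn from_list_opcodes(opcodes: impl IntoIterator<Item = u32>) -> Self {
        ProviderView {
            supported: opcodes.into_iter().collect(),
        }
    }

    pub fn supports(&self, opcode: u32) -> bool {
        self.supported.contains(&opcode)
    }

    /// Runs `op` only if the provider advertised `opcode`.
    pub fn call_if_supported<T>(
        &self,
        opcode: u32,
        op: impl FnOnce() -> T,
    ) -> Result<T, Unsupported> {
        if self.supports(opcode) {
            Ok(op())
        } else {
            Err(Unsupported(opcode))
        }
    }
}
```

With this shape, a client built against a newer opcode set degrades to a clean error when it talks to an older service, instead of sending a request the service cannot understand.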

I did a bit more research on this topic.

I propose to transform this issue into an investigation on how to reach and ensure stability of the Parsec service. As libraries, the Parsec clients are already following existing stability rules. The parsec-tool's stability can be investigated in a separate issue on that repo.

What does stability mean for the Parsec service?

I choose the following definition of stability for the Parsec service: for any two versions A and B of the parsec binary, with B newer than A, A and B are stable if A can be removed and replaced by B without breaking anything.

Note that we are not looking at stability in the reverse direction: there is no requirement that B can be replaced by A without breaking anything.

Let's try to use the principle of semantic versioning to describe the stability of Parsec versions and when breaking changes are made. If the Parsec version is at 1.0.0, then all later versions that are stable with respect to it will be 1.x.y.
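
Read together, the definition and the semantic versioning rule give a simple check: B may replace A if both share a major version and B is not older than A. The sketch below encodes just that; the names Version and can_replace are hypothetical, and pre-release or build metadata suffixes are deliberately not handled.

```rust
/// A plain MAJOR.MINOR.PATCH version. Field order matters: the derived
/// ordering compares major first, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    /// Parses exactly three dot-separated numbers, e.g. "1.4.2".
    pub fn parse(text: &str) -> Option<Version> {
        let mut parts = text.split('.').map(|p| p.parse::<u32>().ok());
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.2.3.4".
        if parts.next().is_some() {
            None
        } else {
            Some(version)
        }
    }
}

/// True if `b` may replace `a`: same major version and `b` not older
/// than `a`. The reverse direction is intentionally not guaranteed.
pub fn can_replace(a: Version, b: Version) -> bool {
    a.major == b.major && b >= a
}
```

For example, 1.4.2 may replace 1.0.0, but 2.0.0 may not, and neither may 1.0.0 replace 1.4.2.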

What needs to be stable?

With the above definition, Parsec stays stable if it keeps the exact same communication channels and format with its environment. Those are already described in the dataflow diagram in the threat model, reproduced below:

The communication channels that need to be stable are (some of them are not in the diagram):

Communication with clients
Communication with Identity Providers
Communication received from the CLI invocation
Configuration file (including default configuration)
Key mappings
OS signals/systemd communication
Dynamic libraries dependencies

Maybe I forgot some other dependencies on the environment. If we are to write a chapter in the book explaining Parsec stability, this list should be written there, and during Parsec development we should look out for new dependencies that introduce stability concerns.

Each point in detail

Let's look at each of the points above and see how stability is ensured there, what the future problems are, and how it can be tested.

1: clients

The wire protocol used for communicating requests/responses is stable by design and versioned. The open/closed principle is used. One thing to look at is whether the introduction of new parameters (for example a new algorithm or a new key type) constitutes a breaking change or not. For example, PSA Crypto 1.0.1 introduced a new algorithm and key type. In protobuf, we encode those things using oneof fields, so we need to check whether adding a new field is a breaking change or not.
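
One part of that check can be reasoned about already. In protobuf, a decoder generated from an older schema treats a field number it does not know as an unknown field, so a oneof carrying a newly added member appears unset to it. The sketch below models only that behaviour by hand, without any protobuf library; HashAlg, Outcome and the field numbers are hypothetical. Whether this makes the addition non-breaking for Parsec still depends on how the service reacts to the unset case, which is what remains to be checked.

```rust
/// Algorithms known to the older schema (hypothetical subset).
#[derive(Debug, PartialEq)]
pub enum HashAlg {
    Sha256,
    Sha512,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Accepted(HashAlg),
    NotSupported,
}

/// Models how a decoder built from the older schema sees the oneof:
/// a member it does not know leaves the oneof unset (None).
pub fn decode_oneof(field_number: u32) -> Option<HashAlg> {
    match field_number {
        1 => Some(HashAlg::Sha256),
        2 => Some(HashAlg::Sha512),
        _ => None,
    }
}

/// An older service stays compatible if it maps the unset case to a
/// clean "not supported" answer instead of failing to parse the request.
pub fn handle(field_number: u32) -> Outcome {
    match decode_oneof(field_number) {
        Some(alg) => Outcome::Accepted(alg),
        None => Outcome::NotSupported,
    }
}
```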

For testing we need to check that the same set of tests work for different versions of Parsec.

2: identity providers

Or authenticators in general.

Unix Peer Credentials: working on top of C APIs which should remain stable as part of the C library.
JWT SVID: based on the SPIFFE Workload API which should be stable (used in rust-spiffe). If breaking changes appear in the Workload API and not all SPIFFE implementation versions support them, then this should be documented, with something like "Parsec version x.y.z supports implementations of the Workload API from versions a.b.c to a'.b'.c'". It should be considered a breaking change in Parsec to require a new version of an authenticator.

For testing we can check that different versions of Parsec work with the same authenticators (or different versions of authenticators).

3: CLI invocation

The CLI options used to invoke Parsec should remain stable. Currently there is only one parameter -c, --config to indicate the path to the configuration file.

We should test that this option is always available for future stable versions of Parsec.
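
A regression test for this could be as small as checking that both spellings of the option keep being accepted. The sketch below is a stand-in argument parser written for illustration only: config_path_from_args is a hypothetical helper, not the parser Parsec uses, and it only handles the "-c PATH" and "--config PATH" forms.

```rust
use std::path::PathBuf;

/// Extracts the configuration path from command-line arguments
/// (program name already removed). Returns Ok(None) when the option
/// is absent, and an error for anything it does not recognise.
pub fn config_path_from_args<I, S>(args: I) -> Result<Option<PathBuf>, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    let mut path = None;
    while let Some(arg) = args.next() {
        match arg.as_ref() {
            "-c" | "--config" => {
                let value = args
                    .next()
                    .ok_or_else(|| format!("missing value for {}", arg.as_ref()))?;
                path = Some(PathBuf::from(value.as_ref()));
            }
            other => return Err(format!("unrecognised argument: {}", other)),
        }
    }
    Ok(path)
}
```

A real stability test would invoke each released binary with these arguments rather than call a helper, but the expectations would be the same: both forms accepted, and the path taken from the following argument.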

4: Configuration file (and default)

The same configuration files should work for newer stable versions of Parsec. The default options in the configuration files should remain the same. If a new field/option is added in the configuration file, this field must be optional for stable versions.

To test, a set of different configuration files should be tested across different Parsec versions to check that Parsec still spins up correctly.
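
The "new fields must be optional" rule can be sketched as a loader that falls back to an in-code default when an entry is absent. To stay self-contained, the example below reads simple "key = value" lines rather than real TOML, and the entry names log_level and request_limit, the Settings struct and the default value are all hypothetical.

```rust
use std::collections::HashMap;

/// In-code default for the entry added after the baseline release.
pub const DEFAULT_REQUEST_LIMIT: usize = 1024;

#[derive(Debug, PartialEq)]
pub struct Settings {
    /// Present since the baseline release, so it stays required.
    pub log_level: String,
    /// Added later, so it must be optional in the file.
    pub request_limit: usize,
}

/// Loads settings from "key = value" lines; '#' starts a comment line.
pub fn load(text: &str) -> Result<Settings, String> {
    let mut entries = HashMap::new();
    let lines = text
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));
    for line in lines {
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("malformed line: {}", line))?;
        entries.insert(key.trim(), value.trim().trim_matches('"'));
    }
    let log_level = entries
        .get("log_level")
        .ok_or("missing required entry: log_level")?
        .to_string();
    // A file written for the baseline release has no request_limit entry
    // and must still load, so the default is applied here.
    let request_limit = match entries.get("request_limit") {
        Some(v) => v
            .parse()
            .map_err(|e| format!("invalid request_limit: {}", e))?,
        None => DEFAULT_REQUEST_LIMIT,
    };
    Ok(Settings { log_level, request_limit })
}
```

A configuration written for the baseline release and one that sets the new entry both load, which is the property a stability test would assert for every stable version.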

5: Key mappings

Mappings between KeyTriple and KeyInfo are persistently stored somewhere. New stable Parsec versions should successfully load old mappings and store current ones (of the same KeyInfoManager).

For the OnDiskKeyInfoManager:

1. the base64 encoding of the application name is used as a directory name
2. the provider ID is used as a directory name
3. the base64 encoding of the key name is used as a file name
4. the key ID type is serialised/deserialised by Serde
5. the Attributes field comes directly from the psa-crypto crate, (de)serialised by Serde

Stability:

1. The application names should remain representable as UTF-8 strings. ListClients and DeleteClient are based on that. Stable.
2. Might become a problem if provider IDs switch from static numbers (1: MbedCrypto, 2: PKCS 11, 3: TPM, etc) to dynamically allocated IDs (allowing external providers to be dynamically loaded, without prior Parsec knowledge). Provider UUIDs could be a stable provider reference (but could the same provider be loaded twice? For example two PKCS 11 providers pointing at different PKCS 11 libraries).
3. Stable as key names are UTF-8 strings.
4. If the key ID type is not stable across Parsec versions, then it won't be possible to load old mappings.
5. If the Attributes structure is modified, then deserialisation will fail for old mappings.

For points 4 and 5: if Serde changes the way it (de)serialises data, then old mappings can no longer be read either.
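
The directory layout described in the first three points can be pinned down with a test that recomputes the expected path for a known key. The sketch below is an assumption-laden illustration: mapping_path and base64_url are hypothetical helpers, the root directory is arbitrary, and the URL-safe base64 alphabet with padding is an assumption made here so that encoded names never contain a path separator.

```rust
use std::path::{Path, PathBuf};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// URL-safe base64 with '=' padding (assumed variant, see above).
fn base64_url(input: &[u8]) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        let b = [
            chunk[0],
            *chunk.get(1).unwrap_or(&0),
            *chunk.get(2).unwrap_or(&0),
        ];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        for i in 0..4 {
            // A chunk of k bytes yields k + 1 characters, then padding.
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Builds <root>/<base64(app name)>/<provider ID>/<base64(key name)>.
pub fn mapping_path(root: &Path, app_name: &str, provider_id: u8, key_name: &str) -> PathBuf {
    root.join(base64_url(app_name.as_bytes()))
        .join(provider_id.to_string())
        .join(base64_url(key_name.as_bytes()))
}
```

If a future version computed a different path for the same application name, provider ID and key name, old mappings would silently stop being found, so this is the kind of value worth freezing in a stability test.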

parallaxsecond/parsec#271: if multiple authenticators are to be supported simultaneously, that will also probably impact the mappings, especially the OnDiskKeyInfoManager. The same application name should yield different keys under different authenticators.

Actions to take for those points and testing will be handled in another issue.

6: OS signals/systemd communication

Parsec currently responds to SIGTERM and SIGKILL. Since SIGKILL cannot be caught by a process, only the handling of SIGTERM is under Parsec's control. Those signals should keep the same effect on future versions of Parsec. Parsec notifies systemd that it is ready/reloading/stopping using sd_notify. This behaviour should remain the same for future stable Parsec versions.
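
For the systemd side, the notification protocol is small enough to exercise in a test: the service sends a datagram such as "READY=1", "RELOADING=1" or "STOPPING=1" to the Unix socket named by the NOTIFY_SOCKET environment variable. The sketch below does this with the standard library only. It is not how Parsec is implemented; notify and notify_from_env are hypothetical helpers, and abstract-namespace sockets (names starting with '@') are not handled.

```rust
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Sends one state line (for example "READY=1") to a notification socket.
pub fn notify(socket_path: &Path, state: &str) -> io::Result<()> {
    let socket = UnixDatagram::unbound()?;
    let sent = socket.send_to(state.as_bytes(), socket_path)?;
    if sent == state.len() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::WriteZero, "short datagram write"))
    }
}

/// Sends `state` to the socket named by NOTIFY_SOCKET, if any.
/// Returns Ok(false) when the variable is not set, i.e. when the
/// process was not started by a service manager that expects it.
pub fn notify_from_env(state: &str) -> io::Result<bool> {
    match std::env::var_os("NOTIFY_SOCKET") {
        Some(path) => notify(Path::new(&path), state).map(|_| true),
        None => Ok(false),
    }
}
```

A stability test could bind its own datagram socket, start each Parsec version with NOTIFY_SOCKET pointing at it, and assert that the same sequence of state messages arrives.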

For testing, those signals and the systemd integration should be exercised for different stable versions of Parsec.

7: Dynamic libraries dependencies

A new stable version of Parsec should not introduce a requirement on a new library dependency. The dynamic library dependencies should be listed (with their working versions). The only ones I can think of now are the TSS libraries.

Dynamically loaded libraries like PKCS 11 implementations should also be looked at.

For testing, different versions of Parsec should be checked with the same set of TSS (or others) libraries.

Actions to take

Agree on exactly what needs to be stable: are the 7 points above enough? Should they be more precise/fine-grained?
Add a new page in the book with the list (for starters).
For each of those points, ensure their stability if not already done. Mainly looking at "5: key mappings" for now. We need to create specific issues to address those points.
Add stability tests for each of those points. Some points might be tested together.
Document in the previous book page how the stability of each of those points is ensured and how it is tested.
Document in the book a process to follow to make sure that a new dependency on the Parsec environment is added to the list when a PR is added, and is tested.
Document what happens for breaking changes. The whole page might only be relevant for one single major version of Parsec.

parallaxsecond / parsec-book

Create a chapter that defines the rules for stable releases of components #83