Goals

Background

After Security Release Process Step 1, our security release process will be the following:

1. We make a patch
2. We sparsely test it
3. We send it to the trusted validator group
4. Once enough of these validators have deployed it, we publish the patch

This proposal focuses on the “distributing the patch” part (steps 3-4) of this process. The “testing” part (step 2) is orthogonal to this proposal: whatever we choose to do, testing has to happen while the patch is still secret.

Why should NEAR One work on this

When we need to make a protocol upgrade for a security fix, validators who are not in the trusted group will be kicked out for one epoch, because they cannot patch their binary ahead of time.

Also, if the security vulnerability can be used to harm validators (e.g. DoS), validators remain exposed to the attack, especially between the time we publish the patch and the time they apply it.

What needs to be accomplished

We can publish binaries in a way that lets other validators confirm that we did not maliciously introduce vulnerabilities (and were not hacked), and we commit to those binaries. Non-trusted validators can then use these binaries to avoid being kicked out in the epoch following the security protocol upgrade, or to mitigate the attack while they do not yet know the patch.

To do that, we should publish reproducible builds.

The steps to do this would be:

1. Make sure our builds are statically linked and bit-for-bit reproducible. Nix could probably help here, though we would probably also want to publish our GCP machine type and Linux distribution / kernel version, just in case.
2. On every release, build once in CI and once somewhere else (e.g. the release engineer’s laptop, or another CI system; having at least one non-automated place would be better), and verify that the built binaries are identical.
3. When we publish a new security release, instead of the TLP:RED phase, we:
   - Publish a binary for non-trusted validators.
   - Send the patch to trusted validators, as in Security Release Process Step 1.
   - Ask trusted validators to redo our build with the patch that they know of, and confirm to non-trusted validators that we did not lie about the build.
   - Trusted validators run with the patch from Security Release Process Step 1; non-trusted validators can choose to use the binary that we provide to them.
   - After the protocol version has been upgraded, we perform the rest of the Security Release Process Step 1 process, publishing the patch to everyone as TLP:GREEN, then TLP:CLEAR.
   - As soon as they get the patch, people can check that our commitment was correct, i.e. that with the patch applied, they can actually build the binary that we published.
4. For the commitment step, we might want a private key used to sign builds. However, since people currently trust that the source code on GitHub comes from us, we are probably fine with a git commit that adds the binary hash to the nearcore README. That would be much easier for us to handle than a long-lived private key, but we do need to check with validators that they would be OK with it.
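
The build comparison and the commitment check described in the steps above could be sketched as follows. This is only an illustration under stated assumptions, not existing NEAR One tooling: it assumes SHA-256 is the hash used for the commitment, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(*paths: Path) -> bool:
    """Check that independently built binaries all have the same digest.

    Equal SHA-256 digests are treated as evidence that the builds are
    bit-for-bit identical (the assumption behind a reproducible build check).
    """
    if len(paths) < 2:
        raise ValueError("need at least two builds to compare")
    return len({sha256_file(p) for p in paths}) == 1


def matches_commitment(binary: Path, committed_hash: str) -> bool:
    """Check a binary against a published hash (hypothetical commitment format:
    a hex SHA-256 string, e.g. as it might appear in a README commit)."""
    return sha256_file(binary) == committed_hash.strip().lower()
```

In this sketch, the release engineer would call `builds_match` on the CI build and the second build before publishing. A trusted validator who rebuilt with the patch, or anyone rebuilding once the patch is public, would call `matches_commitment` on their own build with the published hash.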

Main use case

With this change, we could publish binary-only security releases without validators having to trust us. We may then also be able to shrink the trusted validator group again, so that as few people as possible are aware of a security vulnerability before the network is patched.

Links to external documentation and discussions

Discussion has already started on the full design document.

Estimated effort

The estimated effort is one person-quarter.

Assumptions

None

Pre-requisites

Security Release Process Step 1, which is much easier to do, should be performed first, in order to list the validators that will receive the patch and not only the binary.

Out of scope