Proposal: Make the TUF server more useful for production deployment

Description

Hi folks, as a follow-up to the discussion in https://github.com/sigstore/scaffolding/pull/1159, I'd like to propose some improvements to the TUF server. I'm curious what you think; I'd love to work on this myself to push things forward.

Desired state

I would like to make the TUF server usable in production environments. My aim is to support both k8s and non-k8s environments (as noted in https://github.com/sigstore/scaffolding/pull/1159). To achieve that, there are several things that I think should happen when the TUF server starts:

Check if there is a complete TUF repository to serve. If so, it would just serve that.
If there is no TUF repository to serve, it would do what it does now - scaffold the TUF repository and start serving it.
- But it would also store all the generated private keys, so that the user can retrieve them and use them to continue operating the TUF repository outside of the TUF server.

This would mean we:

Have a TUF server that has a quick no-touch start (as it does right now).
Don't reset the TUF repository on a pod relocation.
Have a way for the user to grab the generated TUF repository and continue operating it (add more keys, etc).
Have a way for the user to pre-create the TUF repository and the TUF server would just serve it without creating anything itself.
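
To make the startup behaviour described above concrete, here is a minimal sketch in Python (the real server is not written in Python; this only illustrates the decision logic). Everything in it is an assumption for illustration: the `prepare_repository` and `has_complete_repository` names, the choice of four unversioned metadata files as the definition of a "complete" repository, and the `fake_generate` stand-in, which writes placeholder files rather than real TUF metadata or keys.

```python
import tempfile
from pathlib import Path

# Assumption: a repository counts as "complete" when these four top-level
# role files exist. A real check would likely need to handle versioned
# file names and validate the metadata itself.
REQUIRED_METADATA = ("root.json", "targets.json", "snapshot.json", "timestamp.json")


def has_complete_repository(repo_dir: Path) -> bool:
    """Return True only if every required metadata file is present."""
    return all((repo_dir / name).is_file() for name in REQUIRED_METADATA)


def prepare_repository(repo_dir: Path, keys_dir: Path, generate) -> str:
    """Reuse an existing repository, or scaffold one and keep its private keys."""
    if has_complete_repository(repo_dir):
        # Pre-created (or previously generated) repository: serve it untouched.
        return "existing"
    repo_dir.mkdir(parents=True, exist_ok=True)
    keys_dir.mkdir(parents=True, exist_ok=True)
    # The generator writes metadata to repo_dir and private keys to keys_dir,
    # so the user can later take over operating the repository.
    generate(repo_dir, keys_dir)
    return "generated"


def fake_generate(repo_dir: Path, keys_dir: Path) -> None:
    """Hypothetical stand-in for real scaffolding: writes placeholders only."""
    for name in REQUIRED_METADATA:
        (repo_dir / name).write_text("{}")
    (keys_dir / "root.key").write_text("placeholder, not a real key")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo, keys = Path(tmp) / "repository", Path(tmp) / "keys"
        print(prepare_repository(repo, keys, fake_generate))  # first start
        print(prepare_repository(repo, keys, fake_generate))  # restart
```

Calling `prepare_repository` twice against the same directories models a pod relocation: the second call finds the repository from the first and leaves it alone instead of regenerating it.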

Implementation details

The implementation would be very similar for both k8s and non-k8s environments. Roughly speaking:

Currently, the TUF server generates a TUF repository and stores it in a secret. We could have that secret mounted inside the TUF pod (if it exists) and serve the TUF root from there.
- Non-k8s case: assume the repository is in a directory on the filesystem and serve from there (again, if it exists).
If there is no TUF repository to serve:
- Generate a new one and store it in the secret just like we do now.
- Non-k8s case: store it in a directory on the filesystem.
- Also store the private keys in a different secret for the user to grab them.
- Non-k8s case: store them in a different directory on the filesystem.
Additionally, we could have a switch in the server executable which would specify whether the server should generate only the "new-style" trust root (trusted_root.json) or also include the current "old-style" TUF targets.
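
As a rough illustration of the switch mentioned above, the following Python sketch shows how a single boolean option could decide which targets get generated. The flag name `--trusted-root-only`, the function names, and the legacy target names in the usage example are all hypothetical; only `trusted_root.json` comes from the proposal itself.

```python
import argparse

# The "new-style" trust root named in the proposal.
NEW_STYLE_TARGET = "trusted_root.json"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one hypothetical switch for the target style."""
    parser = argparse.ArgumentParser(prog="tuf-server-sketch")
    parser.add_argument(
        "--trusted-root-only",
        action="store_true",
        help="generate only trusted_root.json and skip the old-style targets",
    )
    return parser


def select_targets(trusted_root_only: bool, legacy_targets: list[str]) -> list[str]:
    """Return the target names to generate for the chosen mode."""
    targets = [NEW_STYLE_TARGET]
    if not trusted_root_only:
        # Default mode: keep publishing the old-style targets as well.
        targets.extend(legacy_targets)
    return targets


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Placeholder names, not the real old-style target file names.
    legacy = ["legacy-target-a.pem", "legacy-target-b.pub"]
    print(select_targets(args.trusted_root_only, legacy))
```

Defaulting to both styles keeps existing clients working, while the switch lets deployments that only have new-style clients drop the old targets.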

CC @haydentherapper @jku

I don't see anything unreasonable here. I will note that this sentence:

Have a way for the user to grab the generated TUF repository and continue operating it (add more keys, etc).

...may involve much more work than expected (or, alternatively, may provide a bad user experience).

For background on this, I'll list the high level options I see since sometimes they seem to be forgotten and only details are discussed (not saying this is the case here at all, just that this tends to happen):

1. We can improve the "off-band trust root" use case (i.e. no TUF in play at all):
- if clients have an easy way for e.g. system admins to insert trust roots onto devices with device management, then their problem is solved
- even with non-managed devices, if users are fine handling trust root installation manually (every once in a while), then they can use the same mechanism
2. We can easily make a set of files that looks like a TUF repository (and operates like one from the client's perspective) but never expires and is really a "write-once system":
- the disadvantage is that this only looks like a real TUF repository: any trust root change would require the user to re-initialize the client with a completely new TUF root
- the advantage is that clients will function 100% as they would with a real repository
3. Roughly the suggestion here: make a set of files that operates like a TUF repository with very long expiry times:
- the difference from option 2 is that there is an "upgrade mechanism" to maintain that repository properly. This would likely involve instructions to use a generic TUF metadata editing tool (like awslabs/tough tuftool)
- the expiry times are expected to be very long even when maintained
4. Make the "official" tuf-on-ci experience smoother so it's a viable option for more people (I say official, but it is only used in root-signing-staging so far)

The reason I have not been advocating for solution 3 is that I have doubts about its reliability in practice:

we should expect that people maintaining these are not experts on TUF, and they'll likely be using the tools after not even thinking about TUF for 6 months or a year: even expecting them to find the private key file, and to have stored it securely, is a fairly high bar
running a complicated CLI tool to edit TUF metadata is tricky: it's easy to make mistakes and these tools tend to not protect the user from their mistakes

I'm sure it's possible to use a tool like that safely (I assume AWS uses tuftool to maintain the Bottlerocket TUF repo), but it may require more discipline and knowledge than we can assume.

I have one question to make the line between options 3 and 4 clearer, if you don't mind: I think the defining features of tuf-on-ci are

runs on GitHub
requires KMS for online signing
requires HW keys for human signers

Could you describe your use case more in these terms? Are all of the choices above issues for you or just some of them?

Thanks a lot for providing the context!

In terms of the options you outlined, I think you're absolutely right that most people aren't TUF experts, and we should definitely aim to create an initial TUF repository that expires far in the future. However, I want to keep the option open for people who do want to get familiar with TUF and manage the TUF repo themselves. I totally agree that many won't, but some may. So I want to go for something like option 3, but with a huge disclaimer that people have to understand what they're doing, properly back up their TUF repository before making any changes, etc.

In terms of tooling, you're absolutely right that tools like tough/tuftool allow you to break the TUF repository; for example, tuftool will let you overwrite an old role file, etc. I plan to work with the tuftool maintainers to add guardrails and provide an experience that won't let users break their TUF repository (without using some sort of --force flag). I'm even thinking of doing a downstream redistribution of tuftool with Sigstore-specific built-in features, e.g. "add a new fulcio cert" with one command that does all the magic in the background.

To describe my use case in more detail: I'm an engineer working at Red Hat on a team called "Trusted Artifact Signer", which is, at its base, a downstream redistribution of Sigstore. We have only recently GAed and we don't really know how our customers use our product, but we have to provide something our users can manage the TUF root with. We're going for something very generic so that they can tie in any use case they might have. I really like tuf-on-ci, but it's currently limited in that it requires a specific kind of workflow that some organizations might not want to adopt. That is not to say it can't be extended to other use cases, but right now it seems to me that a tool like tough could provide more versatility to tie into any kind of workflow.

I think in the future, we might recommend tuf-on-ci as well as tuftool, if some of our customers are interested in this kind of workflow, but to be completely honest right now we just don't know.

Long term, I think the option 1 you proposed is very interesting for us, because regardless of tools, TUF is hard and might be overkill for some organizations. Once I have the feature proposed here implemented, I will definitely reach out and discuss other options. But again, I would like to gather our users' input and listen to their use cases before trying to lay out a proposal here.

Anyway, thanks a lot for your feedback, I will now start working on the proposed feature, time permitting :)

I agree that option 1 is what I'd recommend for those who don't need to deal with TUF environments. You'll see examples of providing a trust root in https://github.com/sigstore/sigstore-python?tab=readme-ov-file#configuring-a-custom-root-of-trust-byo-pki.

I do think a TUF environment is valuable for providing a mechanism to manage key material and handle rotation/revocation. It might also be worth exploring https://github.com/repository-service-tuf/repository-service-tuf as another alternative for customers: for those on CI, have them look into tuf-on-ci; for those who want a managed service, RSTUF.

All the PRs for implementing this have been merged. I'm closing this issue - thanks for all the reviews!

Proposal: Make the TUF server more useful for production deployment #1182