oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Define format for TUF master repo containing all system versions #2838

Open david-crespo opened 1 year ago

david-crespo commented 1 year ago

Edit: per Rain's comment below, there will be a second TUF repo format containing multiple system versions and the mapping from system versions to artifacts — this will not require modification of the "rack update" format used by Wicket.


At some point we need to be able to represent multiple system versions in the TUF repo, with a many-to-many association between system versions and artifacts. In other words, a given artifact (as identified uniquely by the tuple (name, version, kind)) would be associated with one or more system versions. A system version would be associated with exactly N artifacts, where N is the number of known artifact kinds (as that number can change over time, I guess we have to say "the number of known artifact kinds at the time of that system version's creation").
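To make the association concrete, here is a rough sketch of one way it could be modeled in memory. Every name here (`ArtifactId`, `SystemVersionIndex`, `associate`, `missing_kinds`, `system_versions_for`) is hypothetical and not taken from omicron or tufaceous, and versions are plain strings only to keep the sketch dependency-free.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// An artifact, identified uniquely by the tuple (name, version, kind).
/// Hypothetical type: versions and kinds are plain strings in this sketch.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ArtifactId {
    pub name: String,
    pub version: String,
    pub kind: String,
}

/// Hypothetical index: system version -> (artifact kind -> artifact).
/// Keying the inner map by kind allows at most one artifact per kind
/// for any given system version.
#[derive(Default)]
pub struct SystemVersionIndex {
    by_system_version: BTreeMap<String, BTreeMap<String, ArtifactId>>,
}

impl SystemVersionIndex {
    /// Associate an artifact with a system version. Returns the artifact
    /// previously registered for the same kind, if there was one.
    pub fn associate(
        &mut self,
        system_version: &str,
        artifact: ArtifactId,
    ) -> Option<ArtifactId> {
        self.by_system_version
            .entry(system_version.to_string())
            .or_default()
            .insert(artifact.kind.clone(), artifact)
    }

    /// Kinds in `known_kinds` for which the system version has no artifact.
    /// An empty result means the version has exactly one artifact per known kind.
    pub fn missing_kinds(
        &self,
        system_version: &str,
        known_kinds: &BTreeSet<String>,
    ) -> Vec<String> {
        let present = self.by_system_version.get(system_version);
        known_kinds
            .iter()
            .filter(|k| present.map_or(true, |m| !m.contains_key(*k)))
            .cloned()
            .collect()
    }

    /// All system versions that include the given artifact
    /// (the other direction of the many-to-many association).
    pub fn system_versions_for(&self, artifact: &ArtifactId) -> Vec<&str> {
        self.by_system_version
            .iter()
            .filter(|(_, m)| m.get(&artifact.kind) == Some(artifact))
            .map(|(v, _)| v.as_str())
            .collect()
    }
}
```

Passing `known_kinds` in explicitly is one way to handle the set of kinds changing over time: each system version can be checked against the kinds known when it was created.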

I do not know how this should be done. Should we have an artifacts.json for each system version? It seems like we can kind of do whatever we want, but I imagine @iliana and @sunshowers have thoughts. I think this change could be done in a way that lets Wicket keep working the way it does for now (it would just need to add a check that there's one version in the repo, and otherwise fail early) while letting us move forward elsewhere.
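The fail-early check could look something like the sketch below. It assumes the caller has already collected the system versions found in the repo as strings. The function `require_single_system_version` and the error type `SingleVersionError` are hypothetical names, not existing Wicket APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical error for a repo that does not contain exactly one system version.
#[derive(Debug, PartialEq, Eq)]
pub enum SingleVersionError {
    /// The repo listed no system version at all.
    NoSystemVersion,
    /// The repo listed more than one distinct system version (sorted).
    MultipleSystemVersions(Vec<String>),
}

/// Return the repo's only system version, or fail early if there is not
/// exactly one. Duplicate entries for the same version count once.
pub fn require_single_system_version<'a, I>(
    versions: I,
) -> Result<String, SingleVersionError>
where
    I: IntoIterator<Item = &'a str>,
{
    // Deduplicate; BTreeSet also gives a stable order for the error message.
    let distinct: BTreeSet<&str> = versions.into_iter().collect();
    let mut iter = distinct.into_iter();
    match (iter.next(), iter.next()) {
        (None, _) => Err(SingleVersionError::NoSystemVersion),
        (Some(only), None) => Ok(only.to_string()),
        (Some(first), Some(second)) => {
            let mut all = vec![first.to_string(), second.to_string()];
            all.extend(iter.map(str::to_string));
            Err(SingleVersionError::MultipleSystemVersions(all))
        }
    }
}
```

Returning the full list of versions in the error lets the caller tell the user why the repo was rejected.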

Background

Wicket assumes there is a single system version in the TUF repo and simply applies all known artifacts. This simplifying assumption lets that code apply every artifact that is present without requiring the Wicket user to specify a system version when applying updates, relying instead on the fact that the repo contains only one system version. The user also cannot specify a wrong version, though of course they could use the wrong TUF repo, which amounts to the same thing.

In our current TUF repo format, artifacts.json has a single system_version field at top level that is taken to apply to all artifacts in the repo.

https://github.com/oxidecomputer/omicron/blob/e4a5dd0029946c0f7f56f4858475c2935e0eb9a9/common/src/update.rs#L16-L22

This is also built into the Nexus code that scans the repo — a single artifacts.json is read in, with a single system version (though that system version is currently ignored — I'm working on this).

https://github.com/oxidecomputer/omicron/blob/e4a5dd0029946c0f7f56f4858475c2935e0eb9a9/nexus/src/updates.rs#L30-L34

As one would expect, this assumption is built into the repo builder tufaceous as well — tufaceous init takes a single system version as a CLI argument and tufaceous assemble takes a TOML manifest that specifies a single system version.

https://github.com/oxidecomputer/omicron/blob/e4a5dd0029946c0f7f56f4858475c2935e0eb9a9/tufaceous/src/dispatch.rs#L179-L183

https://github.com/oxidecomputer/omicron/blob/e4a5dd0029946c0f7f56f4858475c2935e0eb9a9/tufaceous/manifests/fake.toml#L5

sunshowers commented 1 year ago

So there are two sorts of TUF repos:

  1. A "rack update" repo, which represents a single system version and all the components within it.
  2. The master repo, which contains all known versions of all components plus the mapping from system version to components.

Last time iliana and I talked about this we decided that artifacts.json would only represent a rack update repo. We need to come up with a different format for the master repo.

david-crespo commented 1 year ago

Great, will change title accordingly. Right now the Nexus code looks for the same kind of repo that wicket does.

davepacheco commented 9 months ago

Are we still planning to do this? What's the ultimate goal? Not disagreeing, just trying to figure out how it slots into the Automated Update project -- if at all.

iliana commented 9 months ago

This is necessary to serve a single TUF repo over HTTPS that contains many versions. Depending on what Automated Update is defined as, that may be necessary; if Automated Update refers to an operator being able to ask the rack to perform a specific update and have no downtime, this is not required, but if it refers to a rack being able to automatically pull new updates from the internet and install them, it is.

Regardless of the definition I believe we still plan to do this.

davepacheco commented 9 months ago

Thanks -- that's very helpful!