Migrate unikernel builds to cluster

talex5 commented 3 years ago

I've moved all the Docker services from the old deployer on toxis to the new one on ci3.

The only thing the old service is still doing is deploying the unikernels. This code is now on the old-master branch.

The old service builds the unikernels locally with Docker and then rsyncs to the host. However, the new deployer host isn't suitable for building things. Instead, it should use the cluster. We need to decide how to get the resulting binaries from the workers to the host.

Some options:

Push to Docker hub and have the unikernel host pull them (requires Docker on the host though).
Have the worker transfer directly to the host (will probably want some kind of short-lived session key for the transfer).
Transfer from the worker to the deployer and rsync from there as before. Avoids having to make changes on the unikernel host.

@hannesm do you have a preference here? I think you mentioned that you were planning some kind of unikernel registery which we could push to...

hannesm commented 3 years ago

Indeed there is builder using orb to conduct reproducible builds (output is besides the unikernel image also the exact opam packages, and host system information and packages). All this is uploaded to our web service which stores the builds and metadata in a database (thanks to @reynir), see https://builds.robur.coop for the web interface.

The web and database code is not yet in a public git repository (@reynir should we make that repository public / push it to github?).

Now, the deployment model of unikernels differs:

(a) mirage-www contains code and data, on every push to the main branch it gets rebuild and deployed (build-on-push, deploy-automatically-on-push, data-integrated)
(b) for DNS service (ns.mirageos.org), my blog (hannes.robur.coop), robur website (robur.coop), ... we do daily builds (with builder) of the code (and retain information to reproduce the very same binary), while the data is stored in git repositories (build-daily, deploy-manually, data-external) -- i.e. the canopy mechanism (a push hook that leads to a pull of the git repository)

The reasoning for (a) is to get code (and data) into production ASAP without any manual intervention. The downside is that if a dependency of (a) is updated, deployment thereof will only be done by a commit to mirage-www (AFAIU). The downside of (b) is laggy deployments and builds: needs human intervention to deploy, and builds are triggered on a daily basis only.

While building and storing the resulting artifacts can be done at any pace (daily / on commit) -- depending on the trigger -- when and how the deployment is done is a crucial question -- for (a) some service needs access (private keys) to the host for deployment, for (b) some human needs to act.

Proposed workflow for a mirage-www deployment

Each git commit to the main branch (or deploy branch, ..) results in a build. If this build is successful, a reproducible image is uploaded to the unikernel repository. The mirage host at packet.net watches the unikernel repository and deploys if a new unikernel gets available.

Systems involved:

ocluster / ocurrent which trigger the build <- secret shared with unikernel repository to upload the artifacts
unikernel repository: lists the unikernel job, provides possibility to upload artifacts (authenticated), and provides all unikernels via HTTP <- secret shared with ocluster / ocurrent
mirage host at packet.net which observes the unikernel repository for a given unikernel job (mirage-www) and redeploys whenever a new artifact is available (polls every 5 minutes on /job/mirage-www/latest/artifact.hvt) <- doesn't need any secrets

Does that sound reasonable @talex5 @reynir @avsm?

This needs:

ocluster to upload unikernel image to unikernel repository (some way to pass a http url including secret to ocluster)
unikernel-repository an upload url that accepts a raw unikernel image and job name (at the moment, it only accepts ASN.1 DER encoding)
unikernel-repository user management (very basic "user X may only upload to job Y")
unikernel-repository needs a persistent url /job//latest/ (which redirects to the latest build)
shell script which downloads artifact, compares hash, and deploys if a different hash is found (to be deployed on mirage packet.net machine)

(Of course, we can have another ~~builder-web~~ unikernel-repository instance running on mirage packet.net infrastructure -- but tbh this is still under heavy development where having a single instance reduces the maintenance burden).

Version NG

ocluster gathers more output (console output, build environment, packages, unikernel image -- eventually using orb) and upload to unikernel-repository (which structured data format is preferable here?) --> a third upload url
ocurrent has a "main secret", and hands out time-limited tokens for the jobs to upload artifacts
albatross to ship a websocket watcher, unikernel-repository to provide a websocket for changes

Conclusion

Let's agree on a deployment workflow, and focus on the most basic version to improve the setup soon. We can work out a more enhanced version afterwards.

We can discuss this either here or in a video call (not this week, but the week after (May 25th -- 28th) I'm back from vacation).

talex5 commented 3 years ago

if a dependency of (a) is updated, deployment thereof will only be done by a commit to mirage-www (AFAIU)

The mirage-www build is controlled by the Dockerfile in the mirage-www repository, which fixes the opam-repository commit, so it will only update when that is changed manually. We could also monitor opam-repository and use the latest version (as ocaml-ci does). Once Mirage 4 is out (using dune), we can use ocaml-ci to check that the latest version builds, while still pinning a known-good version for deployment.

polls every 5 minutes

It seems a shame to replace our current system of deploying in a few seconds with feedback, with one that takes 5 minutes and gives no feedback. Perhaps the deployer could still ssh to the host and run the update script, which still pulling from the repository? That's how the original version worked (but using Docker Hub as the registry back then).

some way to pass a http url including secret to ocluster

There's a special field for passing secrets to build jobs, so this should be fine: https://github.com/ocurrent/ocluster/blob/d2e546cf0f10844df159a04b66895eae0e7f4858/api/schema.capnp#L69

ocurrent / ocurrent-deployer