ocurrent / ocurrent-deployer

A pipeline that deploys unikernels and other services

Migrate unikernel builds to cluster #58

Open talex5 opened 3 years ago

talex5 commented 3 years ago

I've moved all the Docker services from the old deployer on toxis to the new one on ci3.

The only thing the old service is still doing is deploying the unikernels. This code is now on the old-master branch.

The old service builds the unikernels locally with Docker and then rsyncs to the host. However, the new deployer host isn't suitable for building things. Instead, it should use the cluster. We need to decide how to get the resulting binaries from the workers to the host.

Some options:

  1. Push to Docker Hub and have the unikernel host pull them (requires Docker on the host, though).
  2. Have the worker transfer directly to the host (will probably want some kind of short-lived session key for the transfer).
  3. Transfer from the worker to the deployer and rsync from there as before; avoids having to make changes on the unikernel host (sketched below).
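
For illustration, option 3 might look roughly like this as an OCurrent pipeline stage. The three helper functions are hypothetical placeholders (nothing with these names exists in the deployer yet); only `Current.Syntax` is real:

```ocaml
(* A sketch of option 3: build on a cluster worker, pull the binary back
   to the deployer, then rsync to the unikernel host as before. The three
   helpers passed in are hypothetical; Current.Syntax is real ocurrent. *)
let deploy_unikernel ~build_on_cluster ~fetch_artifact ~rsync_to_host src =
  let open Current.Syntax in
  let* job = build_on_cluster src in   (* build on an ocluster worker *)
  let* binary = fetch_artifact job in  (* worker -> deployer transfer *)
  rsync_to_host binary                 (* deployer -> unikernel host, as now *)
```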

@hannesm do you have a preference here? I think you mentioned that you were planning some kind of unikernel registry which we could push to...

hannesm commented 3 years ago

Indeed, there is builder, which uses orb to conduct reproducible builds (the output is, besides the unikernel image, the exact opam packages used, plus host system information and packages). All of this is uploaded to our web service, which stores the builds and metadata in a database (thanks to @reynir); see https://builds.robur.coop for the web interface.

The web and database code is not yet in a public git repository (@reynir, should we make that repository public / push it to GitHub?).

Now, the deployment model of unikernels differs:

  (a) mirage-www is built and deployed automatically on each commit, without manual intervention;
  (b) the robur unikernels are built daily by builder and deployed manually.

The reasoning for (a) is to get code (and data) into production ASAP without any manual intervention. The downside is that if a dependency of (a) is updated, it will only be deployed once there is a commit to mirage-www (AFAIU). The downside of (b) is laggy deployments and builds: a human needs to act to deploy, and builds are triggered only on a daily basis.

While building and storing the resulting artifacts can be done at any pace (daily / on each commit, depending on the trigger), when and how the deployment is done is the crucial question: for (a), some service needs access (private keys) to the host for deployment; for (b), some human needs to act.

Proposed workflow for a mirage-www deployment

Each git commit to the main branch (or deploy branch, ...) results in a build. If this build is successful, a reproducible image is uploaded to the unikernel repository. The mirage host at packet.net watches the unikernel repository (polls every 5 minutes) and deploys if a new unikernel becomes available.
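
A minimal sketch of what the host-side watcher could be, with `fetch_latest` and `deploy` stubbed out; the URL, the polling mechanics, and the deploy mechanism are all placeholders, not decided yet:

```ocaml
(* Host-side watcher sketch: poll the unikernel repository and deploy
   when a new build shows up. Both stubs are placeholders: the real
   version would fetch over HTTPS and hand the image to the local
   deployment tooling. *)
let fetch_latest () =
  (* stub: e.g. GET https://builds.robur.coop/job/mirage-www/latest *)
  "sha256:0000000000000000"

let deploy hash =
  (* stub: download the image and (re)start the unikernel *)
  print_endline ("deploying " ^ hash)

let rec watch ~interval ~current =
  let latest = fetch_latest () in
  if latest <> current then deploy latest;
  Unix.sleep interval;  (* 300 seconds for the 5-minute poll *)
  watch ~interval ~current:latest
```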

Systems involved:

  * the CI, building the unikernel on each commit
  * the unikernel repository, storing the resulting images (builds.robur.coop)
  * the mirage host at packet.net, polling the repository and deploying

Does that sound reasonable @talex5 @reynir @avsm?

This needs:

  * some way to pass a http url including secret to ocluster

(Of course, we can have another builder-web unikernel-repository instance running on mirage packet.net infrastructure -- but to be honest, this is still under heavy development, and having a single instance reduces the maintenance burden.)

Version NG

Conclusion

Let's agree on a deployment workflow, and focus on the most basic version to improve the setup soon. We can work out a more enhanced version afterwards.

We can discuss this either here or in a video call (not this week, but the week after (May 25th -- 28th), when I'm back from vacation).

talex5 commented 3 years ago

if a dependency of (a) is updated, deployment thereof will only be done by a commit to mirage-www (AFAIU)

The mirage-www build is controlled by the Dockerfile in the mirage-www repository, which fixes the opam-repository commit, so it will only update when that is changed manually. We could also monitor opam-repository and use the latest version (as ocaml-ci does). Once Mirage 4 is out (using dune), we can use ocaml-ci to check that the latest version builds, while still pinning a known-good version for deployment.
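
For comparison, monitoring opam-repository with ocurrent (as ocaml-ci does) could look roughly like this; `Current_git.clone` and `Current_cache.Schedule.v` are existing ocurrent APIs, while the weekly refresh and the gref are arbitrary choices for the sketch:

```ocaml
(* Sketch: track opam-repository's head instead of a pinned commit.
   The weekly refresh interval is an arbitrary example. *)
let weekly = Current_cache.Schedule.v ~valid_for:(Duration.of_day 7) ()

let opam_repository =
  Current_git.clone ~schedule:weekly ~gref:"master"
    "https://github.com/ocaml/opam-repository.git"
```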

polls every 5 minutes

It seems a shame to replace our current system, which deploys in a few seconds and gives feedback, with one that takes 5 minutes and gives none. Perhaps the deployer could still ssh to the host and run the update script, while still pulling from the repository? That's how the original version worked (but using Docker Hub as the registry back then).
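
That push-style trigger could be a small pipeline step; `Current.Process.exec` is real ocurrent API, while the host and script path here are made up:

```ocaml
(* Sketch: once a new image is published, ssh to the unikernel host and
   run its update script, keeping the fast feedback loop. The host name
   and script path are placeholders. *)
let run_update ~job host =
  Current.Process.exec ~cancellable:true ~job
    ("", [| "ssh"; host; "/usr/local/bin/update-unikernels" |])
```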

some way to pass a http url including secret to ocluster

There's a special field for passing secrets to build jobs, so this should be fine: https://github.com/ocurrent/ocluster/blob/d2e546cf0f10844df159a04b66895eae0e7f4858/api/schema.capnp#L69
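
On the pipeline side, that might end up looking something like the sketch below. Note this assumes (without checking the current release) that `Current_ocluster.v` accepts a `~secrets` argument surfacing that schema field; the key name and bindings are invented:

```ocaml
(* Sketch only: assumes Current_ocluster.v exposes the secrets field via
   a ?secrets argument (verify against the current ocluster API).
   [connection] and [deploy_key] are assumed to be bound elsewhere. *)
let cluster =
  Current_ocluster.v ~secrets:[ "deploy-ssh-key", deploy_key ] connection
```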