Open davepacheco opened 1 year ago
Do we want to consider running CRDB within a Linux VM?
No I don't think so. From my perspective the original reasons still apply:
NB: wrt. the dependencies, Node and Go are already maintained by other people (e.g., pkgsrc and OmniOS both ship up to date versions, and we pull updates from OmniOS into Helios) it's "just" the actual Cockroach build that needs the bigger work.
I don't think so. I don't think much about the original calculus has changed.
There's a bunch of stuff that we get "for free" by running this as a Helios zone that we'd have to re-implement for guest VMs:
To be clear, we need all of this for whatever environment we deploy our software in. It's just that we're already doing all of this for Helios zones. It'd be quite a lot of work to build parallel infrastructure for Linux VMs.
What have been / will be the costs to running on Helios?
Basically, I think switching would be a ton of up-front work and a bunch of ongoing work that the whole team would keep having to deal with (i.e., everyone will have to deal with the VM and the infrastructure around it); whereas I think (but don't know!) that doing the Bazel work would be much less work up front, less ongoing, and only the people keeping it up would have to deal with it.
An illumos port of Bazel recently landed in OmniOS. I've opened a PR to pull this into Helios at https://github.com/oxidecomputer/helios-omnios-extra/pull/3 More details over there.
Moving to the 8 milestone aspirationally; it's likely if we want to eventually get to Cockroach v23.2 we'll want to go one release at a time, and it'd be best to do that sooner rather than later.
Fortunately it seems to build now, some notes at https://github.com/oxidecomputer/garbage-compactor/pull/7#issuecomment-2038998312. I've uploaded my build to /staff/iliana/cockroach-v22.2.19.illumos.tar.gz
.
However there are a significant number of test failures after introducing v22.2, caused by trying to set the first blueprint as the target blueprint. The blueprints InsertTargetQuery
CTE is returning the no-such-blueprint
sentinel for reasons I can't figure out quite yet. (cargo nextest run -p nexus-db-queries -- db::datastore::deployment::tests::test_representative_blueprint
is the most trivial way I've found to reproduce this.)
thread 'db::datastore::deployment::tests::test_representative_blueprint' panicked at nexus/db-queries/src/db/datastore/deployment.rs:1457:14:
called `Result::unwrap()` on an `Err` value: ObjectNotFound { type_name: Blueprint, lookup_type: ById(5234919e-1005-441a-9575-614305173dbe) }
Cockroach have just released v22.2.0 with Go 1.19.1. It'd be good to update to this as soon as convenient.