oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
252 stars 40 forks source link

`ls-apis` broken due to the new clickhouse admin APIs #7121

Closed davepacheco closed 1 day ago

davepacheco commented 1 day ago

I went to generate the API DAG today and found that the tool is busted:

$ cargo xtask ls-apis servers --output-format=dot
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.50s
     Running `target/debug/xtask ls-apis servers --output-format=dot`
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.85s
     Running `target/debug/ls-apis servers --output-format=dot`
loading metadata for workspace omicron from current workspace
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
thread 'main' panicked at dev-tools/ls-apis/src/system_apis.rs:362:69:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Yikes. I changed this panic to produce an error message instead:

$ cargo xtask ls-apis servers --output-format=dot
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
     Running `target/debug/xtask ls-apis servers --output-format=dot`
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.88s
     Running `target/debug/ls-apis servers --output-format=dot`
loading metadata for workspace omicron from current workspace
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
Error: missing producer for API with client package "clickhouse-admin-keeper-client"

It took me a while to figure out what was going on. I found:

What's fishy is that cargo xtask ls-apis servers doesn't show all three of the APIs exposed by the "omicron-clickhouse-admin" package. It only shows one:

$ cargo xtask ls-apis servers
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.50s
     Running `target/debug/xtask ls-apis servers`
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.85s
     Running `target/debug/ls-apis servers`
loading metadata for workspace omicron from current workspace
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
omicron-clickhouse-admin (omicron/clickhouse-admin)
    exposes: Clickhouse Single-Node Cluster Admin (client = clickhouse-admin-single-client)

...

The API metadata includes:

The problem here is that the code has an implicit assumption that a given Rust package will only directly export a single Dropshot API. (It's fine for a component package to expose multiple Dropshot servers. Sled Agent falls into this bucket and it works fine. The difference is that omicron-sled-agent depends on three separate packages for bootstrap agent API, repo depot API, and the sled agent API. Each of these only exposes one Dropshot server. With Clickhouse, a single package exposes all three Dropshot servers.)

It looks pretty straightforward to relax this assumption and I'll put up a PR very shortly that fixes it and also makes sure that this gets tested in the test suite so that similar bugs will be caught by CI.