I went to generate the API DAG today and found that the tool is busted:
$ cargo xtask ls-apis servers --output-format=dot
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.50s
Running `target/debug/xtask ls-apis servers --output-format=dot`
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.85s
Running `target/debug/ls-apis servers --output-format=dot`
loading metadata for workspace omicron from current workspace
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
thread 'main' panicked at dev-tools/ls-apis/src/system_apis.rs:362:69:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Yikes. I changed this panic to produce an error message instead:
$ cargo xtask ls-apis servers --output-format=dot
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
Running `target/debug/xtask ls-apis servers --output-format=dot`
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.88s
Running `target/debug/ls-apis servers --output-format=dot`
loading metadata for workspace omicron from current workspace
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
Error: missing producer for API with client package "clickhouse-admin-keeper-client"
It took me a while to figure out what was going on. I found:
This client package (clickhouse-admin-keeper-client") should be produced by deployment unit "Clickhouse" (which represents any of these: single-node Clickhouse zone or either the multi-node Clickhouse Server or Clickhouse Keeper zones).
That deployment unit directly contains package "omicron-clickhouse-admin".
"omicron-clickhouse-admin" depends directly on "clickhouse-admin-api", which is where the Dropshot API definition lives for several Clickhouse admin APIs.
What's fishy is that cargo xtask ls-apis servers doesn't show all three of the APIs exposed by the "omicron-clickhouse-admin" package. It only shows one:
$ cargo xtask ls-apis servers
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.50s
Running `target/debug/xtask ls-apis servers`
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.85s
Running `target/debug/ls-apis servers`
loading metadata for workspace omicron from current workspace
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-f3b5bdecdc6486d6/b7b9d56/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-de41bdd6c14939ab/056283e/Cargo.toml
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-12517f89d3d9f483/8f8fbb7/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-627b239a911c6241/4cdc7d7/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
omicron-clickhouse-admin (omicron/clickhouse-admin)
exposes: Clickhouse Single-Node Cluster Admin (client = clickhouse-admin-single-client)
...
The API metadata includes:
client "clickhouse-admin-keeper-client" -> server "clickhouse-admin-api"
client "clickhouse-admin-server-client" -> server "clickhouse-admin-api"
client "clickhouse-admin-single-client" -> server "clickhouse-admin-api"
The problem here is that the code has an implicit assumption that a given Rust package will only directly export a single Dropshot API. (It's fine for a component package to expose multiple Dropshot servers. Sled Agent falls into this bucket and it works fine. The difference is that omicron-sled-agent depends on three separate packages for bootstrap agent API, repo depot API, and the sled agent API. Each of these only exposes one Dropshot server. With Clickhouse, a single package exposes all three Dropshot servers.)
It looks pretty straightforward to relax this assumption and I'll put up a PR very shortly that fixes it and also makes sure that this gets tested in the test suite so that similar bugs will be caught by CI.
I went to generate the API DAG today and found that the tool is busted:
Yikes. I changed this panic to produce an error message instead:
It took me a while to figure out what was going on. I found:
What's fishy is that
cargo xtask ls-apis servers
doesn't show all three of the APIs exposed by the "omicron-clickhouse-admin" package. It only shows one:The API metadata includes:
The problem here is that the code has an implicit assumption that a given Rust package will only directly export a single Dropshot API. (It's fine for a component package to expose multiple Dropshot servers. Sled Agent falls into this bucket and it works fine. The difference is that
omicron-sled-agent
depends on three separate packages for bootstrap agent API, repo depot API, and the sled agent API. Each of these only exposes one Dropshot server. With Clickhouse, a single package exposes all three Dropshot servers.)It looks pretty straightforward to relax this assumption and I'll put up a PR very shortly that fixes it and also makes sure that this gets tested in the test suite so that similar bugs will be caught by CI.