Closed davepacheco closed 11 months ago
One problem is that I hadn't run cargo hakari manage-deps
after adding a new crate as part of Nexus:
dap@ivanova omicron-inventory $ cargo hakari manage-deps
info: operations to perform:
* add or update dependency omicron-workspace-hack v0.1.0 (at path workspace-hack) to packages:
- nexus-inventory (at path nexus/inventory)
✔ proceed? · yes
Updating crates.io index
dap@ivanova omicron-inventory $ git diff
diff --git a/Cargo.lock b/Cargo.lock
index fba96d19e..eb56e84cf 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4466,6 +4466,7 @@ dependencies = [
"gateway-messages",
"gateway-test-utils",
"nexus-types",
+ "omicron-workspace-hack",
"slog",
"strum",
"tokio",
diff --git a/nexus/inventory/Cargo.toml b/nexus/inventory/Cargo.toml
index 7fc360259..3ed7e8b2d 100644
--- a/nexus/inventory/Cargo.toml
+++ b/nexus/inventory/Cargo.toml
@@ -14,6 +14,7 @@ nexus-types.workspace = true
slog.workspace = true
strum.workspace = true
uuid.workspace = true
+omicron-workspace-hack.workspace = true
[dev-dependencies]
expectorate.workspace = true
I don't know if that's enough to cause this.
I don't know if that's enough to cause this.
Doubt that to be honest -- I don't think it's enough to cause this.
I noticed a similar thing today, just in one cargo nextest run
:
$ EXPECTORATE=overwrite cargo nextest run --no-fail-fast -E 'not (package(/nexus/))'
info: experimental features enabled: setup-scripts
Compiling omicron-nexus v0.1.0 (/home/dap/omicron/nexus)
Compiling nexus-db-queries v0.1.0 (/home/dap/omicron/nexus/db-queries)
Compiling omicron-dev v0.1.0 (/home/dap/omicron/dev-tools/omicron-dev)
Compiling omicron-omdb v0.1.0 (/home/dap/omicron/dev-tools/omdb)
Finished test [unoptimized + debuginfo] target(s) in 6m 57s
Starting 377 tests across 100 binaries (2 skipped)
SETUP [ 1/1] crdb-seed: cargo run -p crdb-seed
[ 00:00:00] [ ] 0/379: Compiling serde_json v1.0.108
Compiling headers-core v0.2.0
Compiling atomicwrites v0.4.2
Compiling rcgen v0.11.3
Compiling headers v0.3.9
Compiling schemars v0.8.13
Compiling usdt-impl v0.3.5
Compiling openapiv3 v1.0.3
Compiling postgres-types v0.2.6
Compiling reqwest v0.11.20
Compiling slog-json v2.6.1
Compiling slog-bunyan v2.4.0
Compiling tokio-postgres v0.7.10
Compiling usdt-attr-macro v0.3.5
Compiling usdt-macro v0.3.5
Compiling typify-impl v0.0.13 (https://github.com/oxidecomputer/typify#de16c423)
Compiling ipnetwork v0.20.0
Compiling diesel v2.1.3
Compiling usdt v0.3.5
Compiling progenitor-client v0.3.0 (https://github.com/oxidecomputer/progenitor?branch=main#5c941c0b)
Compiling dropshot v0.9.1-dev (https://github.com/oxidecomputer/dropshot?branch=main#fa728d07)
Compiling typify-macro v0.0.13 (https://github.com/oxidecomputer/typify#de16c423)
Compiling typify v0.0.13 (https://github.com/oxidecomputer/typify#de16c423)
Compiling progenitor-impl v0.3.0 (https://github.com/oxidecomputer/progenitor?branch=main#5c941c0b)
Compiling progenitor-macro v0.3.0 (https://github.com/oxidecomputer/progenitor?branch=main#5c941c0b)
Compiling progenitor v0.3.0 (https://github.com/oxidecomputer/progenitor?branch=main#5c941c0b)
Compiling omicron-workspace-hack v0.1.0 (/home/dap/omicron/workspace-hack)
Compiling api_identity v0.1.0 (/home/dap/omicron/api_identity)
Compiling omicron-common v0.1.0 (/home/dap/omicron/common)
Compiling omicron-test-utils v0.1.0 (/home/dap/omicron/test-utils)
Compiling crdb-seed v0.1.0 (/home/dap/omicron/dev-tools/crdb-seed)
Finished dev [unoptimized + debuginfo] target(s) in 38.39s
Running `target/debug/crdb-seed`
I would have expected hakari
to make it so that once we'd built enough for cargo nextest
to get going, we wouldn't then spend 38s building more dependencies to run crdb-seed
.
OK, I spent some time looking at this and there are two big things that pop out:
We exclude the xtask
package from hakari but include it in workspace.default-members
. This causes serde_json
in particular to be built with a different set of features, which has ripple effects up the build graph. Fix forthcoming (will exclude xtask
from default-members
as well).
One of the promises hakari makes is to be able to unify builds across host (build) and target (normal/dev) deps. This sadly doesn't work for us at all. That's because we build target (normal/dev) deps and host (build) deps with separate sets of options:
debug = "line-tables-only"
, and host deps are built with debug = "false"
. This can be unified relatively easily.panic = "abort"
, while host deps are built with panic = "unwind"
. This can be addressed in one of two ways:
panic = "abort"
. I don't think this is advisable -- it is useful to get backtraces and such from host deps.dev
profile, build target deps with panic = "unwind"
(we can stay with abort
for the release
profile). This I think makes more sense but would cause behavior differences between dev and release builds.Either of the two options above would make a difference for compile times. The number of units (things to build, what shows up in the progress bar in cargo build
) decreases massively. For nexus, with diff:
With the command cat tmp/nexus-unit-graph.json | jq -r '.units | length'
and running cargo test --no-run -p omicron-nexus
, I got:
Before: 1230 units (2m 25s fresh build) After: 838 (2m 15s fresh build)
This is a pretty decent difference.
As a third option, we could just accept that we're going to have different builds on the host and target platforms forever, and tell hakari to not try and unify dependencies across the platforms.
I used the --unit-graph
option:
export RUSTC_BOOTSTRAP=1
cargo test --no-run -Z unstable-options --unit-graph | jq > tmp/cargo-test-unit-graph.json
cargo test --no-run -p omicron-nexus -Z unstable-options --unit-graph | jq > tmp/nexus-unit-graph.json
Then, I investigated the two outputs, starting from serde_json
since that showed up in a few example outputs. The differences were clearly apparent, and from there I started hammering down the details and iterating.
OK, so the crdb-seed
issue is also interesting: what's happening there is that tests are always built with panic = unwind
whether or not profile.dev.panic
is abort
(which makes sense for tests), but cargo run
uses profile.dev.panic
. This means that it would be best for dependency reuse that we use unwind
for dev builds.
2. Build host deps with
panic = "abort"
. I don't think this is advisable -- it is useful to get backtraces and such from host deps.
Is the assumption here that the reason we don't get the backtrace is because the generation of a core dump which would have such a backtrace is disabled?
Is the assumption here that the reason we don't get the backtrace is because the generation of a core dump which would have such a backtrace is disabled?
Ah I was thinking in terms of backtraces printed by Rust itself, e.g. by running RUST_BACKTRACE=1
which is pretty common while writing tests. We'd separately get a backtrace from core dumps if they're enabled.
All of the cases above should now be handled with #4535. Thanks again for all your patience.
I noticed all this working on branch "dap/nexus-inventory-2". Commit 376a37bb21d8bab98a2c4cd8747fc3a5ee3140d3 is sync'd up with "main" from this morning. From there, I ran a full test suite run that failed one test:
I then tried to fix this test, which changed these files:
nexus/db-model/src/schema.rs
,schema/crdb/dbinit.sql
,schema/crdb/9.0.0/
(a whole tree).Then I tried to rerun just the failing test. But it looks like a whole lot of stuff got rebuilt, including a bunch of deps from outside the workspace:
This was unexpected -- I thought hakari was supposed to prevent that. I wondered if mine was out of date. But I think not?