oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
242 stars 36 forks source link

test failed in CI: `omicron-cockroach-admin cockroach_cli::tests::test_node_decommission_compatibility` #6506

Open hawkw opened 1 week ago

hawkw commented 1 week ago

This test failed on a CI run on #6499:

Unfortunately, I rebased the PR and have since lost the GitHub check URL for the failure...

Log showing the specific test failure:

https://buildomat.eng.oxide.computer/wg/0/details/01J6Q8748MZKW18KSEF4540CQZ/iy5XpuAznwyLAetcf2Mfjh1o9XO11hzknMn8Mr0yhl9xfXK3/01J6Q87HTCRRS09RECVKRHBBSR#S5100

Excerpt from the log showing the failure:

failures:
    cockroach_cli::tests::test_node_decommission_compatibility

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out; finished in 3.47s

--- STDERR:              omicron-cockroach-admin cockroach_cli::tests::test_node_decommission_compatibility ---
log file: /var/tmp/omicron_tmp/omicron_cockroach_admin-e0b1ce4a08f3ea08-test_node_decommission_compatibility.38877.0.log
note: configured to log to "/var/tmp/omicron_tmp/omicron_cockroach_admin-e0b1ce4a08f3ea08-test_node_decommission_compatibility.38877.0.log"
thread 'cockroach_cli::tests::test_node_decommission_compatibility' panicked at cockroach-admin/src/cockroach_cli.rs:236:36:
header line
stack backtrace:
   0: rust_begin_unwind
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_display
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:262:5
   3: core::option::expect_failed
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/option.rs:1995:5
   4: core::option::Option<T>::expect
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/option.rs:898:21
   5: {async_block#0}
             at ./src/cockroach_cli.rs:236:23
   6: poll<&mut dyn core::future::future::Future<Output=()>>
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/future/future.rs:123:9
   7: poll<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/future/future.rs:123:9
   8: {closure#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:673:57
   9: with_budget<core::task::poll::Poll<()>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/coop.rs:107:5
  10: budget<core::task::poll::Poll<()>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/coop.rs:73:5
  11: {closure#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:673:25
  12: tokio::runtime::scheduler::current_thread::Context::enter
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:412:19
  13: {closure#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:672:36
  14: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:751:68
  15: tokio::runtime::context::scoped::Scoped<T>::set
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/context/scoped.rs:40:9
  16: tokio::runtime::context::set_scheduler::{{closure}}
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/context.rs:180:26
  17: try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<()>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>>, core::option::Option<()>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<()>)>
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/thread/local.rs:283:12
  18: std::thread::local::LocalKey<T>::with
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/thread/local.rs:260:9
  19: tokio::runtime::context::set_scheduler
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/context.rs:180:9
  20: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:751:27
  21: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:660:19
  22: {closure#0}<core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:180:28
  23: tokio::runtime::context::runtime::enter_runtime
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/context/runtime.rs:65:16
  24: block_on<core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/scheduler/current_thread/mod.rs:168:9
  25: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/runtime.rs:361:47
  26: block_on<core::pin::Pin<&mut dyn core::future::future::Future<Output=()>>>
             at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.3/src/runtime/runtime.rs:335:13
  27: test_node_decommission_compatibility
             at ./src/cockroach_cli.rs:269:9
  28: omicron_cockroach_admin::cockroach_cli::tests::test_node_decommission_compatibility::{{closure}}
             at ./src/cockroach_cli.rs:208:52
  29: core::ops::function::FnOnce::call_once
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/ops/function.rs:250:5
  30: core::ops::function::FnOnce::call_once
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
WARN: dropped CockroachInstance without cleaning it up first (there may still be a child process running and a temporary directory leaked)
WARN: temporary directory leaked: "/var/tmp/omicron_tmp/.tmpQpS5s8"
If you would like to access the database for debugging, run the following:

# Run the database
cargo xtask db-dev run --no-populate --store-dir "/var/tmp/omicron_tmp/.tmpQpS5s8/data"
# Access the database. Note the port may change if you run multiple databases.
cockroach sql --host=localhost:32221 --insecure

It looks like we are waiting for output from a CRDB CLI process that never seems to appear? The test has passed successfully on subsequent runs.

jgallagher commented 1 week ago

Weird! I think a panic there means stdout was empty, so (presumably?) the command failed. I'll add some logging that should help the next time it flakes, assuming that guess is correct.