Open rcgoodfellow opened 1 week ago
Bummer -- and thanks for filing this.
From the output, it looks to me like the test ran the command `omdb db instances` and that panicked with:
thread 'tokio-runtime-worker' panicked at /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-bb8-diesel-0.2.1/src/async_traits.rs:97:14:
called `Result::unwrap()` on an `Err` value: JoinError::Cancelled(Id(36))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
That's not very much to go on. We don't have more because this was a subprocess -- the test ultimately failed only because the output didn't match what it expected. I haven't totally given up yet but I've put up #6516 so that if we hit this again we'll get more information about a panic from the subprocess.
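To illustrate the kind of extra information that helps here, below is a minimal sketch of a test helper that includes the subprocess's stderr in the failure message whenever stdout doesn't match. This is not what #6516 does; `check_output` and `run_and_check` are hypothetical names invented for this example, and it uses only `std::process::Command`.

```rust
use std::process::Command;

/// Compare captured stdout against the expected text. On mismatch, return an
/// error that also carries stderr, where a Rust panic message would appear.
/// (Hypothetical helper, not part of omicron.)
fn check_output(expected: &str, stdout: &str, stderr: &str) -> Result<(), String> {
    if stdout == expected {
        return Ok(());
    }
    Err(format!(
        "stdout mismatch\n--- expected ---\n{expected}\n--- actual ---\n{stdout}\n--- stderr ---\n{stderr}"
    ))
}

/// Run a command, capture both streams, and check stdout.
/// (Hypothetical helper, not part of omicron.)
#[allow(dead_code)]
fn run_and_check(cmd: &mut Command, expected: &str) -> Result<(), String> {
    // Ask the child for a backtrace in case it panics.
    cmd.env("RUST_BACKTRACE", "1");
    let out = cmd.output().map_err(|e| format!("failed to spawn: {e}"))?;
    let stdout = String::from_utf8_lossy(&out.stdout);
    let stderr = String::from_utf8_lossy(&out.stderr);
    if !out.status.success() {
        // Report the exit status and stderr before even comparing stdout.
        return Err(format!(
            "subprocess exited with {}\n--- stderr ---\n{stderr}",
            out.status
        ));
    }
    check_output(expected, &stdout, &stderr)
}

fn main() {
    // Exercise the comparison with made-up strings instead of a real subprocess.
    let result = check_output("ID STATE\n", "", "thread 'main' panicked at src/main.rs:1:1");
    if let Err(report) = result {
        eprintln!("{report}");
    }
}
```

With something like this, a mismatch caused by a panic in the child would show the panic message (and backtrace, if enabled) next to the diff instead of only the unexpected stdout.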
The panic message is coming from here: https://github.com/oxidecomputer/async-bb8-diesel/blob/1850c9d9a9311ff6a60cadee9023e7693eda3304/src/async_traits.rs#L97
But I think that's just propagating a panic that happened in the middle of just about anything that async-bb8-diesel was doing. There are a few unwraps in the `omdb db instances` command itself:
https://github.com/oxidecomputer/omicron/blob/a77c31bf0238c08fc221609e9134c1c519d584f7/dev-tools/omdb/src/bin/omdb/db.rs#L2837-L2920
But if we panicked in those, I don't think it would show up in async-bb8-diesel. I'm trying to figure out what would show up there. We're not using `transaction_async` in this code so I don't see how we could have entered async-bb8-diesel and then called back out to this code. An example might be if the synchronous `load` panicked, but that's not our code so that would be surprising.
I'm also going to file an async-bb8-diesel bug because it seems like it could propagate more about the panic error in this situation.
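As a rough illustration of what "propagating more about the panic" could mean, here is a sketch that uses plain `std::thread` rather than tokio or async-bb8-diesel. Instead of unwrapping the join result, it extracts the panic payload and returns it as an error. The names `panic_message` and `run_and_report` are made up for this example, and an actual change to async-bb8-diesel might look quite different.

```rust
use std::any::Any;
use std::thread;

/// Turn a panic payload into a readable message. Payloads from `panic!` are
/// normally either `&'static str` or `String`; anything else gets a placeholder.
/// (Hypothetical helper for illustration.)
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Run `f` on another thread. If it panics, return the panic message as an
/// error instead of calling `unwrap()` on the join result.
/// (Hypothetical helper for illustration.)
fn run_and_report<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(f)
        .join()
        .map_err(|payload| panic_message(&*payload))
}

fn main() {
    match run_and_report(|| -> i32 { panic!("query failed") }) {
        Ok(v) => println!("ok: {v}"),
        Err(msg) => println!("worker panicked: {msg}"),
    }
}
```

The caller then sees the original panic text rather than a second, less informative panic at the `unwrap()` site.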
Actually, I'm not sure this is an async-bb8-diesel bug. Looking more closely at the `JoinError`, it's saying that the underlying task was cancelled, not that it panicked. How did that happen? Looking at the docs:
When you shut down the executor, it will wait indefinitely for all blocking operations to finish. You can use shutdown_timeout to stop waiting for them after a certain timeout. Be aware that this will still not cancel the tasks — they are simply allowed to keep running after the method returns. It is possible for a blocking task to be cancelled if it has not yet started running, but this is not guaranteed.
One way I could imagine this happening is if the program started an async database operation (like `load_async`) but then panicked before the corresponding tokio task was started. That might trigger teardown of the executor and we might see this second panic. But then shouldn't we see some information about that other panic?
This test failed on a CI run on pull request 6475:
https://github.com/oxidecomputer/omicron/pull/6475/checks?check_run_id=29546110600
https://buildomat.eng.oxide.computer/wg/0/details/01J6RJ0W9K2R1TX0DVBZ0RS47V/qhyGpI4O40yzHVoFHWrAhRBFaESiU4fFqaOicq5NLEyLHAz2/01J6RJ164N5KYG7G3SJ5PFFX0H
Log showing the specific test failure:
https://buildomat.eng.oxide.computer/wg/0/details/01J6RJ0W9K2R1TX0DVBZ0RS47V/qhyGpI4O40yzHVoFHWrAhRBFaESiU4fFqaOicq5NLEyLHAz2/01J6RJ164N5KYG7G3SJ5PFFX0H#S5276
Excerpt from the log showing the failure: