Open bnaecker opened 6 days ago
I think I understand this. This test starts some mock sled-agents, then inserts records for those mock sled-agents into CRDB. It then activates the Nexus RPW under test in a few different ways and confirms that the task talks to the sled-agents as expected.
However, we're running under a #[nexus_test], so other Nexus RPWs entirely unrelated to blueprint execution are running. Presumably in this flake, the inventory collection RPW saw the CRDB records for our mock sled agents and tried to collect their inventories, causing the mock HTTP server to complain about an unexpected request.
I'm not sure off the top of my head what the best fix for this is, though. Simulated sled agent instead of a mock? Turn off the other RPWs (if that's possible)? Tell the mock sled-agent to ignore unexpected requests?
> Tell the mock sled-agent to ignore unexpected requests?
I like this approach. It looks like we have complete control over the mock servers, so maybe we can keep track of only the request counts we care about. The assertion can then check that one count, rather than all requests.
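To make the idea concrete, here is a rough sketch of "count only what we care about" bookkeeping. It does not use the real mocking library or the actual test code: `WatchedRequests`, its methods, and the paths in the usage note are all hypothetical names made up for illustration. The point is only that a request nobody registered (say, from the inventory collection RPW) is dropped silently instead of failing the test.

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping for a mock sled-agent (not the real test code).
/// Only (method, path) pairs registered via `watch` are counted; every
/// other request is ignored rather than treated as an error.
#[derive(Debug, Default)]
pub struct WatchedRequests {
    counts: HashMap<(String, String), usize>,
}

impl WatchedRequests {
    pub fn new() -> Self {
        Self::default()
    }

    /// Start counting requests for this method and path, beginning at zero.
    pub fn watch(&mut self, method: &str, path: &str) {
        self.counts
            .entry((method.to_string(), path.to_string()))
            .or_insert(0);
    }

    /// Record an incoming request. Returns true if it was counted and
    /// false if it was an unwatched request that we deliberately ignore.
    pub fn record(&mut self, method: &str, path: &str) -> bool {
        match self.counts.get_mut(&(method.to_string(), path.to_string())) {
            Some(n) => {
                *n += 1;
                true
            }
            None => false,
        }
    }

    /// Count observed so far for a watched request, or None if unwatched.
    pub fn count(&self, method: &str, path: &str) -> Option<usize> {
        self.counts
            .get(&(method.to_string(), path.to_string()))
            .copied()
    }
}
```

With something like this, the test would call `watch("PUT", "/omicron-zones")` (an illustrative path), let the mock call `record` for every request it receives, and assert only on `count("PUT", "/omicron-zones")`. A stray `GET` from another RPW would return `false` from `record` and never affect the assertion.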
Still happening after the fix in #6590 (https://github.com/oxidecomputer/omicron/pull/6593#issuecomment-2357427950). #6594 should fix this, but I'm worried it may just make the flake less likely. I'd like to leave this open for a while even after #6594 lands in case we see it again (at which point we probably need to rework that test to either use a different http mocking library or do its checks some other way).
This test failed on a CI run on pull request #6551:
https://github.com/oxidecomputer/omicron/pull/6551/checks?check_run_id=30071404933
Log showing the specific test failure:
https://buildomat.eng.oxide.computer/wg/0/details/01J7KRQA0QPH39CY96CQJF6Z23/1FrIoqw9yG0jDA7yi1gTIwhUFrVoqNJvyLQAGB0ClHEq3pm0/01J7KRR5P728MRF0TSY1N93XM3#S4699
Excerpt from the log showing the failure: