Open rcgoodfellow opened 9 months ago
Error: repository fetch should have failed with 500 error
Caused by:
0: making request to server
1: connection closed before message completed
It was the Linux job that had this failure. The helios job passed on the first go. The test passed when re-running the Linux CI job.
Saw this same flake here: https://github.com/oxidecomputer/omicron/pull/4936/checks?check_run_id=21134488445
I wrote this test, will look tomorrow.
One more data point: https://github.com/oxidecomputer/omicron/issues/5246, still flaking
This was on buildomat / build-and-test (helios), which is not a Linux job, fwiw.
Started looking at this a couple weeks ago but got preempted :(
Wondering if we should just throw a retry on the test for now.
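If we did go the retry route, a minimal sketch of a wrapper might look like the following. Nothing here comes from the omicron test suite: `retry`, its parameters, and the fixed delay between attempts are hypothetical and only illustrate the idea.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not part of omicron): call `op` until it succeeds
/// or `max_attempts` calls have been made, sleeping `delay` after each
/// failed attempt. `op` receives the 1-based attempt number.
fn retry<T, E, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            // Success: stop retrying and hand the value back.
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise wait briefly and try again.
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

The flaky request would go inside the closure passed to `retry`. This only masks the flake; it would not explain why the connection is being closed before the message completes.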
I encountered the same issue (in buildomat / build-and-test (ubuntu-22.04), FAIL for omicron-nexus::test_all integration_tests::updates::test_update_uninitialized), where the test passed in a PR, then failed on main once the PR landed.
https://github.com/oxidecomputer/omicron/commit/3d3f6d735e44c8f84b7b23d7e6ca3b213ac15355
https://github.com/oxidecomputer/omicron/runs/29801528621
The previously failing test passed on re-run. https://github.com/oxidecomputer/omicron/runs/29808706363
On the latest run, there are a few odd failures. I'm not actually sure this is the test's fault in particular. It may be a symptom of some other issue. I'm seeing issues with DNS and a network unreachable error. I'm not sure if these are things we would ever expect to see.
https://buildomat.eng.oxide.computer/wg/0/artefact/01J83YQ7AHHW6QKHJDPJ1VNTQZ/TO4wpZD6Wkn0cqGZqM1XfG8KqjefjStmDUPQq3Xtnnd1ZgiZ/01J83YQVGZ2RM05A2M6PA2SF1J/01J841N63E4WRZMTEKP3AK76NY/test_all-d3f0fb0580db6861-test_update_end_to_end.176436.1.log?format=x-bunyan#L704
https://buildomat.eng.oxide.computer/wg/0/artefact/01J83YQ7AHHW6QKHJDPJ1VNTQZ/TO4wpZD6Wkn0cqGZqM1XfG8KqjefjStmDUPQq3Xtnnd1ZgiZ/01J83YQVGZ2RM05A2M6PA2SF1J/01J841N63E4WRZMTEKP3AK76NY/test_all-d3f0fb0580db6861-test_update_end_to_end.176436.1.log?format=x-bunyan#L3058
CC @davepacheco @rcgoodfellow
This one's come up again today for me: https://buildomat.eng.oxide.computer/wg/0/details/01JADDKWGJPE0W6GWXXKVACPCS/tntBJzr7RPDoVr0k1n2H2XbfwB7E1I31lHt7USHEwUNUteb7/01JADDM60TPPKHJFF6GYQ0E9QW.
I think you actually hit this error: https://github.com/oxidecomputer/omicron/issues/6771 but in a test.
This test is part of a set of tests that we're almost certainly going to delete in the short-term; everything that module tests is going to be replaced with the TUF Repo Depot and the Reconfigurator. It might be nice to figure out the root cause but if it's causing pain today I'm happy to delete the update system we've never used! No wait this is wrong, I forgot we had done a bunch of work in Nexus to flesh out some of the original code I wrote.
This test passed in a PR, then failed on main once the PR landed.
See