oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

test-flake: integration_tests::updates::test_update_uninitialized #4949

Open rcgoodfellow opened 9 months ago

rcgoodfellow commented 9 months ago

This test passed in a PR, then failed on main once the PR landed.

See

iliana commented 9 months ago

Error: repository fetch should have failed with 500 error

Caused by:
    0: making request to server
    1: connection closed before message completed

Yet the server seemed to do the right thing?

rcgoodfellow commented 9 months ago

It was the Linux job that had this failure. The helios job passed on the first go. The test passed when re-running the Linux CI job.

smklein commented 9 months ago

Saw this same flake here: https://github.com/oxidecomputer/omicron/pull/4936/checks?check_run_id=21134488445

sunshowers commented 9 months ago

I wrote this test, will look tomorrow.

smklein commented 8 months ago

One more data point, still flaking: https://github.com/oxidecomputer/omicron/issues/5246

smklein commented 8 months ago

This was on buildomat / build-and-test (helios), which is not a Linux job, fwiw.

sunshowers commented 8 months ago

Started looking at this a couple weeks ago but got preempted :(

Wondering if we should just throw a retry on the test for now.
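
For illustration only, here is a minimal sketch of what a retry around the flaky request could look like. The `retry` helper and the `demo` function are hypothetical and not part of omicron; the closure merely stands in for the repository fetch that intermittently fails with "connection closed before message completed".

```rust
use std::fmt::Debug;

/// Hypothetical helper (not from omicron): run `op` up to `max_attempts`
/// times, returning the first success or the error from the final attempt.
/// The closure receives the 1-based attempt number.
fn retry<T, E: Debug>(
    max_attempts: usize,
    mut op: impl FnMut(usize) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Attempts remain: log the failure and try again.
            Err(err) if attempt < max_attempts => {
                eprintln!("attempt {attempt} failed: {err:?}; retrying");
                attempt += 1;
            }
            // Out of attempts: surface the last error to the caller.
            Err(err) => return Err(err),
        }
    }
}

/// Stand-in for the flaky request: fails on the first attempt only.
fn demo() -> Result<usize, &'static str> {
    retry(3, |attempt| {
        if attempt == 1 {
            Err("connection closed before message completed")
        } else {
            Ok(attempt)
        }
    })
}
```

A retry like this would only mask the flake rather than explain it, so it is a stopgap and not a fix for the root cause.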

elaine-oxide commented 2 months ago

I encountered the same issue (in buildomat / build-and-test (ubuntu-22.04), FAIL for omicron-nexus::test_all integration_tests::updates::test_update_uninitialized), where the test passed in a PR, then failed on main once the PR landed.

https://github.com/oxidecomputer/omicron/commit/3d3f6d735e44c8f84b7b23d7e6ca3b213ac15355

https://github.com/oxidecomputer/omicron/runs/29801528621

elaine-oxide commented 2 months ago

The previously failing test passed on re-run. https://github.com/oxidecomputer/omicron/runs/29808706363

andrewjstone commented 1 month ago

Looks like we hit this again: https://buildomat.eng.oxide.computer/wg/0/details/01J83YQ7AHHW6QKHJDPJ1VNTQZ/TO4wpZD6Wkn0cqGZqM1XfG8KqjefjStmDUPQq3Xtnnd1ZgiZ/01J83YQVGZ2RM05A2M6PA2SF1J

andrewjstone commented 1 month ago

On the latest run, there are a few odd failures. I'm not actually sure this is the test's fault in particular. It may be a symptom of some other issue. I'm seeing issues with DNS and a network unreachable error. I'm not sure if these are things we would ever expect to see.

https://buildomat.eng.oxide.computer/wg/0/artefact/01J83YQ7AHHW6QKHJDPJ1VNTQZ/TO4wpZD6Wkn0cqGZqM1XfG8KqjefjStmDUPQq3Xtnnd1ZgiZ/01J83YQVGZ2RM05A2M6PA2SF1J/01J841N63E4WRZMTEKP3AK76NY/test_all-d3f0fb0580db6861-test_update_end_to_end.176436.1.log?format=x-bunyan#L704

https://buildomat.eng.oxide.computer/wg/0/artefact/01J83YQ7AHHW6QKHJDPJ1VNTQZ/TO4wpZD6Wkn0cqGZqM1XfG8KqjefjStmDUPQq3Xtnnd1ZgiZ/01J83YQVGZ2RM05A2M6PA2SF1J/01J841N63E4WRZMTEKP3AK76NY/test_all-d3f0fb0580db6861-test_update_end_to_end.176436.1.log?format=x-bunyan#L3058

CC @davepacheco @rcgoodfellow

FelixMcFelix commented 3 weeks ago

This one's come up again today for me: https://buildomat.eng.oxide.computer/wg/0/details/01JADDKWGJPE0W6GWXXKVACPCS/tntBJzr7RPDoVr0k1n2H2XbfwB7E1I31lHt7USHEwUNUteb7/01JADDM60TPPKHJFF6GYQ0E9QW.

andrewjstone commented 3 weeks ago

I think you actually hit this error: https://github.com/oxidecomputer/omicron/issues/6771 but in a test.

iliana commented 1 week ago

This test is part of a set of tests that we're almost certainly going to delete in the short term; everything that module tests is going to be replaced with the TUF Repo Depot and the Reconfigurator. It might be nice to figure out the root cause, but if it's causing pain today I'm happy to delete the update system we've never used!

No wait, this is wrong: I forgot we had done a bunch of work in Nexus to flesh out some of the original code I wrote.