Closed by tuminoid 3 months ago
/triage accepted
/cc @dtantsur @elfosardo @MahnoorAsghar @mboukhalfa @Rozzii @kashifest FYI
A notable difference in the BMO logs is:
{"level":"info","ts":1718358988.3083067,"logger":"provisioner.ironic","msg":"error caught while checking endpoint, will retry","host":"metal3~node-0","endpoint":"https://172.22.0.2:6385/v1/","error":"Expected HTTP response code [200 300] when accessing [GET https://172.22.0.2:6385/v1/], but got 503 instead: <!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>"}
{"level":"info","ts":1718358988.3096807,"logger":"controllers.BareMetalHost","msg":"provisioner is not ready","baremetalhost":{"name":"node-0","namespace":"metal3"},"RequeueAfter:":30}
{"level":"info","ts":1718358988.3113363,"logger":"provisioner.ironic","msg":"error caught while checking endpoint, will retry","host":"metal3~node-1","endpoint":"https://172.22.0.2:6385/v1/","error":"Expected HTTP response code [200 300] when accessing [GET https://172.22.0.2:6385/v1/], but got 503 instead: <!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>"}
This occurs on main only, not with the patch reverted. The code path looks like it is going to retry, but it never recovers and only spams "provisioner is not ready", whereas the tests with the patch reverted show that after a while of "provisioner is not ready" it goes to the next provisioner state.
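To compare a failing run on main against a run with the patch reverted, it can help to count how often each host hits these two messages. The following is only a rough triage sketch, not part of BMO: the function name `summarize_bmo_log` is hypothetical, the message strings are copied from the log excerpt above, and joining namespace and name with `~` is an assumption based on the `host` field seen in that excerpt.

```python
import json
from collections import Counter

# Message strings copied from the BMO log excerpt in this issue.
ENDPOINT_RETRY_MSG = "error caught while checking endpoint, will retry"
NOT_READY_MSG = "provisioner is not ready"


def summarize_bmo_log(lines):
    """Count endpoint-retry and not-ready messages per host.

    Hypothetical helper: `lines` is any iterable of raw log lines.
    Lines that are not valid JSON objects are skipped.
    """
    endpoint_errors = Counter()
    not_ready = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue
        msg = entry.get("msg", "")
        if msg == ENDPOINT_RETRY_MSG:
            endpoint_errors[entry.get("host", "unknown")] += 1
        elif msg == NOT_READY_MSG:
            bmh = entry.get("baremetalhost") or {}
            # Assumption: mirror the "namespace~name" form of the "host" field.
            host = f'{bmh.get("namespace", "?")}~{bmh.get("name", "?")}'
            not_ready[host] += 1
    return endpoint_errors, not_ready


if __name__ == "__main__":
    # Abbreviated versions of the log lines above ("error" and "ts" omitted).
    sample = [
        '{"level":"info","logger":"provisioner.ironic",'
        '"msg":"error caught while checking endpoint, will retry",'
        '"host":"metal3~node-0"}',
        '{"level":"info","logger":"controllers.BareMetalHost",'
        '"msg":"provisioner is not ready",'
        '"baremetalhost":{"name":"node-0","namespace":"metal3"}}',
    ]
    errors, waiting = summarize_bmo_log(sample)
    print("endpoint retries:", dict(errors))
    print("not ready:", dict(waiting))
```

On a healthy run the "not ready" count per host should stop growing once Ironic starts answering; on the failing runs described here it would presumably keep increasing for the whole test.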
I hope this will fix it, or at least move us closer: https://github.com/metal3-io/baremetal-operator/pull/1786
What steps did you take and what happened: https://github.com/metal3-io/baremetal-operator/pull/1685 was merged, and since then all Metal3 CentOS-based e2e tests on the main branch have failed. If the PR is reverted, they pass.
What did you expect to happen: CentOS e2e tests succeed.
Anything else you would like to add: Ubuntu variants pass (given that #1780 is merged to fix one issue), so this is isolated to CentOS.
Environment: Dev-env / CI. The e2e integration, e2e feature, e2e ephemeral, and bml e2e periodics all fail. All PR jobs with centos-e2e-integration-main fail.
See https://jenkins.nordix.org/view/Metal3%20Periodic/job/metal3-periodic-centos-e2e-integration-test-main/87/ or any other periodic centos main job.
/kind bug