metal3-io / baremetal-operator

Bare metal host provisioning integration for Kubernetes
Apache License 2.0

BMO should print why it's sitting in reconcile loop more than once #1787

Open tuminoid opened 1 month ago

tuminoid commented 1 month ago

While debugging https://github.com/metal3-io/baremetal-operator/issues/1785, we get the message in https://github.com/metal3-io/baremetal-operator/issues/1785#issuecomment-2167804321 once per node. After that, BMO sits in the "provisioner is not ready" loop forever without printing the root cause again. This makes issues hard to debug, because "provisioner is not ready" is also printed for some time during a normal startup of BMO / Ironic.

We may even have cases where there are multiple problems and BMO does not print the other one at all, since it sits in the reconcile loop.

It would hence be great if BMO could print the error message more than once.

/kind bug

BMO version: main

dtantsur commented 1 month ago

Yeah, while some controllers do log the cause, the BareMetalHost and BMCEventSubscription ones do not. I agree it's a problem.

/triage accepted /help

metal3-io-bot commented 1 month ago

@dtantsur: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

babugeet commented 1 month ago

@dtantsur, @tuminoid I would like to take this up, if you could give me some pointers on where to look. I saw 5 reconcile functions; do all of them require this modification?

dtantsur commented 1 month ago

@babugeet grep the source code for "provisioner is not ready". You'll see several instances of this phrase in different controllers. Some include the error message that caused it, some do not. Those that don't require fixing.

babugeet commented 1 month ago

/assign