openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/
Apache License 2.0
93 stars 185 forks source link

Master machines are not assigned to nodes #917

Closed jtomasek closed 4 years ago

jtomasek commented 4 years ago

Describe the bug After the cluster is deployed, the master machines are not assigned to nodes. Master Nodes don't have machine.openhift.io/machine annotation and the machine does not have status.addresses an status.nodeRef populated. Executing ./12_csr_hack.sh explicitly fixes the problem.

The issue was originally tracked at https://github.com/openshift-metal3/dev-scripts/issues/260 and should be fixed by https://github.com/metal3-io/cluster-api-provider-baremetal/issues/49. It seems the issue started to re-occur.

To Reproduce Steps to reproduce the behavior, please include all details about the dev-scripts version (git commit SHA), any local variable overrides or other customizations, and whether you're deploying on VMs or Baremetal.

Recent dev-scripts master (34d334f22d77bf441383403453ca5225c28dcf7c), default config, only NUM_WORKERS and PULL_SECRET set.

Deploy cluster using default 'make' command. Check one of the master machine resources, the status.addresses and status.nodeRef is not populated.

Expected/observed behavior The master machines and nodes should reference each other correctly.

russellb commented 4 years ago

This has never been fixed, because it relies on having introspection data for masters. #260 was closed because this particular issue doesn't affect CSR approval. CSR approval for masters happens automatically at install time, and renewal of those certs was fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1737611

The issue is because the nodelink-controller in the machine-api-operator can not link the Machine and Node objects because the Machine is lacking the IP addresses that must match the IP addresses on the Node. For workers, this is done using introspection data. We don't have that data for masters (or at least we don't add it to BareMetalHosts during install).

https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/nodelink/nodelink_controller.go

dev-scripts probably isn't the best place to track this anymore, since it's not a problem in dev-scripts itself. Some possible places to file a bug could be ...

@stbenjam @hardys any thoughts on how you'd like to track this?

hardys commented 4 years ago

I think the correct fix for this is to add continuous introspection, but is there any way for us to patch the BMH objects created in the installer with the initial introspection data?

IIRC we discussed that some time ago and it wasn't possible, so we disabled introspection altogether for masters but if that is no longer the case I guess an installer fix would potentially be quicker than chasing down the various moving parts to enable continuous introspection?

imain commented 4 years ago

I'm working on this now by adding BMH update in the installer.

imain commented 4 years ago

The above PR merged. This should be fixed now.

stbenjam commented 4 years ago

Indeed it is!

/close

openshift-ci-robot commented 4 years ago

@stbenjam: Closing this issue.

In response to [this](https://github.com/openshift-metal3/dev-scripts/issues/917#issuecomment-630982181): >Indeed it is! > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.