kubernetes / cloud-provider-openstack

Apache License 2.0

auto-healer: node MachineID is not always Openstack instance UUID #1278

Closed claudiubelu closed 3 years ago

claudiubelu commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Currently, the magnum auto-healer works on the assumption that a node's MachineID equals the OpenStack instance UUID. However, that is not always the case: not all hypervisors pass instance.uuid into the VM's SMBIOS the way libvirt does, and not all OSes rely on the MachineID / /etc/machine-id (e.g., Windows VMs).

The OpenStack instance UUID can also be found in the Node's ProviderID with the format:

openstack:///openstack-uuid

We can use that instead. Additionally, OCCM already uses the ProviderID for the same purpose.
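The extraction step described above can be sketched as follows. This is a minimal illustration, not the actual auto-healer code: `instanceIDFromProviderID` is a hypothetical helper, and it only assumes the `openstack:///<instance-uuid>` ProviderID format mentioned in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromProviderID recovers the OpenStack instance UUID from a
// Node's spec.providerID of the form "openstack:///<instance-uuid>",
// rather than trusting the node's MachineID, which is not guaranteed to
// match the instance UUID (e.g., on Windows nodes).
// Hypothetical helper for illustration only.
func instanceIDFromProviderID(providerID string) (string, error) {
	const prefix = "openstack:///"
	if !strings.HasPrefix(providerID, prefix) {
		return "", fmt.Errorf("unexpected providerID format %q: missing %q prefix", providerID, prefix)
	}
	uuid := strings.TrimPrefix(providerID, prefix)
	if uuid == "" {
		return "", fmt.Errorf("providerID %q contains no instance UUID", providerID)
	}
	return uuid, nil
}

func main() {
	id, err := instanceIDFromProviderID("openstack:///b2f1d4a8-1234-5678-9abc-def012345678")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

With this approach the auto-healer would not depend on the hypervisor exposing instance.uuid via SMBIOS or on the guest OS populating /etc/machine-id.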

What you expected to happen:

magnum-auto-healer should also work for Windows nodes and nodes that don't rely on MachineID / /etc/machine-id.

How to reproduce it:

Trigger the auto-healing process for a Windows node.

Anything else we need to know?:

Environment:

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

lingxiankong commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/1278#issuecomment-873573938):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.