okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

OpenStack machines show as Provisioned #567

Closed jdhirst closed 3 years ago

jdhirst commented 3 years ago

Describe the bug: Machines show as Provisioned rather than Running.

NAMESPACE               NAME                       PHASE         TYPE        REGION      ZONE   AGE
openshift-machine-api   okd-k6mbn-master-0         Provisioned   m2.xlarge   regionOne   nova   22h
openshift-machine-api   okd-k6mbn-master-1         Provisioned   m2.xlarge   regionOne   nova   22h
openshift-machine-api   okd-k6mbn-master-2         Provisioned   m2.xlarge   regionOne   nova   22h
openshift-machine-api   okd-k6mbn-worker-0-48ntz   Running       m2.xlarge   regionOne   nova   20h
openshift-machine-api   okd-k6mbn-worker-0-nllgb   Running       m2.xlarge   regionOne   nova   21h
openshift-machine-api   okd-k6mbn-worker-0-vpppl   Running       m2.xlarge   regionOne   nova   20h

However, all nodes are accounted for in the node list:

NAME                       STATUS   ROLES    AGE   VERSION
okd-k6mbn-master-0         Ready    master   21h   v1.20.0+5fbfd19-1046
okd-k6mbn-master-1         Ready    master   21h   v1.20.0+5fbfd19-1046
okd-k6mbn-master-2         Ready    master   21h   v1.20.0+5fbfd19-1046
okd-k6mbn-worker-0-48ntz   Ready    worker   20h   v1.20.0+5fbfd19-1046
okd-k6mbn-worker-0-nllgb   Ready    worker   21h   v1.20.0+5fbfd19-1046
okd-k6mbn-worker-0-vpppl   Ready    worker   20h   v1.20.0+5fbfd19-1046

SSH to all nodes works as well.

The following alarms are raised:

machine okd-k6mbn-master-0 is in phase: Provisioned
machine okd-k6mbn-master-1 is in phase: Provisioned
machine okd-k6mbn-master-2 is in phase: Provisioned
machine okd-k6mbn-master-0 does not have valid node reference
machine okd-k6mbn-master-1 does not have valid node reference
machine okd-k6mbn-master-2 does not have valid node reference

Version: 4.7.0-0.okd-2021-03-07-090821

How reproducible: Not sure. I have only seen this on this cluster, and it seems to affect only my master nodes.

Log bundle: https://mega.nz/file/SOg2hKJR#Q7jWvH49TndJojZuzZ0RfRp4S2qDBZ4vN4qcK42Jqo0

vrutkovs commented 3 years ago

nodelink_controller.go:453] Found internal IP for node "okd-k6mbn-master-0": "10.0.3.2"
nodelink_controller.go:477] Matching machine not found for node "okd-k6mbn-master-0" with internal IP "10.0.3.2"

This happens because the node registers with:

  addresses:
  - address: 10.0.3.2
    type: InternalIP
  - address: okd-k6mbn-master-0
    type: Hostname
  - address: 172.16.19.46
    type: ExternalIP

So the node's InternalIP is correct, but the machine has:

status:
  addresses:
  - address: 172.16.19.46
    type: InternalIP
  - address: okd-k6mbn-master-0
    type: Hostname
  - address: okd-k6mbn-master-0
    type: InternalDNS

So the machine's InternalIP doesn't match the node's InternalIP.
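
The mismatch can be illustrated with a minimal Python sketch. It is not the nodelink controller's code; it only assumes, based on the log lines above, that a node is linked to a machine when they share an InternalIP (the real controller may try other keys as well). The function names and the `some-worker` label are hypothetical; the addresses are copied from the output above.

```python
from typing import Dict, List, Optional, Set

Address = Dict[str, str]

# Address lists copied from the Machine and Node objects shown above.
master_machine_addresses: List[Address] = [
    {"address": "172.16.19.46", "type": "InternalIP"},
    {"address": "okd-k6mbn-master-0", "type": "Hostname"},
    {"address": "okd-k6mbn-master-0", "type": "InternalDNS"},
]
master_node_addresses: List[Address] = [
    {"address": "10.0.3.2", "type": "InternalIP"},
    {"address": "okd-k6mbn-master-0", "type": "Hostname"},
    {"address": "172.16.19.46", "type": "ExternalIP"},
]
# Workers: machine and node both report the same InternalIP.
worker_addresses: List[Address] = [
    {"address": "10.0.2.40", "type": "InternalIP"},
]


def internal_ips(addresses: List[Address]) -> Set[str]:
    """Return the set of addresses whose type is InternalIP."""
    return {a["address"] for a in addresses if a["type"] == "InternalIP"}


def find_machine_for_node(
    node_addresses: List[Address],
    machines: Dict[str, List[Address]],
) -> Optional[str]:
    """Return the name of the first machine sharing an InternalIP with the node.

    Assumption: matching is done on InternalIP only, as the log lines suggest.
    """
    node_ips = internal_ips(node_addresses)
    for name, addresses in machines.items():
        if node_ips & internal_ips(addresses):
            return name
    return None


# Hypothetical inventory; "some-worker" is a placeholder name.
machines = {
    "okd-k6mbn-master-0": master_machine_addresses,
    "some-worker": worker_addresses,
}

master_match = find_machine_for_node(master_node_addresses, machines)
worker_match = find_machine_for_node(worker_addresses, machines)
```

Here `master_match` is `None`, because the machine lists the floating IP as its InternalIP while the node lists 10.0.3.2, whereas `worker_match` resolves to the worker entry.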

Workers are registered correctly, as both the machine and the node have:

  addresses:
  - address: 10.0.2.40
    type: InternalIP

jdhirst commented 3 years ago

That's odd; why is the machine reporting the external floating IP (FIP) as its internal IP? I added FIPs so that I can SSH into the nodes for debugging.

jdhirst commented 3 years ago

Is there a solution to this? How does MCO retrieve the internal IP from OpenStack?

jdhirst commented 3 years ago

Closing, as I haven't been able to reproduce this.