okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

4.5 -> 4.6, api-int resolution issues due to nsswitch change in Fedora 33 #401

Closed fortinj66 closed 3 years ago

fortinj66 commented 3 years ago

Describe the bug: Upgrade from 4.5 to 4.6 hangs. The first master and first worker never finish.

Version

Version: migrating from 4.5.0-0.okd-2020-10-15-235428 to 4.6.0-0.okd-2020-11-27-200126
Method: IPI
Platform: VMware

Details

Upon running the upgrade, after the first master and worker are restarted, they stay at NotReady,SchedulingDisabled:

 oc get nodes -o wide
NAME                          STATUS                        ROLES    AGE     VERSION                     INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION           CONTAINER-RUNTIME
dev-c1v4-mfpr9-master-0       Ready                         master   5d18h   v1.18.3                     10.102.5.64   10.102.5.64   Fedora CoreOS 32.20200629.3.0    5.6.19-300.fc32.x86_64   cri-o://1.18.2
dev-c1v4-mfpr9-master-1       NotReady,SchedulingDisabled   master   5d18h   v1.19.0-rc.2+9f84db3-1075   10.102.5.65   10.102.5.65   Fedora CoreOS 33.20201124.10.1   5.9.9-200.fc33.x86_64    cri-o://1.19.0
dev-c1v4-mfpr9-master-2       Ready                         master   5d18h   v1.18.3                     10.102.5.63   10.102.5.63   Fedora CoreOS 32.20200629.3.0    5.6.19-300.fc32.x86_64   cri-o://1.18.2
dev-c1v4-mfpr9-worker-jmk8q   NotReady,SchedulingDisabled   worker   5d18h   v1.18.3                     10.102.5.66   10.102.5.66   Fedora CoreOS 32.20200629.3.0    5.6.19-300.fc32.x86_64   cri-o://1.18.2
dev-c1v4-mfpr9-worker-rw6pt   Ready                         worker   5d17h   v1.18.3                     10.102.5.67   10.102.5.67   Fedora CoreOS 32.20200629.3.0    5.6.19-300.fc32.x86_64   cri-o://1.18.2
dev-c1v4-mfpr9-worker-xbmgg   Ready                         worker   5d2h    v1.18.3                     10.102.5.68   10.102.5.68   Fedora CoreOS 32.20200629.3.0    5.6.19-300.fc32.x86_64   cri-o://1.18.2

Looking at journalctl on the master, I see lots of lookup issues:

Nov 30 17:03:08 dev-c1v4-mfpr9-master-1 hyperkube[1709]: E1130 17:03:08.520121    1709 kubelet.go:2190] node "dev-c1v4-mfpr9-master-1" not found
Nov 30 17:03:08 dev-c1v4-mfpr9-master-1 hyperkube[1709]: I1130 17:03:08.533365    1709 csi_plugin.go:994] Failed to contact API server when waiting for CSINode publishing: Get "https://api-int.dev-c1v4.os.maeagle.corp:6443/apis/storage.k8s.io/v1/csinodes/dev-c1v4-mfpr9-master-1": dial tcp: lookup api-int.dev-c1v4.os.maeagle.corp: no such host
Nov 30 17:03:08 dev-c1v4-mfpr9-master-1 hyperkube[1709]: E1130 17:03:08.620272    1709 kubelet.go:2190] node "dev-c1v4-mfpr9-master-1" not found
Nov 30 17:03:08 dev-c1v4-mfpr9-master-1 hyperkube[1709]: E1130 17:03:08.720432    1709 kubelet.go:2190] node "dev-c1v4-mfpr9-master-1" not found

I see the same on the worker.

Log bundle: Unfortunately I can't seem to run must-gather:

oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift/okd-content@sha256:c5b27546b5bb33e0af0bdd7610a0f19075bb68c78f39233db743671b9f043f6b
[must-gather      ] OUT namespace/openshift-must-gather-6q9bh created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9z42b created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:c5b27546b5bb33e0af0bdd7610a0f19075bb68c78f39233db743671b9f043f6b created
[must-gather-frnqv] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9z42b deleted
[must-gather      ] OUT namespace/openshift-must-gather-6q9bh deleted
error: gather did not start for pod must-gather-frnqv: timed out waiting for the condition
fortinj66 commented 3 years ago

I was able to create a must-gather file: https://www.dropbox.com/s/qnq4758gabu0yg6/must-gather.tar.gz?dl=0

There were a fair number of errors:
must-gather.log

fortinj66 commented 3 years ago

So for some reason I had to add the api-int.xxx host IP to /etc/hosts. Once I did that, the updates finished.

I added 10.102.5.2 api-int.dev-c1v4.os.maeagle.corp to each master and worker.

Here is the final /etc/hosts after reboot...

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.102.5.2 api-int.dev-c1v4.os.maeagle.corp

;; image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
Connection image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
to image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
172.30.0.10#53(172.30.0.10) image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
for image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
image-registry.openshift-image-registry.svc.cluster.local image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
failed: image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
timed image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
out. image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver

What is all the stuff after ;; ?

fortinj66 commented 3 years ago

Cluster status after update:

[root@os-utils-d02 ~]# oc get nodes
NAME                          STATUS   ROLES    AGE     VERSION
dev-c1v4-mfpr9-master-0       Ready    master   5d21h   v1.19.0-rc.2+9f84db3-1075
dev-c1v4-mfpr9-master-1       Ready    master   5d21h   v1.19.0-rc.2+9f84db3-1075
dev-c1v4-mfpr9-master-2       Ready    master   5d21h   v1.19.0-rc.2+9f84db3-1075
dev-c1v4-mfpr9-worker-jmk8q   Ready    worker   5d21h   v1.19.0-rc.2+9f84db3-1075
dev-c1v4-mfpr9-worker-rw6pt   Ready    worker   5d21h   v1.19.0-rc.2+9f84db3-1075
dev-c1v4-mfpr9-worker-xbmgg   Ready    worker   5d5h    v1.19.0-rc.2+9f84db3-1075
vrutkovs commented 3 years ago

Check /etc/resolv.conf - NM should have prepended an internal DNS, which would resolve api-int

fortinj66 commented 3 years ago

It was added:

After removing the entry from /etc/hosts I get:

# Generated by KNI resolv prepender NM dispatcher script
search dev-c1v4.os.maeagle.corp
nameserver 10.102.5.67
nameserver 10.99.111.1
nameserver 10.99.111.2

[core@dev-c1v4-mfpr9-worker-rw6pt ~]$ nslookup api-int.dev-c1v4.os.maeagle.corp
;; Got recursion not available from 10.102.5.67, trying next server
Server:         10.99.111.1
Address:        10.99.111.1#53

** server can't find api-int.dev-c1v4.os.maeagle.corp: NXDOMAIN

Notice the "Got recursion not available from 10.102.5.67, trying next server".

fortinj66 commented 3 years ago

I don't think internal DNS is working correctly. I can't seem to resolve any cluster.local entries either.

bobby0724 commented 3 years ago

Great info, thanks for sharing. I will stay on 4.5 until 4.6 is more stable.

vrutkovs commented 3 years ago

Notice the "Got recursion not available from 10.102.5.67, trying next server".

Two DNS pods are unhealthy:

dns-default-d475t | Unhealthy | Liveness probe failed: Get http://10.128.0.7:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

But other than that internal DNS looks fine
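
A quick way to check those per-node DNS pods directly (a sketch; the pod and container names below are taken from this thread and may differ on your cluster):

# List the CoreDNS pods and the nodes they run on
oc -n openshift-dns get pods -o wide

# Inspect one of the unhealthy pods and its logs
oc -n openshift-dns describe pod dns-default-d475t
oc -n openshift-dns logs dns-default-d475t -c dns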

fortinj66 commented 3 years ago

I restarted all the DNS pods but it didn't make a difference...

I was able to resolve internal cluster.local addresses and the external api DNS.

sh-4.4# nslookup router-internal-default.openshift-ingress
Server:         172.30.0.10
Address:        172.30.0.10#53

Name:   router-internal-default.openshift-ingress.svc.cluster.local
Address: 172.30.228.0

sh-4.4# nslookup api.dev-c1v4.os.maeagle.corp
Server:         172.30.0.10
Address:        172.30.0.10#53

Name:   api.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2

but api-int.dev-c1v4.os.maeagle.corp still fails:

sh-4.4# nslookup api-int.dev-c1v4.os.maeagle.corp
Server:         172.30.0.10
Address:        172.30.0.10#53

Name:   api-int.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2
Name:   api-int.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2
** server can't find api-int.dev-c1v4.os.maeagle.corp: NXDOMAIN

Other than this issue, the upgrade seems fine so far...

I'm going to rebuild the cluster and see if I can reproduce the issue.

fortinj66 commented 3 years ago

I'd say there is definitely an issue during the upgrade. Even before the nodes are rebooted, api-int becomes unresolvable:

This is a fresh 4.5 install with an upgrade to 4.6

Before on master-0:

sh-4.4# cat /etc/resolv.conf 
search dev-c1v4.os.maeagle.corp
nameserver 10.102.5.51
nameserver 10.99.111.1
nameserver 10.99.111.2
sh-4.4# cat /etc/hosts       
# Kubernetes-managed hosts file (host network).
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
sh-4.4# 
sh-4.2# nslookup api-int.dev-c1v4.os.maeagle.corp
Server:         10.102.5.51
Address:        10.102.5.51#53

Name:   api-int.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2
** server can't find api-int.dev-c1v4.os.maeagle.corp: NXDOMAIN

sh-4.2# nslookup api.dev-c1v4.os.maeagle.corp
Server:         10.102.5.51
Address:        10.102.5.51#53

Name:   api.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2

After

sh-4.4#  nslookup api-int.dev-c1v4.os.maeagle.corp
;; Got recursion not available from 10.102.5.51, trying next server
Server:         10.99.111.1
Address:        10.99.111.1#53

sh-4.4#  nslookup api.dev-c1v4.os.maeagle.corp
;; Got recursion not available from 10.102.5.51, trying next server
Server:         10.99.111.1
Address:        10.99.111.1#53

Name:   api.dev-c1v4.os.maeagle.corp
Address: 10.102.5.2

and the DNS operator had not been updated yet: dns 4.5.0-0.okd-2020-10-15-235428 True False False 60m

So during the upgrade, something broke...

fortinj66 commented 3 years ago

Current status... It will stay here until I add the api-int IP to /etc/hosts:

NAME                          STATUS                        ROLES    AGE   VERSION
dev-c1v4-hml4m-master-0       NotReady,SchedulingDisabled   master   85m   v1.18.3
dev-c1v4-hml4m-master-1       Ready                         master   85m   v1.18.3
dev-c1v4-hml4m-master-2       Ready                         master   85m   v1.18.3
dev-c1v4-hml4m-worker-8rrgs   NotReady,SchedulingDisabled   worker   72m   v1.18.3
dev-c1v4-hml4m-worker-pl97s   Ready                         worker   72m   v1.18.3
dev-c1v4-hml4m-worker-sf74s   Ready                         worker   72m   v1.18.3
Every 15.0s: oc get co                                                                        Tue Dec  1 11:05:48 2020

NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.okd-2020-11-27-200126   False       False         True       2m10s
cloud-credential                           4.6.0-0.okd-2020-11-27-200126   True        False         False      87m
cluster-autoscaler                         4.6.0-0.okd-2020-11-27-200126   True        False         False      77m
config-operator                            4.6.0-0.okd-2020-11-27-200126   True        False         False      77m
console                                    4.6.0-0.okd-2020-11-27-200126   True        False         False      6m38s
csi-snapshot-controller                    4.6.0-0.okd-2020-11-27-200126   True        False         False      79m
dns                                        4.6.0-0.okd-2020-11-27-200126   True        False         True       17m
etcd                                       4.6.0-0.okd-2020-11-27-200126   True        False         False      81m
image-registry                             4.6.0-0.okd-2020-11-27-200126   True        False         False      79m
ingress                                    4.6.0-0.okd-2020-11-27-200126   True        False         False      34m
insights                                   4.6.0-0.okd-2020-11-27-200126   True        False         False      79m
kube-apiserver                             4.6.0-0.okd-2020-11-27-200126   True        False         False      80m
kube-controller-manager                    4.6.0-0.okd-2020-11-27-200126   True        False         False      80m
kube-scheduler                             4.6.0-0.okd-2020-11-27-200126   True        False         False      80m
kube-storage-version-migrator              4.6.0-0.okd-2020-11-27-200126   True        False         False      53m
machine-api                                4.6.0-0.okd-2020-11-27-200126   True        False         False      75m
machine-approver                           4.6.0-0.okd-2020-11-27-200126   True        False         False      79m
machine-config                             4.5.0-0.okd-2020-10-15-235428   False       True          False      17m
marketplace                                4.6.0-0.okd-2020-11-27-200126   True        False         False      33m
monitoring                                 4.6.0-0.okd-2020-11-27-200126   True        False         False      7m54s
network                                    4.6.0-0.okd-2020-11-27-200126   True        True          False      83m
node-tuning                                4.6.0-0.okd-2020-11-27-200126   True        False         False      34m
openshift-apiserver                        4.6.0-0.okd-2020-11-27-200126   True        False         True       12m
openshift-controller-manager               4.6.0-0.okd-2020-11-27-200126   True        False         False      79m
openshift-samples                          4.6.0-0.okd-2020-11-27-200126   True        False         False      34m
operator-lifecycle-manager                 4.6.0-0.okd-2020-11-27-200126   True        False         False      82m
operator-lifecycle-manager-catalog         4.6.0-0.okd-2020-11-27-200126   True        False         False      82m
operator-lifecycle-manager-packageserver   4.6.0-0.okd-2020-11-27-200126   True        False         False      6m34s
service-ca                                 4.6.0-0.okd-2020-11-27-200126   True        False         False      82m
storage                                    4.6.0-0.okd-2020-11-27-200126   True        False         False      34m
vrutkovs commented 3 years ago

dns (and subsequently openshift-apiserver) is degraded; I don't think I've seen that in the original must-gather. Could you collect another one? (It may take a few tries, but if it fails, oc get op dns -o yaml would do for now.)

fortinj66 commented 3 years ago

oc get op dns -o yaml

oc get op dns -o yaml
error: the server doesn't have a resource type "op"

I think it is degraded because of the master and worker not reporting back in...

fortinj66 commented 3 years ago

must-gather link

vrutkovs commented 3 years ago

So two pods, dns-default-7rdgv and dns-default-glrjs, are in a not-ready state with no logs. Not sure why the pods are not ready - all containers are ready though.

fortinj66 commented 3 years ago

dns-default-7rdgv runs on dev-c1v4-hml4m-master-0, which is <Not Ready>, so it's not reporting.

dns-default-glrjs runs on dev-c1v4-hml4m-worker-8rrgs, which is <Not Ready>, so it's not reporting.

As I said, if I add the api-int entry to /etc/hosts it will resolve...

I mentioned earlier that DNS started failing before the update was complete and before the DNS operator had been updated... Something else is breaking it... See the before-and-after nslookup results above.
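
For reference, the temporary workaround described above boils down to adding the api-int VIP to /etc/hosts on each NotReady node (a sketch; the IP and domain are the ones from this cluster, substitute your own):

# Append the api-int entry and verify it resolves locally
echo '10.102.5.2 api-int.dev-c1v4.os.maeagle.corp' | sudo tee -a /etc/hosts
getent hosts api-int.dev-c1v4.os.maeagle.corp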

revresx commented 3 years ago

Failed for me as well on our dev cluster, but it's UPI (masters on VMware, workers on bare metal). It hangs in the same state: one master and one worker stay NotReady.

When logging in to the failed master, systemd reports:

[systemd]
Failed Units: 1
  gcp-hostname.service
-- Reboot --
Dez 01 18:07:00 fedora systemd[1]: Starting Set GCP Transient Hostname...
Dez 01 18:07:00 fedora afterburn[1052]: Dec 01 18:07:00.667 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #1
Dez 01 18:07:00 fedora afterburn[1052]: Dec 01 18:07:00.674 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:01 fedora afterburn[1052]: Dec 01 18:07:01.674 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #2
Dez 01 18:07:01 fedora afterburn[1052]: Dec 01 18:07:01.675 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:03 fedora afterburn[1052]: Dec 01 18:07:03.675 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #3
Dez 01 18:07:03 fedora afterburn[1052]: Dec 01 18:07:03.676 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:07 fedora afterburn[1052]: Dec 01 18:07:07.676 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #4
Dez 01 18:07:07 fedora afterburn[1052]: Dec 01 18:07:07.677 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:12 fedora afterburn[1052]: Dec 01 18:07:12.677 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #5
Dez 01 18:07:12 fedora afterburn[1052]: Dec 01 18:07:12.679 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:17 fedora afterburn[1052]: Dec 01 18:07:17.679 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #6
Dez 01 18:07:17 fedora afterburn[1052]: Dec 01 18:07:17.680 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:22 fedora afterburn[1052]: Dec 01 18:07:22.680 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #7
Dez 01 18:07:22 fedora afterburn[1052]: Dec 01 18:07:22.681 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:27 fedora afterburn[1052]: Dec 01 18:07:27.682 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #8
Dez 01 18:07:27 fedora afterburn[1052]: Dec 01 18:07:27.683 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:32 fedora afterburn[1052]: Dec 01 18:07:32.683 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #9
Dez 01 18:07:32 fedora afterburn[1052]: Dec 01 18:07:32.685 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:37 fedora afterburn[1052]: Dec 01 18:07:37.685 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #10
Dez 01 18:07:37 fedora afterburn[1052]: Dec 01 18:07:37.687 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora afterburn[1052]: Dec 01 18:07:42.687 INFO Fetching http://metadata.google.internal/computeMetadata/v1/instance/hostname: Attempt #11
Dez 01 18:07:42 fedora afterburn[1052]: Dec 01 18:07:42.689 INFO Failed to fetch: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora afterburn[1052]: Error: failed to run
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: writing hostname
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: maximum number of retries (10) reached
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: failed to fetch
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: error sending request for url (http://metadata.google.internal/computeMetadata/v1/instance/hostname): error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: error trying to connect: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: dns error: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora afterburn[1052]: Caused by: failed to lookup address information: Name or service not known
Dez 01 18:07:42 fedora systemd[1]: gcp-hostname.service: Control process exited, code=exited, status=1/FAILURE
Dez 01 18:07:42 fedora systemd[1]: gcp-hostname.service: Failed with result 'exit-code'.
Dez 01 18:07:42 fedora systemd[1]: Failed to start Set GCP Transient Hostname.

and the hostname was changed from master01 to fedora

fortinj66 commented 3 years ago

and the hostname was changed from master01 to fedora

I haven't seen any hostname changes...

revresx commented 3 years ago

But it looks like the issue has the same origin:

Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.868474    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.800840    1293 kubelet_volumes.go:154] orphaned pod "5a1e2d9b-52b1-44ff-b2e0-b139c23f52cc" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.768358    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: I1201 19:22:29.681173    1293 csi_plugin.go:994] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "fedora" is forbidden: User "system:node:master01" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with t>
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.668254    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: I1201 19:22:29.626220    1293 worker.go:215] Non-running container probed: kube-apiserver-fedora_openshift-kube-apiserver(42774d6f-c001-41e0-ba24-2a6e4b02c8da) - kube-apiserver-check-endpoints
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.568145    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.468026    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.367882    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.267778    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.167639    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:29 fedora hyperkube[1293]: E1201 19:22:29.067529    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.967264    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.867153    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.766628    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.680391    1293 csi_plugin.go:994] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "fedora" is forbidden: User "system:node:master01" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with t>
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.666370    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.566136    1293 kubelet.go:2190] node "fedora" not found
Dez 01 19:22:28 fedora hyperkube[1293]: E1201 19:22:28.514858    1293 kubelet_node_status.go:92] Unable to register node "fedora" with API server: nodes "fedora" is forbidden: node "master01" is not allowed to modify node "fedora"
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511279    1293 event.go:291] "Event occurred" object="fedora" kind="Node" apiVersion="" type="Normal" reason="NodeHasSufficientPID" message="Node fedora status is now: NodeHasSufficientPID"
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511269    1293 event.go:291] "Event occurred" object="fedora" kind="Node" apiVersion="" type="Normal" reason="NodeHasNoDiskPressure" message="Node fedora status is now: NodeHasNoDiskPressure"
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511249    1293 event.go:291] "Event occurred" object="fedora" kind="Node" apiVersion="" type="Normal" reason="NodeHasSufficientMemory" message="Node fedora status is now: NodeHasSufficientMemory"
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511133    1293 kubelet_node_status.go:70] Attempting to register node fedora
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511100    1293 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node fedora
Dez 01 19:22:28 fedora hyperkube[1293]: I1201 19:22:28.511090    1293 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node fedora

Is there a workaround for this? I could rename the host to its old name, but if possible I would rather not risk our dev cluster with trial and error.

fortinj66 commented 3 years ago

Is there a workaround for this? I could rename the host to its old name, but if possible I would rather not risk our dev cluster with trial and error.

I think your issue is slightly different since your nodes are being renamed...

Are you able to do a DNS lookup of api-int.<cluster domain>?

revresx commented 3 years ago

Yes, maybe because of UPI/external DNS?

fortinj66 commented 3 years ago

Well, this is interesting... It looks like something may be broken in Fedora CoreOS 33 (fc33).

CoreDNS itself seems to be fine. The coredns pods run on each host as privileged containers and act as a DNS server on that host's IP on port 53. On both fc32 and fc33 the DNS itself works with dig and nslookup (nslookup needs -norec added).
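
The kind of check described here looks roughly like this (a sketch; 10.102.5.75 is one of the host IPs appearing in the logs below):

# Query the node-local CoreDNS directly, without asking for recursion
dig +norecurse api-int.dev-c1v4.os.maeagle.corp @10.102.5.75

# nslookup requests recursion by default, which is where the earlier
# "Got recursion not available" messages come from; -norec turns that off
nslookup -norec api-int.dev-c1v4.os.maeagle.corp 10.102.5.75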

However, after adding log to /etc/coredns/Corefile so that I can see better logs, there is a big difference: applications do not seem to be able to resolve api-int. on fc33 hosts. They can resolve other hosts (google.com, *.apps., etc.).

FC32 from ping tests

[INFO] 10.102.5.75:33469 - 59898 "A IN api-int.dev-c1v4.os.maeagle.corp. udp 50 false 512" NOERROR qr,aa,rd 98 0.000242014s
[INFO] 10.102.5.75:33469 - 57280 "AAAA IN api-int.dev-c1v4.os.maeagle.corp. udp 50 false 512" NXDOMAIN qr,aa,rd,ra 144 0.000897943s
[INFO] 10.102.5.75:34459 - 58102 "PTR IN 2.5.102.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd 175 0.0001837s
[INFO] 10.102.5.75:55297 - 51746 "A IN dmc.maeagle.corp. udp 34 false 512" NOERROR qr,aa,rd,ra 120 0.000907337s
[INFO] 10.102.5.75:55297 - 54309 "AAAA IN dmc.maeagle.corp. udp 34 false 512" NOERROR qr,aa,rd,ra 84 0.000966276s
[INFO] 10.102.5.75:45755 - 1724 "PTR IN 60.154.199.10.in-addr.arpa. udp 44 false 512" NOERROR qr,aa,rd,ra 104 0.001262644s

FC33 from ping tests:

[root@dev-c1v4-ggwfp-worker-s55lj ~]# crictl logs -f 9c81f376d81e5 | grep -v plugin/mdns
.:53
[INFO] plugin/reload: Running configuration MD5 = 5cc99ff6925ba52e9e17689f1616cba7
CoreDNS-1.6.6
linux/amd64, go1.15.0,

FC32 dnslookup tests


[INFO] 10.102.5.75:45943 - 32870 "A IN api-int.dev-c1v4.os.maeagle.corp. udp 50 false 512" NOERROR qr,aa,rd 98 0.00022074s
[INFO] 10.129.2.3:41521 - 21814 "A IN oauth-openshift.apps.dev-c1v4.os.maeagle.corp. udp 63 false 512" NOERROR qr,aa,rd 124 0.000258591s
[INFO] 10.102.5.75:41300 - 3757 "A IN dmc.maeagle.corp. udp 34 false 512" NOERROR qr,aa,rd,ra 120 0.001293433s
[INFO] 10.102.5.75:42856 - 30833 "AAAA IN dmc-p03.maeagle.corp. udp 38 false 512" NOERROR qr,aa,rd,ra 124 0.00116467s

FC33 dnslookup tests

[INFO] plugin/reload: Running configuration MD5 = 5cc99ff6925ba52e9e17689f1616cba7
CoreDNS-1.6.6
linux/amd64, go1.15.0,
[INFO] 10.102.5.74:39182 - 57890 "A IN api.dev-c1v4.os.maeagle.corp. udp 46 false 512" NOERROR qr,aa 90 0.00020603s
[INFO] 10.102.5.74:60059 - 20225 "AAAA IN api.dev-c1v4.os.maeagle.corp. udp 46 false 512" NOERROR qr,aa,ra 140 0.001273169s
[INFO] 10.102.5.74:41967 - 19272 "A IN api-int.dev-c1v4.os.maeagle.corp. udp 50 false 512" NOERROR qr,aa 98 0.000161355s
[INFO] 10.102.5.74:57526 - 62586 "AAAA IN api-int.dev-c1v4.os.maeagle.corp. udp 50 false 512" NXDOMAIN qr,aa,ra 144 0.001409065s

As far as I can tell, the /etc/coredns/Corefile files on fc32 and fc33 are identical...

. {
    log
    errors
    health :18080
    mdns dev-c1v4.os.maeagle.corp 0 dev-c1v4
    forward . 10.99.111.1 10.99.111.2
    cache 30
    reload
    hosts {
        10.102.5.2 api-int.dev-c1v4.os.maeagle.corp
        10.102.5.2 api.dev-c1v4.os.maeagle.corp
        fallthrough
    }
    template IN A dev-c1v4.os.maeagle.corp {
        match .*.apps.dev-c1v4.os.maeagle.corp
        answer "{{ .Name }} 60 in a 10.102.5.3"
        fallthrough
    }
}

I'm not exactly sure what it means, but it definitely seems broken.

fortinj66 commented 3 years ago

Well, I think I found the issue:

FC32 from /etc/nsswitch.conf: hosts: files dns myhostname

FC33 from /etc/nsswitch.conf: hosts: files resolve [!UNAVAIL=return] myhostname dns

If I change FC33 /etc/nsswitch.conf to use hosts: files dns myhostname, I can ping api-int.

After rebooting, the nodes come up fine...

I'm not sure if this is an install issue or due to changes in FC33.
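
For anyone who wants to reproduce the test, the manual change amounts to something like this (a sketch only; on FCOS a change like this would normally be shipped via Ignition/MachineConfig rather than edited by hand):

# Back up nsswitch.conf and switch the hosts line back to the FC32 ordering
sudo cp /etc/nsswitch.conf /etc/nsswitch.conf.bak
sudo sed -i 's/^hosts:.*/hosts:      files dns myhostname/' /etc/nsswitch.conf

# api-int should now resolve through plain DNS
getent hosts api-int.dev-c1v4.os.maeagle.corp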

vrutkovs commented 3 years ago

Marked "node hostname changes to fedora" comments as offtopic, please use https://github.com/openshift/okd/issues/394 for that.

If I change FC33 /etc/nsswitch.conf to use hosts: files dns myhostname

Oh, interesting. Renaming the ticket

LorbusChris commented 3 years ago

This is probably due to this change: https://fedoraproject.org/wiki/Changes/systemd-resolved and possibly also related to https://github.com/coreos/fedora-coreos-tracker/issues/679

cc'ing @dustymabe for awareness

LorbusChris commented 3 years ago

From the change summary linked above: glibc will perform name resolution using nss-resolve rather than nss-dns

@mcatanzaro @keszybz maybe you can help here. What has to change so that nsswitch.conf is automatically configured correctly for the resolution to work?

fortinj66 commented 3 years ago

Let me know how I can help/test.

I'm going to test regular 4.6 IPI install now...

mcatanzaro commented 3 years ago

maybe you can help here. What has to change so that nsswitch.conf is automatically configured correctly for the resolution to work?

What you have currently, hosts: files resolve [!UNAVAIL=return] myhostname dns, looks perfect to me. That's almost the same as Fedora's new default, except without mdns (avahi), which you probably don't want in okd anyway. If you want to go back to legacy DNS behavior, you would remove resolve [!UNAVAIL=return]. I think your problem is that you have custom DNS servers listed in /etc/resolv.conf that are required for the cluster to work, but systemd-resolved has not been told about them, yes? /etc/resolv.conf is now a legacy file that is ignored. Software that thinks it can write to it is broken.

You have several options to fix this.

Normally you're supposed to use the systemd-resolved D-Bus API for this instead. That's intended to be used by network management software, like NetworkManager or third-party VPN clients. You would call SetLinkDNS(), but the timing is a bit tricky: "Network management software integrating with resolved is recommended to invoke this method (and the five below) after the interface appeared in the kernel (and thus after a network interface index has been assigned) but before the network interfaces is activated (set IFF_UP on) so that all settings take effect during the full time the network interface is up." So I'm not sure if that would work well for okd, but that is the ideal solution IMO.

Alternatively, since (I think) you want a very simple configuration with a static list of DNS servers used for every interface, you could ignore the usual per-interface configuration and just set some global DNS servers instead, using the DNS= line in /etc/systemd/resolved.conf. But if so, you need to make sure that you don't have network management software (like NetworkManager) assigning any per-link configuration using SetLinkDNS(), because the global configuration will be ignored if so. In Fedora, the global configuration is ignored by default because NetworkManager will always set link-specific configuration. This might be the best solution for okd if you don't have NetworkManager or any other network management software that knows about systemd-resolved.
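
A minimal sketch of that global configuration, assuming the upstream resolvers mentioned earlier in this thread:

# /etc/systemd/resolved.conf
[Resolve]
DNS=10.99.111.1 10.99.111.2

# then: sudo systemctl restart systemd-resolved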

If neither of those options sounds suitable, then the next thing I would try is configuring systemd-resolved to read rather than manage /etc/resolv.conf. By default -- what you have right now -- systemd-resolved manages /etc/resolv.conf for you, and writes to it will not work as expected. But you can change this by changing it to be a normal file rather than a symlink. If so, systemd-resolved should populate its configuration from what you have in /etc/resolv.conf. This might be a good option if you have trouble with either of the previous approaches, since it would allow you to continue writing to /etc/resolv.conf. You lose the benefits of systemd-resolved's rich DNS configuration, but those are probably only important on desktops, not so useful for okd, so it's not much loss. But you would still benefit from systemd-resolved's shared DNS cache.
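
A sketch of that approach, reusing the resolver list already shown for these nodes:

# Replace the resolved-managed symlink with a regular file; systemd-resolved
# should then populate its configuration from this file
sudo rm /etc/resolv.conf
printf 'search dev-c1v4.os.maeagle.corp\nnameserver 10.102.5.67\nnameserver 10.99.111.1\nnameserver 10.99.111.2\n' | sudo tee /etc/resolv.conf
sudo systemctl restart systemd-resolved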

Finally, you could seriously decide to disable systemd-resolved and stick with legacy DNS. I think it's worth using systemd-resolved even if only for its DNS cache, but if it's too hard to get working, I wouldn't worry too much about it. You can revert to your old nsswitch.conf, disable the systemd-resolved service, and move on with life. If you need DNSSEC to work, then you must disable systemd-resolved, at least for now, because DNSSEC is pretty broken and not likely to be fixed until Fedora 34. I hope that will be fixed in time for F34.

mcatanzaro commented 3 years ago

Normally you're supposed to use the systemd-resolved D-Bus API for this instead. That's intended to be used by network management software, like NetworkManager or third-party VPN clients. You would call SetLinkDNS(), but the timing is a bit tricky: "Network management software integrating with resolved is recommended to invoke this method (and the five below) after the interface appeared in the kernel (and thus after a network interface index has been assigned) but before the network interfaces is activated (set IFF_UP on) so that all settings take effect during the full time the network interface is up." So I'm not sure if that would work well for okd, but that is the ideal solution IMO.

I should clarify: that's the ideal solution if you don't have NetworkManager.

If you do have NetworkManager, then I forgot to mention the obvious/ideal solution: just configure your extra DNS servers with NetworkManager (either using nmcli, or the NetworkManager D-Bus API)! Then everything should work, because NetworkManager will handle configuring systemd-resolved for you, and it will do so at exactly the right time, for every network interface. (It sounds like your software is currently writing directly to /etc/resolv.conf, and then that breaks because NetworkManager doesn't know about it?)
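
A sketch of the nmcli route, assuming a hypothetical connection profile named "ens192" and the resolvers seen earlier in this thread:

# Point the active connection at the cluster's DNS servers
sudo nmcli connection modify ens192 ipv4.dns "10.102.5.67 10.99.111.1 10.99.111.2"
sudo nmcli connection modify ens192 ipv4.ignore-auto-dns yes   # optional: ignore DHCP-provided DNS
sudo nmcli connection up ens192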

LorbusChris commented 3 years ago

@mcatanzaro thank you for that excellent write-up! We do have NM, so I'll investigate what you suggested here.

We also hit https://github.com/coreos/fedora-coreos-tracker/issues/679 which might be related here but I'm not entirely sure - tl;dr of what I think happens there is this (quoting myself from over there):

(reading that summary again, that might actually be a separate issue)

vrutkovs commented 3 years ago

systemd-resolved has not been told about them, yes? /etc/resolv.conf is now a legacy file that is ignored

Not in OKD; we still need it. We disable the DNS stub listener so that kubelet (and NM) can read /etc/resolv.conf.
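
For readers unfamiliar with that setting, disabling the stub listener is a single resolved option; a sketch using the standard systemd drop-in convention (not necessarily the exact file OKD ships):

# /etc/systemd/resolved.conf.d/no-stub.conf
[Resolve]
DNSStubListener=no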

mcatanzaro commented 3 years ago

Hmm... I won't comment on the specifics of your migration -- I understand that's harder than normal Fedora since RPM scriptlets won't work in CoreOS -- but since you configure DNSStubListener=no, then you have already implemented this solution:

If neither of those options sounds suitable, then the next thing I would try is configuring systemd-resolved to read rather than manage /etc/resolv.conf. By default -- what you have right now -- systemd-resolved manages /etc/resolv.conf for you, and writes to it will not work as expected. But you can change this by changing it to be a normal file rather than a symlink. If so, systemd-resolved should populate its configuration from what you have in /etc/resolv.conf. This might be a good option if you have trouble with either of the previous approaches, since it would allow you to continue writing to /etc/resolv.conf. You lose the benefits of systemd-resolved's rich DNS configuration, but those are probably only important on desktops, not so useful for okd, so it's not much loss. But you would still benefit from systemd-resolved's shared DNS cache.

I don't think that's the best option, because you are using NetworkManager -- why not just let it do its thing? It should work -- but it's an acceptable choice. In theory, I think this ought to work without you having to make any changes to /etc/nsswitch.conf. But since it seems to not be working, I would recommend the practical solution: remove resolve [!UNAVAIL=return] from the hosts line in /etc/nsswitch.conf, and move on.

P.S. Obligatory reminder: if you have authselect -- you probably do? -- don't modify /etc/nsswitch.conf directly, because that breaks authselect. Modify /etc/authselect/user-nsswitch.conf instead, then run authselect apply-changes.
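
A sketch of that authselect workflow, where applicable:

# Edit the authselect-managed template instead of /etc/nsswitch.conf,
# then regenerate the real file
sudo vi /etc/authselect/user-nsswitch.conf   # adjust the hosts: line here
sudo authselect apply-changes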

mcatanzaro commented 3 years ago

I would recommend the practical solution: remove resolve [!UNAVAIL=return] from the hosts line in /etc/nsswitch.conf, and move on.

Well, if doing that, also disable systemd-resolved.service. It's not going to get used, since I can see in https://github.com/openshift/okd/issues/401#issuecomment-736041606 that you don't have the stub resolver listed in /etc/nsswitch.conf.

LorbusChris commented 3 years ago

OK let me remove DNSStubListener=no and see if that works: https://github.com/openshift/okd-machine-os/pull/22

LorbusChris commented 3 years ago

Hmm, OK the reason this was introduced in the first place was resolved causing CoreDNS to crashloop: https://github.com/coredns/coredns/blob/master/plugin/loop/README.md#troubleshooting-loops-in-kubernetes-clusters

LorbusChris commented 3 years ago

@vrutkovs I think we should attempt to fix this by either A: preventing resolved from putting a local/loopback address in /etc/resolv.conf, or B: configuring kubelet to use /run/systemd/resolve/resolv.conf instead of /etc/resolv.conf (as suggested in the coredns doc linked above)
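
For reference, option B corresponds to pointing the kubelet at the file systemd-resolved maintains; a sketch of the relevant KubeletConfiguration fragment (in OKD this setting is owned by the MCO, so it could not simply be edited on a node):

# Fragment of the kubelet configuration
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf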

mcatanzaro commented 3 years ago

OK let me remove DNSStubListener=no and see if that works: openshift/okd-machine-os#22

It should work as long as NetworkManager is configured to use your CoreDNS server for each needed network interface. (If NetworkManager doesn't know about CoreDNS, of course it can't work.)

@vrutkovs I think we should attempt to fix this by either A: preventing resolved from putting a local/loopback address in /etc/resolv.conf, or B: configuring kubelet to use /run/systemd/resolve/resolv.conf instead of /etc/resolv.conf (as suggested in the coredns doc linked above)

A: corresponds to using DNSStubListener=no. That should work. I think B: should also work.

Either way, you're causing DNS lookups to totally bypass systemd-resolved, which leads me to question whether you really want it enabled at all.

vrutkovs commented 3 years ago

B: configuring kubelet to use /run/systemd/resolve/resolv.conf instead of /etc/resolv.conf

That's probably the best solution, yes, but it would require a conditional change in MCO, which is really hard to test and support.

Let's see if option A (implemented in https://github.com/openshift/okd-machine-os/pull/20) would work.

cgwalters commented 3 years ago

OK, a whole lot is going on here. First, a clear discovery is that it's a really evil trap that systemctl disable systemd-resolved doesn't work - one must also disable the NSS module. This seems to also impact privileged containers (which have a separate copy of /etc/nsswitch.conf).

Second, the api-int issue intersects with https://github.com/openshift/machine-config-operator/pull/2236

keszybz commented 3 years ago

it's a really evil trap that systemctl disable systemd-resolved doesn't work - one must also disable the NSS module

nss-resolve by itself doesn't do anything if systemd-resolved is not running. (Though the opposite is not true.) So it should be OK to leave the nss module.

mcatanzaro commented 3 years ago

nss-resolve by itself doesn't do anything if systemd-resolved is not running. (Though the opposite is not true.) So it should be OK to leave the nss module.

Yeah that's what the [!UNAVAIL=return] is for. So in theory you should be able to leave nsswitch.conf alone and just systemctl disable it.

Sans bugs.
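
In concrete terms, the "just disable it" path would be something like (a sketch):

# Stop and disable the resolver service; with [!UNAVAIL=return] in
# nsswitch.conf, lookups then fall through to the dns module
sudo systemctl disable --now systemd-resolved.service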

fortinj66 commented 3 years ago

When do we expect a new release with the fixes for the hostname ('fedora') and DNS (nsswitch.conf) issues, for testing?

I'm itching to test :)

vrutkovs commented 3 years ago

The nsswitch.conf fix would be added in https://github.com/openshift/okd-machine-os/pull/20; not quite sure how to approach the hostname issue - IIUC we need a new NetworkManager RPM in FCOS next-devel?

I'll check for a better way to overlay additional RPMs in OKD content until it lands in Fedora.

fortinj66 commented 3 years ago

not quite sure how to approach the hostname issue - IIUC we need a new NetworkManager RPM in FCOS next-devel?

I believe there are two hostname issues.

One relates to the DHCP-resolved hostname, which requires (?) an NM fix: https://github.com/openshift/okd/issues/394

The other, the one I am having issues with, relates to https://github.com/openshift/machine-config-operator/pull/2282, which pulls the hostname from vSphere.

LorbusChris commented 3 years ago

I think one more issue here is that in nsswitch.conf, as of F33, the dns entry now has lower priority than myhostname on the hosts line. It used to be:

hosts:      files dns myhostname

It is now:

hosts:      files resolve [!UNAVAIL=return] myhostname dns

while it should be:

hosts:      files resolve [!UNAVAIL=return] dns myhostname

fortinj66 commented 3 years ago

while it should be:

hosts:      files resolve [!UNAVAIL=return] dns myhostname

Isn't this being fixed by https://github.com/openshift/okd-machine-os/pull/20?

LorbusChris commented 3 years ago

@fortinj66 yes, but that's only an OKD-specific workaround for the real issue.

We really don't want the overlay files unless we absolutely have to use them.

fortinj66 commented 3 years ago

Gotcha... I'm still feeling my way around the various bits and pieces here so some of my questions may be off the mark...

vrutkovs commented 3 years ago

Included this change in https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-12-04-104706.

Keeping this open until we ensure this change is required and does the trick.

fortinj66 commented 3 years ago

This seems to work. All the master servers were able to reach api-int...

The hostname issue remains, as they are all named 'fedora'.