openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/

When a single master goes down, the API is no longer available (virt) #534

Closed: gklein closed this issue 5 years ago

gklein commented 5 years ago

Describe the bug:
When a single master goes down, the API is no longer available (virt).

To Reproduce:

# openstack baremetal node list
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+
| aa74ac70-e0b9-48d0-96cc-2f4d7115c490 | openshift-master-0 | None                                 | power on    | active             | False       |
| 16b3ea48-76ce-4137-bb60-7fecb1ebb810 | openshift-master-1 | None                                 | power on    | active             | False       |
| 67577322-a339-4dcc-be1d-bb655e4d01c4 | openshift-master-2 | None                                 | power on    | active             | False       |
| 48b84883-0390-4a24-b887-ea77c07ab0e1 | openshift-worker-2 | 48b84883-0390-4a24-b887-ea77c07ab0e1 | power on    | active             | False       |
| ebcf043a-38b6-4f15-9e3b-c603e9d4773c | openshift-worker-0 | ebcf043a-38b6-4f15-9e3b-c603e9d4773c | power on    | active             | False       |
| b0f2cd0a-5597-45fb-8232-c10110e66e48 | openshift-worker-1 | b0f2cd0a-5597-45fb-8232-c10110e66e48 | power on    | active             | False       |
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+

# openstack baremetal node power off  openshift-master-0

# oc get nodes
Unable to connect to the server: EOF

# curl -k "https://192.168.111.5:6443"
curl: (35) Encountered end of file

[kni@titan96 ~]$ ssh core@192.168.111.21 ip a | grep 192.168.111
    inet 192.168.111.21/24 brd 192.168.111.255 scope global dynamic noprefixroute ens4
[kni@titan96 ~]$ ssh core@192.168.111.22 ip a | grep 192.168.111
    inet 192.168.111.22/24 brd 192.168.111.255 scope global dynamic noprefixroute ens4
    inet 192.168.111.4/24 scope global secondary ens4
    inet 192.168.111.2/24 scope global secondary ens4
    inet 192.168.111.5/24 scope global secondary ens4
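
The output above shows that 192.168.111.22 currently holds the VIPs, including the API VIP (192.168.111.5). A small loop like the following can locate the VIP holder directly; this is only a sketch, and the candidate master IPs are assumed from the addresses seen above:

# Sketch: find which master currently holds the API VIP (192.168.111.5).
# Candidate IPs are an assumption based on the addresses above.
for ip in 192.168.111.20 192.168.111.21 192.168.111.22; do
  if ssh core@"$ip" ip -4 addr show ens4 2>/dev/null | grep -q '192\.168\.111\.5/'; then
    echo "$ip holds the API VIP"
  fi
done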

Expected/observed behavior:
The API VIP should still be available with the remaining 2 masters.

gklein commented 5 years ago

Logs from master-2:

# sudo crictl logs $(sudo crictl ps --pod=$(sudo crictl pods --name=openshift-apiserver --quiet) --quiet)
Failed to execute operation: Unit file tuned.service does not exist.
I0514 13:25:47.529133  115916 openshift-tuned.go:176] Extracting tuned profiles
I0514 13:25:47.532185  115916 openshift-tuned.go:596] Resync period to pull node/pod labels: 57 [s]
E0514 13:25:47.532998  115916 openshift-tuned.go:686] Get https://172.30.0.1:443/api/v1/nodes/master-2: dial tcp 172.30.0.1:443: connect: connection refused
I0514 13:25:52.539859  115916 openshift-tuned.go:176] Extracting tuned profiles
I0514 13:25:52.545891  115916 openshift-tuned.go:596] Resync period to pull node/pod labels: 58 [s]
E0514 13:25:52.546284  115916 openshift-tuned.go:686] Get https://172.30.0.1:443/api/v1/nodes/master-2: dial tcp 172.30.0.1:443: connect: connection refused
I0514 13:25:57.546506  115916 openshift-tuned.go:176] Extracting tuned profiles
I0514 13:25:57.557974  115916 openshift-tuned.go:596] Resync period to pull node/pod labels: 56 [s]
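
As an aside, the nested one-liner above can be unpacked into separate steps, which makes each intermediate ID visible; a sketch:

# Same as the one-liner above, split into steps:
POD_ID=$(sudo crictl pods --name=openshift-apiserver --quiet)   # pod sandbox ID
CTR_ID=$(sudo crictl ps --pod="$POD_ID" --quiet)                # container ID in that pod
sudo crictl logs "$CTR_ID"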

# sudo crictl logs $(sudo crictl ps --pod=$(sudo crictl pods --name=keepalived --quiet) --quiet)
9: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:38 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:40 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:42 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:44 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:46 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:48 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:50 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:52 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:54 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:56 2019: Track script chk_ocp is already running, expect idle - skipping run
Tue May 14 13:22:58 2019: Track script chk_ocp is already running, expect idle - skipping run
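
For context, keepalived decides VIP ownership partly through track scripts such as chk_ocp, which presumably probes local API health so that keepalived can adjust the node's VRRP priority on failure. The real script ships with the deployment; this is only an illustrative sketch of such a check:

#!/bin/bash
# Illustrative only; not the actual chk_ocp script.
# keepalived lowers this node's VRRP priority (and can release the VIP)
# when a track script like this exits non-zero.
curl -sfk --max-time 5 -o /dev/null https://localhost:6443/healthz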

hardys commented 5 years ago

Just to clarify, you left a delay after the power-off before testing the API access via the VIP, right?

I've tested this several times before and it worked, so if the VIP doesn't fail over then something must have changed. Perhaps @yboaron or @celebdor may have seen something?

gklein commented 5 years ago

I've kept master-0 down, and tried to access the VIP a few minutes after the host was shut down.

russellb commented 5 years ago

I think the title is a bit misleading, as it implies that there is a problem with VIP failover. In the logs included with your original post, it appears that the VIP did fail over properly to a different master (master-2?).

Instead, the API itself seems to be down. It looks like you tried to show the API server logs, but the output pasted above is actually from openshift-tuned. I'd be looking at the API server on the master where the VIP is, though ...
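
For reference, something along these lines would pull those logs (a sketch: master-2's IP is taken from the ip a output earlier in this issue, and the kube-apiserver name filter is an assumption):

# Sketch: read kube-apiserver logs on the master that holds the VIP (192.168.111.22 above).
ssh core@192.168.111.22 \
  'sudo crictl logs $(sudo crictl ps --pod=$(sudo crictl pods --name=kube-apiserver --quiet) --quiet)'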

russellb commented 5 years ago

I logged in to this cluster to take a look. This is the second time I've seen a cluster in this state.

To recap:

  • master-0 was powered off; the API VIP failed over to another master, but the API is still unreachable through it.

Now, what I see:

  • The kube-apiserver logs show that it fails connecting to etcd. See these errors which repeat over and over in the log:

W0514 20:23:17.258188       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-2.ostest.test.metalkube.org:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for localhost, etcd.kube-system.svc, etcd.kube-system.svc.cluster.local, etcd-2.ostest.test.metalkube.org, not etcd-0.ostest.test.metalkube.org". Reconnecting...
W0514 20:23:18.009310       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-1.ostest.test.metalkube.org:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for localhost, etcd.kube-system.svc, etcd.kube-system.svc.cluster.local, etcd-1.ostest.test.metalkube.org, not etcd-0.ostest.test.metalkube.org". Reconnecting...
W0514 20:23:18.344866       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-0.ostest.test.metalkube.org:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-0.ostest.test.metalkube.org on 192.168.111.2:53: no such host". Reconnecting...

Note the hostname mismatch. It's failing because it's expecting the certs from etcd-1 and etcd-2 to be valid for etcd-0 for some reason.
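
One quick way to confirm which names an etcd serving certificate actually covers is to inspect its SANs; a sketch using openssl, with the hostname and port taken from the log lines above:

# Sketch: print the SANs of the certificate presented by etcd-1.
echo | openssl s_client -connect etcd-1.ostest.test.metalkube.org:2379 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'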

russellb commented 5 years ago

This issue has been fixed in a newer version of OpenShift. We should not hit this problem after our next rebase.

gklein commented 5 years ago

It seems to be resolved with the latest rebase:

$ openstack baremetal node list
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+
| 97ef4c94-231c-4b35-81c6-022047cc4306 | openshift-master-1 | None                                 | power off   | active             | False       |
| cac8b6a1-8483-4fe7-950f-57aa2a2d6a7e | openshift-master-2 | None                                 | power on    | active             | False       |
| 8be1e40c-93fa-40be-96ca-8d855e105792 | openshift-master-0 | None                                 | power on    | active             | False       |
| b6e4dc37-cfa5-4de3-89d8-1c03a4070c65 | openshift-worker-1 | b6e4dc37-cfa5-4de3-89d8-1c03a4070c65 | power on    | active             | False       |
| a79b4964-be11-497e-af33-4fe14bff5de8 | openshift-worker-2 | a79b4964-be11-497e-af33-4fe14bff5de8 | power on    | active             | False       |
| e6896e94-70cc-4d85-83e6-61db01f37da2 | openshift-worker-0 | e6896e94-70cc-4d85-83e6-61db01f37da2 | power on    | active             | False       |
+--------------------------------------+--------------------+--------------------------------------+-------------+--------------------+-------------+
$  oc get nodes
NAME       STATUS     ROLES    AGE   VERSION
master-0   Ready      master   27h   v1.13.4+c3617b99f
master-1   NotReady   master   27h   v1.13.4+c3617b99f
master-2   Ready      master   27h   v1.13.4+c3617b99f
worker-0   Ready      worker   27h   v1.13.4+c3617b99f
worker-1   Ready      worker   27h   v1.13.4+c3617b99f
worker-2   Ready      worker   27h   v1.13.4+c3617b99f
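
For completeness, the VIP can also be probed directly while master-1 is down (a sketch; same API VIP as in the original report):

$ curl -k https://192.168.111.5:6443/version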

gklein commented 5 years ago

Closing based on the latest test: https://github.com/openshift-metal3/dev-scripts/issues/534#issuecomment-493919967

russellb commented 5 years ago

Great! Thanks for the follow up!