openshift / origin


Configuring OpenShift origin for cloud provider Azure leads to cluster failure #18135

Closed: fretboarder closed this issue 6 years ago

fretboarder commented 6 years ago

When configuring an OpenShift Origin 3.7 cluster for Azure, the origin-node processes do not come up anymore. The initial installation of the OpenShift cluster works perfectly, but after re-configuration according to https://docs.openshift.org/latest/install_config/configuring_azure.html the cluster is broken.

Version
$ oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://xxxxxxxx:8443
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
Steps To Reproduce
  1. Install an OpenShift cluster on Azure using the regular playbooks from https://github.com/openshift/openshift-ansible from the release-3.7 branch (I am using a small cluster with 1 master, 1 infra and 2 worker nodes)
  2. Follow the steps according to https://docs.openshift.org/latest/install_config/configuring_azure.html

This means, following the linked docs:

  1. Add the azure cloud provider settings (referencing /etc/azure/azure.conf) to /etc/origin/master/master-config.yaml and /etc/origin/node/node-config.yaml
  2. Restart the master and node services
  3. Delete each node's registration (oc delete node <node_name>) so it re-registers with its cloud-provider externalID, then restart the origin-node service on each node

A sketch of the node-side change is shown below. The problem is that after deleting the nodes, the SECOND RESTART fails consistently.
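
For reference, the node-side stanza from those docs looks roughly like this (a sketch; the /etc/azure/azure.conf path matches the log output below):

# /etc/origin/node/node-config.yaml (excerpt), per the configuring_azure docs
kubeletArguments:
  cloud-provider:
    - "azure"
  cloud-config:
    - "/etc/azure/azure.conf"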

The FIRST RESTART seems to work (probably because at that point the node has not been deleted yet?), and after quite a long time (>60s) the node seems to recover automatically. But in this case, the networking inside the cluster seems to be broken as well; e.g., access to the internal Docker registry fails with:

failed to "StartContainer" for "demo-app" with ErrImagePull: "rpc error: code = 2 desc = Get https://docker-registry.default.svc:5000/v2/: dial tcp 172.30.112.172:5000: getsockopt: no route to host"

Probably that's a separate issue.
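
If it recurs, a quick check of the registry service and its endpoints from the master would look something like this (standard oc commands; a diagnostic sketch, not part of the original report):

oc get svc docker-registry -n default
oc get endpoints docker-registry -n default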

Again, it is important to note that everything works well until the Azure re-configuration.

Current Result

After restarting, the origin-node processes fail to come up.

Expected Result

After restarting, the origin-node processes come up, the nodes are re-added to the cluster, and the cluster works again.

Additional Information

Log output of the failing origin-node process:

Jan 17 07:16:26 oo-node-1 systemd[1]: Starting OpenShift Node...
Jan 17 07:16:26 oo-node-1 dnsmasq[13469]: setting upstream servers from DBus
Jan 17 07:16:26 oo-node-1 dnsmasq[13469]: using nameserver 168.63.129.16#53
Jan 17 07:16:26 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jan 17 07:16:26 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain cluster.local
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.675876   30642 start_node.go:288] Reading node configuration from /etc/origin/node/node-config.yaml
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.677820   30642 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated. Please begin using a config file ASAP.
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.681759   30642 node.go:146] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "oo-node-1" (IP ""), iptables sync period "30s"
Jan 17 07:16:26 oo-node-1 systemd[1]: Started Kubernetes systemd probe.
Jan 17 07:16:26 oo-node-1 systemd[1]: Starting Kubernetes systemd probe.
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.694162   30642 mount_linux.go:192] Detected OS with systemd
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.694203   30642 client.go:72] Connecting to docker on unix:///var/run/docker.sock
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.694228   30642 client.go:92] Start docker client with request timeout=2m0s
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.695713   30642 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.712358   30642 azure.go:174] azure: using client_id+client_secret to retrieve access token
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.712471   30642 node_config.go:120] Successfully initialized cloud provider: "azure" from the config file: "/etc/azure/azure.conf"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.712496   30642 start_node.go:469] Starting node oo-node-1 (v3.7.0+7ed6862)
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.712523   30642 client.go:72] Connecting to docker on unix:///var/run/docker.sock
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.712533   30642 client.go:92] Start docker client with request timeout=2m0s
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.713885   30642 node.go:109] Connecting to Docker at unix:///var/run/docker.sock
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.731836   30642 node.go:358] Starting openshift-sdn network plugin
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.731923   30642 feature_gate.go:144] feature gates: map[]
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.732050   30642 server.go:652] cloud provider determined current node name to be oo-node-1
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.735476   30642 manager.go:144] cAdvisor running in container: "/system.slice/origin-node.service"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.740496   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.747283   30642 manager.go:152] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.747440   30642 manager.go:161] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.758741   30642 fs.go:124] Filesystem partitions: map[docker--vg-docker--pool:{mountpoint: major:253 minor:1 fsType:devicemapper blockSize:1024} /dev/sda2:{mountpoint:/ major:8 minor:2 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/boot major:8 minor:1 fsType:xfs blockSize:0} /dev/sdb1:{mountpoint:/mnt/resource major:8 minor:17 fsType:ext4 blockSize:0}]
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.768628   30642 manager.go:211] Machine: {NumCores:4 CpuFrequency:2394456 MemoryCapacity:14695591936 MachineID:de88c86b1060437ab31f8f3e4adec667 SystemUUID:F00A0012-CEA4-4645-ADB0-66E0C5D27F9F BootID:1cfa41e9-bc8b-43fc-aaff-b5fcb47730b6 Filesystems:[{Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:520785920 Type:vfs Inodes:256000 HasInodes:true} {Device:/dev/sdb1 DeviceMajor:8 DeviceMinor:17 Capacity:29457616896 Type:vfs Inodes:1835008 HasInodes:true} {Device:docker--vg-docker--pool DeviceMajor:253 DeviceMinor:1 Capacity:54693724160 Type:devicemapper Inodes:0 HasInodes:false} {Device:/dev/sda2 DeviceMajor:8 DeviceMinor:2 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true}] DiskMap:map[253:2:{Name:dm-2 Major:253 Minor:2 Size:54693724160 Scheduler:none} 2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline} 8:0:{Name:sda Major:8 Minor:0 Size:34359738368 Scheduler:deadline} 8:16:{Name:sdb Major:8 Minor:16 Size:30064771072 Scheduler:deadline} 8:32:{Name:sdc Major:8 Minor:32 Size:137438953472 Scheduler:deadline} 253:0:{Name:dm-0 Major:253 Minor:0 Size:138412032 Scheduler:none} 253:1:{Name:dm-1 Major:253 Minor:1 Size:54693724160 Scheduler:none}] NetworkDevices:[{Name:br0 MacAddress:96:ca:7c:67:8d:49 Speed:0 Mtu:1500} {Name:eth0 MacAddress:00:0d:3a:19:b6:e4 Speed:40000 Mtu:1500} {Name:ovs-system MacAddress:da:5c:98:d8:dc:35 Speed:0 Mtu:1500} {Name:tun0 MacAddress:6a:7c:d3:0e:f8:8c Speed:0 Mtu:1500} {Name:vxlan_sys_4789 MacAddress:56:b8:a5:33:80:aa Speed:0 Mtu:65470}] Topology:[{Id:0 Memory:15031918592 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:2 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type
Jan 17 07:16:26 oo-node-1 origin-node[30642]: :Unified Level:3}]} {Id:3 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Azure InstanceType:Unknown InstanceID:F00A0012-CEA4-4645-ADB0-66E0C5D27F9F}
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.769732   30642 manager.go:217] Version: {KernelVersion:3.10.0-693.11.6.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion: CadvisorRevision:}
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.770408   30642 server.go:546] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772316   30642 container_manager_linux.go:246] container manager verified user specified cgroup-root exists: /
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772351   30642 container_manager_linux.go:251] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[]}
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772584   30642 server.go:652] cloud provider determined current node name to be oo-node-1
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772616   30642 server.go:838] Using root directory: /var/lib/origin/openshift.local.volumes
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772661   30642 kubelet.go:328] cloud provider determined current node name to be oo-node-1
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.772690   30642 kubelet.go:271] Watching apiserver
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.778754   30642 kubelet.go:507] Hairpin mode set to "none"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.778952   30642 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.778970   30642 plugins.go:196] Loaded network plugin "cni"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.780139   30642 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.793728   30642 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.793754   30642 plugins.go:196] Loaded network plugin "cni"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.793801   30642 docker_service.go:210] Docker cri networking managed by cni
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.800405   30642 docker_service.go:227] Setting cgroupDriver to systemd
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.801468   30642 docker_legacy.go:151] No legacy containers found, stop performing legacy cleanup.
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.801521   30642 kubelet.go:596] Starting the GRPC server for the docker CRI shim.
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.801544   30642 docker_server.go:51] Start dockershim grpc server
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.801583   30642 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.810784   30642 remote_runtime.go:42] Connecting to runtime service /var/run/dockershim.sock
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.810815   30642 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:16:26 oo-node-1 origin-node[30642]: W0117 07:16:26.810981   30642 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812688   30642 kuberuntime_manager.go:178] Container runtime docker initialized, version: 1.12.6, apiVersion: 1.24.0
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812902   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/aws-ebs"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812921   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/empty-dir"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812934   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/gce-pd"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812946   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/git-repo"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812961   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/host-path"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.812972   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/nfs"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813016   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/secret"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813033   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/iscsi"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813050   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/glusterfs"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813063   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/rbd"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813077   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/cinder"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813120   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/quobyte"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813133   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/cephfs"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813150   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/downward-api"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813163   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/fc"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813177   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/flocker"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813190   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-file"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813204   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/configmap"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813219   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/vsphere-volume"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813232   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-disk"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813246   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/photon-pd"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813259   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/projected"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813272   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/portworx-volume"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813286   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/scaleio"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813300   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/local-volume"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.813314   30642 plugins.go:378] Loaded volume plugin "kubernetes.io/storageos"
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.814274   30642 server.go:869] Started kubelet v1.7.6+a08f5eeb62
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.814355   30642 server.go:132] Starting to listen on 0.0.0.0:10250
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.815462   30642 server.go:314] Adding debug handlers to kubelet server.
Jan 17 07:16:26 oo-node-1 origin-node[30642]: I0117 07:16:26.818708   30642 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Jan 17 07:16:26 oo-node-1 origin-node[30642]: E0117 07:16:26.819701   30642 kubelet.go:1191] Image garbage collection failed once. Stats initialization may not have completed yet: unable to find data for container /
Jan 17 07:16:27 oo-node-1 origin-node[30642]: W0117 07:16:27.743514   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:29 oo-node-1 origin-node[30642]: W0117 07:16:29.246441   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:31 oo-node-1 origin-node[30642]: W0117 07:16:31.499960   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:34 oo-node-1 origin-node[30642]: W0117 07:16:34.878106   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:39 oo-node-1 origin-node[30642]: W0117 07:16:39.943937   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:44 oo-node-1 Container_ImageInventory[13295]: Container image name () is improperly formed and could not be parsed in SetRepositoryImageTag
Jan 17 07:16:47 oo-node-1 origin-node[30642]: W0117 07:16:47.540816   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:51 oo-node-1 origin-node[30642]: I0117 07:16:51.537229   30642 kubelet_node_status.go:326] Adding node label from cloud provider: beta.kubernetes.io/instance-type=Standard_DS3_v2
Jan 17 07:16:51 oo-node-1 origin-node[30642]: I0117 07:16:51.539071   30642 kubelet_node_status.go:337] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=0
Jan 17 07:16:51 oo-node-1 origin-node[30642]: I0117 07:16:51.539105   30642 kubelet_node_status.go:341] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=eastus
Jan 17 07:16:58 oo-node-1 origin-node[30642]: W0117 07:16:58.934968   30642 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:16:58 oo-node-1 origin-node[30642]: F0117 07:16:58.935014   30642 network.go:45] SDN node startup failed: failed to get subnet for this host: oo-node-1, error: timed out waiting for the condition
Jan 17 07:16:58 oo-node-1 systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Jan 17 07:16:58 oo-node-1 dnsmasq[13469]: setting upstream servers from DBus
Jan 17 07:16:58 oo-node-1 dnsmasq[13469]: using nameserver 168.63.129.16#53
Jan 17 07:16:58 oo-node-1 systemd[1]: Failed to start OpenShift Node.
Jan 17 07:16:58 oo-node-1 systemd[1]: Unit origin-node.service entered failed state.
Jan 17 07:16:58 oo-node-1 systemd[1]: origin-node.service failed.
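
The fatal line is the SDN node timing out while waiting for its HostSubnet. A hedged way to check from the master whether a subnet was ever allocated for the node (standard OpenShift 3.x commands, not taken from the original report):

# Each healthy node should have a HostSubnet entry
oc get hostsubnets
# The cloud provider changes the node's externalID, which is presumably why
# the docs require deleting and re-registering the node
oc get node oo-node-1 -o yaml | grep externalID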

Interestingly, as soon as the Azure configuration is removed from node-config.yaml, the origin-node process starts flawlessly:

Jan 17 07:27:45 oo-node-1 systemd[1]: Starting OpenShift Node...
Jan 17 07:27:45 oo-node-1 dnsmasq[13469]: setting upstream servers from DBus
Jan 17 07:27:45 oo-node-1 dnsmasq[13469]: using nameserver 168.63.129.16#53
Jan 17 07:27:45 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jan 17 07:27:45 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain cluster.local
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.177464   31900 start_node.go:288] Reading node configuration from /etc/origin/node/node-config.yaml
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.179279   31900 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated. Please begin using a config file ASAP.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.183158   31900 node.go:146] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "oo-node-1" (IP ""), iptables sync period "30s"
Jan 17 07:27:46 oo-node-1 systemd[1]: Started Kubernetes systemd probe.
Jan 17 07:27:46 oo-node-1 systemd[1]: Starting Kubernetes systemd probe.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.195760   31900 mount_linux.go:192] Detected OS with systemd
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.195796   31900 client.go:72] Connecting to docker on unix:///var/run/docker.sock
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.195821   31900 client.go:92] Start docker client with request timeout=2m0s
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.197237   31900 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.212500   31900 start_node.go:469] Starting node oo-node-1 (v3.7.0+7ed6862)
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.212578   31900 client.go:72] Connecting to docker on unix:///var/run/docker.sock
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.212594   31900 client.go:92] Start docker client with request timeout=2m0s
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.214114   31900 node.go:109] Connecting to Docker at unix:///var/run/docker.sock
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.230898   31900 node.go:358] Starting openshift-sdn network plugin
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.231076   31900 feature_gate.go:144] feature gates: map[]
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.234715   31900 manager.go:144] cAdvisor running in container: "/system.slice/origin-node.service"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.239933   31900 sdn_controller.go:48] Could not find an allocated subnet for node: oo-node-1, Waiting...
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.244416   31900 manager.go:152] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.244610   31900 manager.go:161] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.256662   31900 fs.go:124] Filesystem partitions: map[/dev/sdb1:{mountpoint:/mnt/resource major:8 minor:17 fsType:ext4 blockSize:0} docker--vg-docker--pool:{mountpoint: major:253 minor:1 fsType:devicemapper blockSize:1024} /dev/sda2:{mountpoint:/ major:8 minor:2 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/boot major:8 minor:1 fsType:xfs blockSize:0}]
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.266614   31900 manager.go:211] Machine: {NumCores:4 CpuFrequency:2394456 MemoryCapacity:14695591936 MachineID:de88c86b1060437ab31f8f3e4adec667 SystemUUID:F00A0012-CEA4-4645-ADB0-66E0C5D27F9F BootID:1cfa41e9-bc8b-43fc-aaff-b5fcb47730b6 Filesystems:[{Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:520785920 Type:vfs Inodes:256000 HasInodes:true} {Device:/dev/sdb1 DeviceMajor:8 DeviceMinor:17 Capacity:29457616896 Type:vfs Inodes:1835008 HasInodes:true} {Device:docker--vg-docker--pool DeviceMajor:253 DeviceMinor:1 Capacity:54693724160 Type:devicemapper Inodes:0 HasInodes:false} {Device:/dev/sda2 DeviceMajor:8 DeviceMinor:2 Capacity:31671447552 Type:vfs Inodes:15472128 HasInodes:true}] DiskMap:map[8:16:{Name:sdb Major:8 Minor:16 Size:30064771072 Scheduler:deadline} 8:32:{Name:sdc Major:8 Minor:32 Size:137438953472 Scheduler:deadline} 253:0:{Name:dm-0 Major:253 Minor:0 Size:138412032 Scheduler:none} 253:1:{Name:dm-1 Major:253 Minor:1 Size:54693724160 Scheduler:none} 253:2:{Name:dm-2 Major:253 Minor:2 Size:54693724160 Scheduler:none} 2:0:{Name:fd0 Major:2 Minor:0 Size:4096 Scheduler:deadline} 8:0:{Name:sda Major:8 Minor:0 Size:34359738368 Scheduler:deadline}] NetworkDevices:[{Name:br0 MacAddress:96:ca:7c:67:8d:49 Speed:0 Mtu:1500} {Name:eth0 MacAddress:00:0d:3a:19:b6:e4 Speed:40000 Mtu:1500} {Name:ovs-system MacAddress:da:5c:98:d8:dc:35 Speed:0 Mtu:1500} {Name:tun0 MacAddress:6a:7c:d3:0e:f8:8c Speed:0 Mtu:1500} {Name:vxlan_sys_4789 MacAddress:56:b8:a5:33:80:aa Speed:0 Mtu:65470}] Topology:[{Id:0 Memory:15031918592 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]} {Id:2 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type
Jan 17 07:27:46 oo-node-1 origin-node[31900]: :Unified Level:3}]} {Id:3 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:31457280 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Azure InstanceType:Unknown InstanceID:F00A0012-CEA4-4645-ADB0-66E0C5D27F9F}
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.268118   31900 manager.go:217] Version: {KernelVersion:3.10.0-693.11.6.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion: CadvisorRevision:}
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.268783   31900 server.go:546] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.270827   31900 container_manager_linux.go:246] container manager verified user specified cgroup-root exists: /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.270860   31900 container_manager_linux.go:251] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[]}
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.271071   31900 server.go:838] Using root directory: /var/lib/origin/openshift.local.volumes
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.271132   31900 kubelet.go:271] Watching apiserver
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.276505   31900 kubelet.go:507] Hairpin mode set to "none"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.276758   31900 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.276775   31900 plugins.go:196] Loaded network plugin "cni"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.277870   31900 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.291302   31900 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.291330   31900 plugins.go:196] Loaded network plugin "cni"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.291382   31900 docker_service.go:210] Docker cri networking managed by cni
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.298481   31900 docker_service.go:227] Setting cgroupDriver to systemd
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.299750   31900 docker_legacy.go:151] No legacy containers found, stop performing legacy cleanup.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.299824   31900 kubelet.go:596] Starting the GRPC server for the docker CRI shim.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.299847   31900 docker_server.go:51] Start dockershim grpc server
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.299865   31900 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.306817   31900 remote_runtime.go:42] Connecting to runtime service /var/run/dockershim.sock
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.306837   31900 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.306940   31900 util_linux.go:75] Using "/var/run/dockershim.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/dockershim.sock".
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309432   31900 kuberuntime_manager.go:178] Container runtime docker initialized, version: 1.12.6, apiVersion: 1.24.0
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309720   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/aws-ebs"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309741   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/empty-dir"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309755   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/gce-pd"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309767   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/git-repo"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309781   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/host-path"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309792   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/nfs"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309808   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/secret"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309822   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/iscsi"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309841   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/glusterfs"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309856   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/rbd"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309870   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/cinder"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309884   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/quobyte"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309895   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/cephfs"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309911   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/downward-api"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309924   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/fc"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309938   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/flocker"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309951   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-file"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309965   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/configmap"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.309979   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/vsphere-volume"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310005   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-disk"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310017   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/photon-pd"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310035   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/projected"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310057   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/portworx-volume"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310071   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/scaleio"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310084   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/local-volume"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.310096   31900 plugins.go:378] Loaded volume plugin "kubernetes.io/storageos"
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.312181   31900 server.go:869] Started kubelet v1.7.6+a08f5eeb62
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.312214   31900 server.go:132] Starting to listen on 0.0.0.0:10250
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.313152   31900 server.go:314] Adding debug handlers to kubelet server.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.313538   31900 kubelet.go:1191] Image garbage collection failed once. Stats initialization may not have completed yet: unable to find data for container /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.313813   31900 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.320795   31900 kubelet.go:1705] Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": unable to find data for container /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.320829   31900 kubelet.go:1713] Failed to check if disk space is available on the root partition: failed to get fs info for "root": unable to find data for container /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.320839   31900 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.320881   31900 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.320925   31900 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322012   31900 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322039   31900 status_manager.go:141] Starting to sync pod status with apiserver
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322054   31900 kubelet.go:1785] Starting kubelet main sync loop.
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322082   31900 kubelet.go:1796] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322177   31900 container_manager_linux.go:398] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.322220   31900 container_manager_linux.go:543] [ContainerManager]: Fail to get rootfs information unable to find data for container /
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322241   31900 volume_manager.go:243] The desired_state_of_world populator starts
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.322248   31900 volume_manager.go:245] Starting Kubelet Volume Manager
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.326133   31900 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.326411   31900 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.334759   31900 factory.go:336] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.336469   31900 factory.go:351] Registering Docker factory
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.336539   31900 manager.go:260] Registration of the rkt container factory failed: unable to communicate with Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.336759   31900 manager.go:271] Registration of the crio container factory failed: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.336777   31900 factory.go:54] Registering systemd factory
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.337059   31900 factory.go:86] Registering Raw factory
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.337225   31900 manager.go:1139] Started watching for new ooms in manager
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.338320   31900 oomparser.go:185] oomparser using systemd
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.343780   31900 manager.go:306] Starting recovery of all containers
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.422494   31900 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.431053   31900 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.431094   31900 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.431114   31900 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.431140   31900 kubelet_node_status.go:82] Attempting to register node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.436249   31900 kubelet_node_status.go:85] Successfully registered node oo-node-1
Jan 17 07:27:46 oo-node-1 origin-node[31900]: I0117 07:27:46.465878   31900 manager.go:311] Recovery completed
Jan 17 07:27:46 oo-node-1 origin-node[31900]: E0117 07:27:46.615920   31900 helpers.go:778] Could not find capacity information for resource storage.kubernetes.io/scratch
Jan 17 07:27:46 oo-node-1 origin-node[31900]: W0117 07:27:46.615991   31900 helpers.go:789] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.291540   31900 sdn_controller.go:190] [SDN setup] full SDN setup required
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31970]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13
Jan 17 07:27:47 oo-node-1 kernel: device tun0 left promiscuous mode
Jan 17 07:27:47 oo-node-1 kernel: device vxlan_sys_4789 left promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3261] device (vxlan_sys_4789): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31974]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port br0 vxlan0
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31975]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 vxlan0 -- set Interface vxlan0 ofport_request=1 type=vxlan "options:remote_ip=\"flow\"" "options:key=\"flow\""
Jan 17 07:27:47 oo-node-1 kernel: device vxlan_sys_4789 entered promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3547] manager: (vxlan_sys_4789): new Vxlan device (/org/freedesktop/NetworkManager/Devices/32)
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3548] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3564] device (vxlan_sys_4789): state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3570] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3573] device (vxlan_sys_4789): state change: unavailable -> disconnected (reason 'none') [20 30 0]
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31979]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port br0 tun0
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31980]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 tun0 -- set Interface tun0 ofport_request=2 type=internal
Jan 17 07:27:47 oo-node-1 kernel: device tun0 entered promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3835] manager: (tun0): new Generic device (/org/freedesktop/NetworkManager/Devices/33)
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.5936] device (tun0): link connected
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.598787   31900 sdn_controller.go:261] AddHostSubnetRules for oo-infra-0 (host: "oo-infra-0", ip: "10.1.0.4", subnet: "10.128.2.0/23")
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.600003   31900 node.go:418] Starting openshift-sdn pod manager
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.621273   31900 sdn_controller.go:261] AddHostSubnetRules for oo-master-0 (host: "oo-master-0", ip: "10.1.0.5", subnet: "10.131.0.0/23")
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.642711   31900 sdn_controller.go:261] AddHostSubnetRules for oo-node-0 (host: "oo-node-0", ip: "10.2.0.5", subnet: "10.129.2.0/23")
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.749841   31900 node.go:461] openshift-sdn network plugin ready
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.756662   31900 network.go:88] Using iptables Proxier.
Jan 17 07:27:47 oo-node-1 origin-node[31900]: W0117 07:27:47.760721   31900 proxier.go:472] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
Jan 17 07:27:47 oo-node-1 origin-node[31900]: W0117 07:27:47.760750   31900 proxier.go:488] clusterCIDR not specified, unable to distinguish between internal and external traffic
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.760896   31900 network.go:119] Tearing down userspace rules.
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.826551   31900 proxier.go:187] Setting proxy IP to 10.2.0.4 and initializing iptables
Jan 17 07:27:47 oo-node-1 systemd[1]: Started OpenShift Node.
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889614   31900 config.go:202] Starting service config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889633   31900 network.go:226] Started Kubernetes Proxy on 0.0.0.0
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889647   31900 controller_utils.go:1025] Waiting for caches to sync for service config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889736   31900 config.go:102] Starting endpoints config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889756   31900 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.889781   31900 network.go:52] Starting DNS on 127.0.0.1:53
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.890408   31900 server.go:82] Monitoring dnsmasq to point cluster queries to 127.0.0.1
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.890526   31900 logs.go:41] skydns: ready for queries on cluster.local. for tcp://127.0.0.1:53 [rcache 0]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.890541   31900 logs.go:41] skydns: ready for queries on cluster.local. for udp://127.0.0.1:53 [rcache 0]
Jan 17 07:27:47 oo-node-1 dnsmasq[13469]: setting upstream servers from DBus
Jan 17 07:27:47 oo-node-1 dnsmasq[13469]: using nameserver 168.63.129.16#53
Jan 17 07:27:47 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Jan 17 07:27:47 oo-node-1 dnsmasq[13469]: using nameserver 127.0.0.1#53 for domain cluster.local
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894326   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/docker-registry:5000-tcp to [10.128.0.18:5000]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894432   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:https to [10.1.0.5:8443]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894449   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:dns-tcp to [10.1.0.5:8053]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894462   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:dns to [10.1.0.5:8053]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894486   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/registry-console:registry-console to [10.129.0.19:9090]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894508   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/router:443-tcp to [10.1.0.4:443]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894521   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/router:1936-tcp to [10.1.0.4:1936]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.894533   31900 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/router:80-tcp to [10.1.0.4:80]
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.989870   31900 controller_utils.go:1032] Caches are synced for endpoints config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990038   31900 proxier.go:997] Not syncing iptables until Services and Endpoints have been received from master
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990077   31900 controller_utils.go:1032] Caches are synced for service config controller
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990141   31900 proxier.go:320] Adding new service port "default/registry-console:registry-console" at 172.30.20.157:9000/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990183   31900 proxier.go:320] Adding new service port "default/router:443-tcp" at 172.30.22.142:443/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990240   31900 proxier.go:320] Adding new service port "default/router:1936-tcp" at 172.30.22.142:1936/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990255   31900 proxier.go:320] Adding new service port "default/router:80-tcp" at 172.30.22.142:80/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990271   31900 proxier.go:320] Adding new service port "dpm-dev/dpm-web-portal:80-tcp" at 172.30.202.69:80/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990283   31900 proxier.go:320] Adding new service port "default/docker-registry:5000-tcp" at 172.30.112.172:5000/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990296   31900 proxier.go:320] Adding new service port "default/kubernetes:https" at 172.30.0.1:443/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990318   31900 proxier.go:320] Adding new service port "default/kubernetes:dns" at 172.30.0.1:53/UDP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990337   31900 proxier.go:320] Adding new service port "default/kubernetes:dns-tcp" at 172.30.0.1:53/TCP
Jan 17 07:27:47 oo-node-1 origin-node[31900]: I0117 07:27:47.990446   31900 proxier.go:1013] Stale udp service default/kubernetes:dns -> 172.30.0.1
Jan 17 07:27:48 oo-node-1 origin-node[31900]: I0117 07:27:48.026037   31900 conntrack.go:36] Deleting connection tracking state for service IP 172.30.0.1
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.322352   31900 kubelet.go:1854] SyncLoop (ADD, "api"): ""
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.322431   31900 kubelet.go:1854] SyncLoop (ADD, "api"): "omsagent-8vkqm_omslogging(ea00672d-fb57-11e7-a0bd-000d3a19b767)"
Jan 17 07:27:51 oo-node-1 origin-node[31900]: E0117 07:27:51.323032   31900 pod_workers.go:186] Error syncing pod ea00672d-fb57-11e7-a0bd-000d3a19b767 ("omsagent-8vkqm_omslogging(ea00672d-fb57-11e7-a0bd-000d3a19b767)"), skipping: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.323706   31900 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "docker-sock" (UniqueName: "kubernetes.io/host-path/ea00672d-fb57-11e7-a0bd-000d3a19b767-docker-sock") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.422633   31900 reconciler.go:257] operationExecutor.MountVolume started for volume "docker-sock" (UniqueName: "kubernetes.io/host-path/ea00672d-fb57-11e7-a0bd-000d3a19b767-docker-sock") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.422709   31900 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "omsagent-token-nl6lh" (UniqueName: "kubernetes.io/secret/ea00672d-fb57-11e7-a0bd-000d3a19b767-omsagent-token-nl6lh") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.422799   31900 operation_generator.go:481] MountVolume.SetUp succeeded for volume "docker-sock" (UniqueName: "kubernetes.io/host-path/ea00672d-fb57-11e7-a0bd-000d3a19b767-docker-sock") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.523129   31900 reconciler.go:257] operationExecutor.MountVolume started for volume "omsagent-token-nl6lh" (UniqueName: "kubernetes.io/secret/ea00672d-fb57-11e7-a0bd-000d3a19b767-omsagent-token-nl6lh") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:51 oo-node-1 systemd[1]: Started Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/ea00672d-fb57-11e7-a0bd-000d3a19b767/volumes/kubernetes.io~secret/omsagent-token-nl6lh.
Jan 17 07:27:51 oo-node-1 systemd[1]: Starting Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/ea00672d-fb57-11e7-a0bd-000d3a19b767/volumes/kubernetes.io~secret/omsagent-token-nl6lh.
Jan 17 07:27:51 oo-node-1 origin-node[31900]: I0117 07:27:51.535993   31900 operation_generator.go:481] MountVolume.SetUp succeeded for volume "omsagent-token-nl6lh" (UniqueName: "kubernetes.io/secret/ea00672d-fb57-11e7-a0bd-000d3a19b767-omsagent-token-nl6lh") pod "omsagent-8vkqm" (UID: "ea00672d-fb57-11e7-a0bd-000d3a19b767")
Jan 17 07:27:56 oo-node-1 origin-node[31900]: I0117 07:27:56.468074   31900 kubelet_node_status.go:433] Recording NodeReady event message for node oo-node-1

The only differences I could find between the two cases are the following lines, which appear only in the successful start:

Jan 17 07:27:47 oo-node-1 ovs-vsctl[31970]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13
Jan 17 07:27:47 oo-node-1 kernel: device tun0 left promiscuous mode
Jan 17 07:27:47 oo-node-1 kernel: device vxlan_sys_4789 left promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3261] device (vxlan_sys_4789): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31974]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port br0 vxlan0
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31975]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 vxlan0 -- set Interface vxlan0 ofport_request=1 type=vxlan "options:remote_ip=\"flow\"" "options:key=\"flow\""
Jan 17 07:27:47 oo-node-1 kernel: device vxlan_sys_4789 entered promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3547] manager: (vxlan_sys_4789): new Vxlan device (/org/freedesktop/NetworkManager/Devices/32)
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3548] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3564] device (vxlan_sys_4789): state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3570] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3573] device (vxlan_sys_4789): state change: unavailable -> disconnected (reason 'none') [20 30 0]
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31979]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port br0 tun0
Jan 17 07:27:47 oo-node-1 ovs-vsctl[31980]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 tun0 -- set Interface tun0 ofport_request=2 type=internal
Jan 17 07:27:47 oo-node-1 kernel: device tun0 entered promiscuous mode
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.3835] manager: (tun0): new Generic device (/org/freedesktop/NetworkManager/Devices/33)
Jan 17 07:27:47 oo-node-1 NetworkManager[13241]: <info>  [1516174067.5936] device (tun0): link connected

To me, it seems that something goes wrong with the network re-configuration when the cloud provider is configured.

For completeness, here's my inventory file:

[OSEv3:children]
masters
nodes
etcd
master0
new_nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
containerized=false
ansible_ssh_user=my_user
ansible_become=yes
openshift_install_examples=true
openshift_deployment_type=origin
openshift_release=v3.7
docker_udev_workaround=True
openshift_use_dnsmasq=True
os_firewall_use_firewalld=False
openshift_override_hostname_check=true
osm_use_cockpit=true
osm_cockpit_plugins=['cockpit-kubernetes']

# default subdomain to use for exposed routes
openshift_master_default_subdomain=my_private_domain

#os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
#console_port=443
# Currently cloudprovider_kind azure is not supported by the playbook
openshift_disable_check=disk_availability,memory_availability
# default selectors for router and registry services
openshift_router_selector='region=infra'
openshift_registry_selector='region=infra'

# 'native' is the default
# openshift_master_cluster_method=native

# address of the Master LB
openshift_master_cluster_hostname=my_private_hostname.my_private_domain
openshift_master_cluster_public_hostname=my_private_hostname.my_private_domain

# Enable HTPasswdPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

# Service catalog
openshift_enable_service_catalog=true
ansible_service_broker_install=true
template_service_broker_install=true
openshift_template_service_broker_namespaces=['openshift']
ansible_service_broker_local_registry_whitelist=['^.*-apb$']
ansible_service_broker_image_prefix=ansibleplaybookbundle/origin-

# host group for masters
[masters]
oo-master-[0:0]

# host group for etcd
[etcd]
oo-master-[0:0]

[master0]
oo-master-0

# host group for nodes
[nodes]
oo-master-0 openshift_node_labels="{'region': 'master', 'zone': 'default'}" openshift_hostname=oo-master-0 openshift_schedulable=true
oo-infra-0 openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=oo-infra-0
oo-node-0 openshift_node_labels="{'region': 'primary', 'zone': 'default'}" openshift_hostname=oo-node-0
oo-node-1 openshift_node_labels="{'region': 'primary', 'zone': 'default'}" openshift_hostname=oo-node-1

# host group for adding new nodes
[new_nodes]

edsonmarquezani commented 6 years ago

It seems to be the same issue I have -> https://github.com/openshift/origin/issues/18115

sjenning commented 6 years ago

The problem is that after deleting the nodes, the SECOND RESTART fails consistently.

Are you seeing the error mentioned in https://github.com/openshift/origin/issues/18115#issue-288679927 when the node fails to start the second time?

F0115 16:03:31.681848  122755 network.go:45] SDN node startup failed: failed to get subnet for this host: openshift-tst-master-1, error: timed out waiting for the condition
fretboarder commented 6 years ago

Exactly.

Jan 17 07:16:58 oo-node-1 origin-node[30642]: F0117 07:16:58.935014   30642 network.go:45] SDN node startup failed: failed to get subnet for this host: oo-node-1, error: timed out waiting for the condition

While doing some more investigation in the meantime, I came across https://github.com/Microsoft/openshift-container-platform/blob/master/scripts/deployOpenShift.sh. This is the installation script for OCP (not Origin!).

Lines 659-660:

runuser -l $SUDOUSER -c "ansible-playbook ~/reboot-master.yml"
runuser -l $SUDOUSER -c "ansible-playbook ~/reboot-nodes.yml"

In fact, rebooting the nodes seems to be a workaround for the moment.
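
A minimal sketch of that workaround (assuming SSH access to each node; not a verified fix):

# On each node, after applying the Azure configuration:
sudo reboot
# Then, from the master, confirm the node re-registered and is Ready:
oc get nodes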

sjenning commented 6 years ago

Ok, I'm going to close this issue then as a dup of #18115.