tomassedovic closed this issue 5 years ago
@smarterclayton @wking I think you guys were debugging this issue (or a similar one) a couple of days ago.
Similar behavior with AWS; reproducible with 0.9.1, 0.10, and master:
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+c69f926354 up
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0117 08:32:08.405777 24389 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV:
WARNING Failed to connect events watcher: Get https://sbatsche-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?watch=true: dial tcp 3.87.84.74:6443: connect: connection refused
[..]
FATAL waiting for bootstrap-complete: timed out waiting for the condition
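A quick first check when the watcher keeps reporting connection refused: is anything listening on the API port at all? A minimal sketch (the `probe` helper is my own; the host and port come from the log above):

```shell
# probe HOST PORT: report whether a TCP connection can be opened.
# Uses bash's /dev/tcp pseudo-device; gives up after 3 seconds.
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Host and port taken from the installer log above:
# probe sbatsche-api.devcluster.openshift.com 6443
```

If the port reports closed while the bootstrap node is supposed to be serving the API, the problem is earlier than the watcher itself.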
master-0 $ crictl ps -a
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
8b9bc36c0c276 quay.io/coreos/etcd@sha256:cb9cee3d9d49050e7682fde0a9b26d6948a0117b1b4367b8170fcaa3960a57b8 35 minutes ago Running etcd-member 0
35833a944e129 quay.io/coreos/kube-client-agent@sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d 35 minutes ago Exited certs 0
cf8bdc1886764 registry.svc.ci.openshift.org/openshift/origin-v4.0@sha256:3d19dfefa543bb8194e18c48716c7132f9c9b982611a14a14eb5cceeaa60bcf4 35 minutes ago Exited discovery 0
etcd shows:
2019-01-17T13:10:11.150313341+00:00 stderr F 2019-01-17 13:10:11.150277 I | embed: rejected connection from "127.0.0.1:37686" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
Going to pull the etcd certs and take a closer look. etcd has stopped on all 3 nodes so that is a start.
So my server cert SAN looks like this:
DNS:localhost, DNS:etcd.kube-system.svc, DNS:etcd.kube-system.svc.cluster.local, DNS:sbatsche-etcd-0.devcluster.openshift.com, IP Address:10.0.15.187, IP Address:127.0.0.1
and the peer cert like so:
DNS:sbatsche-etcd-0.devcluster.openshift.com, DNS:sbatsche.devcluster.openshift.com, IP Address:10.0.15.187
But my peer cert does not include localhost or 127.0.0.1. If we are using the peer cert as a client cert, this would explain the error. Is this a regression? I will try to backtrack through this process.
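To compare the certs directly on a node, something like this works (the helper is my own sketch; the cert path shown is the one mounted into the etcd pod in the etcdctl invocation below):

```shell
# show_cert_ids CERT: print the SAN and key-usage extensions of a PEM
# certificate, to check whether localhost/127.0.0.1 are present and
# whether the cert allows client authentication.
show_cert_ids() {
  openssl x509 -in "$1" -noout -text \
    | grep -A1 -E "Subject Alternative Name|Key Usage"
}

# Example path from the etcd node:
# show_cert_ids "/etc/ssl/etcd/system:etcd-peer:sbatsche-etcd-0.devcluster.openshift.com.crt"
```

A cert used for client auth should list "TLS Web Client Authentication" under Extended Key Usage; its absence matches the "incompatible key usage" rejection in the etcd log.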
AWS, 0.9.1
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0116 17:16:21.986936 75976 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 53
I was able to verify etcd is actually working correctly.
crictl exec 134f8b9a259c1 /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints="https://10.0.15.187:2379,https://10.0.30.46:2379,https://10.0.35.250:2379" --cert /etc/ssl/etcd/system:etcd-peer:sbatsche-etcd-0.devcluster.openshift.com.crt --key /etc/ssl/etcd/system:etcd-peer:sbatsche-etcd-0.devcluster.openshift.com.key --cacert /etc/ssl/etcd/ca.crt endpoint status -w table"
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.0.15.187:2379 | b261054f85985ee6 | 3.3.10 | 160 kB | false | 2 | 522 |
| https://10.0.30.46:2379 | 1e915deba4617a88 | 3.3.10 | 164 kB | true | 2 | 522 |
| https://10.0.35.250:2379 | 74e95148d793879b | 3.3.10 | 156 kB | false | 2 | 522 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
Note the absence of machine-api-operator, clusterapi-manager-controllers and so on.
This looks like cluster version operator is failing to progress on installing operators.
oc get clusterversion version -oyaml
will give you insight into the status of the cluster version operator, and
oc logs deploy/cluster-version-operator -n openshift-cluster-version
will get you its logs.
The most recent set of changes to the CVO ensure that we can now do:
oc wait clusterversion/version --for=condition=available
during install to wait until the payload is delivered, which should mean all operators report they are happy and healthy. I'm going to be tweaking the startup loops in the CI env for that now.
Thanks for the CVO pointers. I'm seeing this:
- lastTransitionTime: 2019-01-18T10:56:21Z
  message: Could not update clusteroperator "kube-controller-manager" (config.openshift.io/v1, 59 of 218)
  reason: UpdatePayloadFailed
  status: "True"
  type: Failing
- lastTransitionTime: 2019-01-17T11:53:08Z
  message: 'Unable to apply 4.0.0-0.alpha-2019-01-17-070151: the update could not be applied'
  reason: UpdatePayloadFailed
  status: "True"
  type: Progressing
- lastTransitionTime: 2019-01-17T11:53:08Z
  message: 'Unable to retrieve available updates: Get http://localhost:8080/graph: dial tcp [::1]:8080: connect: connection refused'
  reason: RemoteFailed
  status: "False"
  type: RetrievedUpdates
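To pull out just the Failing condition without scanning the whole YAML, a filter like this works. It's illustrated on a small JSON fixture mimicking the conditions above; against a live cluster you would point it at the output of `oc get clusterversion version -o json` instead:

```shell
# Minimal fixture with the same shape as the clusterversion status above.
cat > /tmp/cv.json <<'EOF'
{"status": {"conditions": [
  {"type": "Failing", "status": "True", "reason": "UpdatePayloadFailed",
   "message": "Could not update clusteroperator \"kube-controller-manager\""},
  {"type": "Progressing", "status": "True", "reason": "UpdatePayloadFailed"}
]}}
EOF

# Print reason and message of the Failing condition (stdlib python3 only).
python3 -c '
import json
cv = json.load(open("/tmp/cv.json"))
c = next(c for c in cv["status"]["conditions"] if c["type"] == "Failing")
print(c["reason"] + ": " + c["message"])
'
```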
Checking the logs doesn't seem to work for me though. I'm getting this error:
$ oc logs deploy/cluster-version-operator -n openshift-cluster-version
Error from server: Get https://ostest-master-0:10250/containerLogs/openshift-cluster-version/cluster-version-operator-56dcfc9b89-4sw46/cluster-version-operator: remote error: tls: internal error
Okay, I dug a bit deeper. Here's the chain of events:
The installer doesn't receive the bootstrap-complete event because progress.service can't find the /opt/openshift/.openshift.done file (.bootkube.done exists). That file is supposed to be created by openshift.service, but it isn't, because of:
Jan 25 08:40:12 ostest-bootstrap openshift.sh[3333]: Creating object from file: ./99_openshift-cluster-api_cluster.yaml ...
Jan 25 08:40:12 ostest-bootstrap openshift.sh[3333]: Executing kubectl create --filename ./99_openshift-cluster-api_cluster.yaml
Jan 25 08:40:13 ostest-bootstrap openshift.sh[3333]: error: unable to recognize "./99_openshift-cluster-api_cluster.yaml": no matches for kind "Cluster" in version "cluster.k8s.io/v1alpha1"
Jan 25 08:40:13 ostest-bootstrap openshift.sh[3333]: kubectl create --filename ./99_openshift-cluster-api_cluster.yaml failed. Retrying in 5 seconds...
Jan 25 08:40:18 ostest-bootstrap openshift.sh[3333]: error: unable to recognize "./99_openshift-cluster-api_cluster.yaml": no matches for kind "Cluster" in version "cluster.k8s.io/v1alpha1"
Jan 25 08:40:18 ostest-bootstrap openshift.sh[3333]: kubectl create --filename ./99_openshift-cluster-api_cluster.yaml failed. Retrying in 5 seconds...
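The retry behavior in those log lines boils down to a loop like this (a simplified sketch of what openshift.sh appears to do, not its exact code):

```shell
# Keep retrying `kubectl create` until the API server accepts the manifest,
# e.g. once the CRD for its kind has been registered.
create_with_retry() {
  until kubectl create --filename "$1"; do
    echo "kubectl create --filename $1 failed. Retrying in 5 seconds..."
    sleep 5
  done
}
```

Note this never gives up on its own, which is why the failure only surfaces via the installer's 30-minute timeout.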
The 99_openshift-cluster-api_cluster.yaml file contains the following:
apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  creationTimestamp: null
  name: ostest
  namespace: openshift-cluster-api
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.128.0.0/14
    serviceDomain: ""
    services:
      cidrBlocks:
      - 172.30.0.0/16
  providerSpec: {}
status: {}
That looks good to me, and using kind: Cluster should be perfectly fine. So I'm guessing something on the master control plane is not quite right.
@wking any tips on how to debug this further?
no matches for kind "Cluster" means the CVO likely hung before pushing the Cluster CRD (or whoever is in charge of registering Cluster failed to do it). Check ClusterVersion, ClusterOperators, and the CVO logs?
Okay, so the issue seems to be coming from here:
oc logs kube-controller-manager-operator-578cf7cc4b-p2djt -n openshift-kube-controller-manager-operator
I0125 10:43:22.077378 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"b5eda8c5-207d-11e9-9991-fa163e3a36e9", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ObserveCloudProvidersFailed' No recognized cloud provider platform found in cloud config: map[string]interface {}{"openstack":map[string]interface {}{"externalNetwork":"public", "region":"regionOne", "trunkSupport":"1", "cloud":"standalone", "computeFlavor":"m1.medium"}}
I0125 10:43:22.102562 1 status_controller.go:160] clusteroperator/kube-controller-manager diff {"metadata":{"name":"kube-controller-manager","selfLink":"/apis/config.openshift.io/v1/clusteroperators/kube-controller-manager","uid":"bc982578-207d-11e9-afa2-fa163ea976c4","resourceVersion":"42510","generation":1,"creationTimestamp":"2019-01-25T08:46:41Z"},"spec":{},"status":{"conditions":[{"type":"Failing","status":"True","lastTransitionTime":"2019-01-25T08:46:42Z","reason":"ConfigObservationFailing","message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found: map[string]interface {}{\"openstack\":map[string]interface {}{\"c
A: omputeFlavor\":\"m1.medium\", \"externalNetwork\":\"public\", \"region\":\"regionOne\", \"trunkSupport\":\"1\", \"cloud\":\"standalone\"}}"},{"type":"Available","status":"True","lastTransitionTime":null,"message":"3 of 3 nodes are at revision 2"},{"type":"Progressing","status":"False","lastTransitionTime":null,"reason":"AllNodesAtLatestRevision","message":"3 of 3 nodes are at revision 2"}],"versions":null,"relatedObjects":[{"group":"kubecontrollermanager.operator.openshift.io","resource":"kubecontrollermanageroperatorconfigs","name":"cluster"},{"group":"","resource":"namespaces","name":"openshift-config"},{"group":"","resource":"namespaces","name":"openshift-config-managed"},{"group":"","resource":"namespaces","name":"openshift-kube-controller-manager"},{"group":"","resource":"namespaces","name":"openshift-kube-controller-manager-operator"}],"extension":null}}
B: loud\":\"standalone\", \"computeFlavor\":\"m1.medium\", \"externalNetwork\":\"public\", \"region\":\"regionOne\", \"trunkSupport\":\"1\"}}"},{"type":"Available","status":"True","lastTransitionTime":null,"message":"3 of 3 nodes are at revision 2"},{"type":"Progressing","status":"False","lastTransitionTime":null,"reason":"AllNodesAtLatestRevision","message":"3 of 3 nodes are at revision 2"}],"versions":null,"relatedObjects":[{"group":"kubecontrollermanager.operator.openshift.io","resource":"kubecontrollermanageroperatorconfigs","name":"cluster"},{"group":"","resource":"namespaces","name":"openshift-config"},{"group":"","resource":"namespaces","name":"openshift-config-managed"},{"group":"","resource":"namespaces","name":"openshift-kube-controller-manager"},{"group":"","resource":"namespaces","name":"openshift-kube-controller-manager-operator"}],"extension":null}}
I0125 10:43:22.115805 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"b5eda8c5-207d-11e9-9991-fa163e3a36e9", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator kube-controller-manager changed: Failing message changed from "ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found: map[string]interface {}{\"openstack\":map[string]interface {}{\"computeFlavor\":\"m1.medium\", \"externalNetwork\":\"public\", \"region\":\"regionOne\", \"trunkSupport\":\"1\", \"cloud\":\"standalone\"}}" to "ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found: map[string]interface {}{\"openstack\":map[string]interface {}{\"cloud\":\"standalone\", \"computeFlavor\":\"m1.medium\", \"externalNetwork\":\"public\", \"region\":\"regionOne\", \"trunkSupport\":\"1\"}}"
Looks like we need to add openstack there:
Version
Platform (aws|libvirt|openstack):
openstack (but I've seen private reports on non-openstack platform as well, not sure whether libvirt or aws)
What happened?
The installer never receives the bootstrap-complete event, and there is an event-decoding error in the installer log:
What did you expect to happen?
The installer switches from the bootstrap control plane to the masters and deletes the bootstrap node. All the pods that are supposed to run on the masters will be there.
How to reproduce it (as minimally and precisely as possible)?
Create ostest/install-config.yaml with the following contents:
And then install the cluster:
Anything else we need to know?
I'm seeing the same error with a checkout that was working fine for me last week (commit e539a993e81edcf598696eefddbb19f5c7b9d23c). It's also present on the master branch.
So this is probably an issue with one of the images or operators rather than a regression in the installer itself.
That is where the error manifests though, so I'm logging it here in hopes of folks here pointing me in the right direction.
The master nodes seem to come up fine:
Bootkube doesn't seem to show any unexpected problems:
But it seems a bunch of the deployments that should be running are missing:
Note the absence of machine-api-operator, clusterapi-manager-controllers, and so on. Similarly, there are usually more containers running on the master nodes:
No apparent errors or high restart counts though.