Closed: TristanCacqueray closed this issue 5 years ago
I can see that some of your pods are in the OOMKilled state, which means the provided resources are not enough.
You can use the following two environment variables to adjust the RAM and CPU (the defaults are shown) and then try again:
TF_VAR_libvirt_master_memory=4096
TF_VAR_libvirt_master_vcpu=2
@praveenkumar I was already using TF_VAR_libvirt_master_memory=8192 TF_VAR_libvirt_master_vcpu=4 (as suggested in https://github.com/openshift/installer/pull/1217). The host has 16GB of RAM and 8 CPUs.
@TristanCacqueray did you ever get past that issue? Better luck with a different version perhaps? Thanks.
@leseb no luck with the latest version: openshift-install unreleased-master-550-g507b62e7609fb54abfb4357395820b5fd8b6d635
First it failed with "cannot set up guest memory 'pc.ram': Cannot allocate memory" when using TF_VAR_libvirt_master_memory=8192 on a 16GB host. Using 4096 instead resulted in:
$ env TF_VAR_libvirt_master_memory=4096 TF_VAR_libvirt_master_vcpu=2 ./bin/openshift-install create cluster
INFO Consuming "Kubeconfig Admin Client" from target directory
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.12.4+7f96bae up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster to initialize...
FATAL failed to initialize the cluster: timed out waiting for the condition
$ tail .openshift_install.log
time="2019-03-13T04:46:42Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:47:13Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:48:23Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:50:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:51:12Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:54:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:55:27Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:58:57Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T05:00:57Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T05:03:58Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"
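As a small sketch, the operators still blocking initialization can be pulled out of the install log with grep; the heredoc below just reproduces two of the log lines above so the snippet is self-contained, and the pattern is an assumption about the message format:

```shell
# Extract which cluster operators have not yet reported success.
# In practice you would run the grep against .openshift_install.log itself.
cat > /tmp/install.log <<'EOF'
time="2019-03-13T04:50:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:51:12Z" level=debug msg="Still waiting for the cluster to initialize..."
EOF
grep -o 'Cluster operator [a-z-]* has not yet reported success' /tmp/install.log | sort -u
```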
$ oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-member-test-27jf9-master-0 1/1 Running 0 93m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-2v86x 0/1 Preempting 0 79m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-ntdzt 0/1 Preempting 0 81m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-rkxds 0/1 Pending 0 72m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-zrhcj 0/1 Preempting 0 84m
openshift-cluster-machine-approver machine-approver-7bd85b5fd5-ztlvn 1/1 Running 0 91m
openshift-cluster-version cluster-version-operator-6ff79dc768-kzk26 1/1 Running 2 93m
openshift-dns-operator dns-operator-74444967b8-b4nk5 1/1 Running 0 92m
openshift-dns dns-default-6zw9c 2/2 Running 0 75m
openshift-dns dns-default-k6fm8 2/2 Running 0 92m
openshift-kube-apiserver-operator kube-apiserver-operator-5576dc5bcc-8rfh5 1/1 Running 4 79m
openshift-kube-apiserver installer-1-test-27jf9-master-0 0/1 OOMKilled 0 90m
openshift-kube-apiserver installer-4-test-27jf9-master-0 0/1 OOMKilled 0 84m
openshift-kube-apiserver installer-5-test-27jf9-master-0 0/1 OOMKilled 0 82m
openshift-kube-apiserver installer-6-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-apiserver installer-7-test-27jf9-master-0 0/1 OOMKilled 0 75m
openshift-kube-apiserver installer-8-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-apiserver installer-9-test-27jf9-master-0 0/1 Completed 0 70m
openshift-kube-apiserver kube-apiserver-test-27jf9-master-0 2/2 Running 0 70m
openshift-kube-apiserver revision-pruner-1-test-27jf9-master-0 0/1 Completed 0 89m
openshift-kube-apiserver revision-pruner-4-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-apiserver revision-pruner-5-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-apiserver revision-pruner-6-test-27jf9-master-0 0/1 OOMKilled 0 75m
openshift-kube-apiserver revision-pruner-7-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-apiserver revision-pruner-8-test-27jf9-master-0 0/1 Completed 0 70m
openshift-kube-apiserver revision-pruner-9-test-27jf9-master-0 0/1 OOMKilled 0 68m
openshift-kube-controller-manager-operator kube-controller-manager-operator-7db795976d-sdgvh 1/1 Running 6 87m
openshift-kube-controller-manager installer-1-test-27jf9-master-0 0/1 Completed 0 86m
openshift-kube-controller-manager installer-3-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-controller-manager installer-4-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-controller-manager installer-5-test-27jf9-master-0 0/1 Completed 0 77m
openshift-kube-controller-manager installer-6-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-controller-manager installer-7-test-27jf9-master-0 0/1 Completed 0 66m
openshift-kube-controller-manager kube-controller-manager-test-27jf9-master-0 1/1 Running 2 65m
openshift-kube-controller-manager revision-pruner-1-test-27jf9-master-0 0/1 Completed 0 86m
openshift-kube-controller-manager revision-pruner-3-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-controller-manager revision-pruner-4-test-27jf9-master-0 0/1 Completed 0 77m
openshift-kube-controller-manager revision-pruner-5-test-27jf9-master-0 0/1 Completed 0 76m
openshift-kube-controller-manager revision-pruner-6-test-27jf9-master-0 0/1 Completed 0 66m
openshift-kube-controller-manager revision-pruner-7-test-27jf9-master-0 0/1 Completed 0 65m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-5sl77 0/1 Preempting 0 88m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-zzx7k 0/1 Pending 0 75m
openshift-kube-scheduler installer-1-test-27jf9-master-0 0/1 Completed 0 87m
openshift-kube-scheduler installer-2-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-scheduler installer-3-test-27jf9-master-0 0/1 Completed 0 78m
openshift-kube-scheduler openshift-kube-scheduler-test-27jf9-master-0 0/1 Preempting 0 77m
openshift-kube-scheduler revision-pruner-1-test-27jf9-master-0 0/1 OOMKilled 0 84m
openshift-kube-scheduler revision-pruner-2-test-27jf9-master-0 0/1 Completed 0 81m
openshift-kube-scheduler revision-pruner-3-test-27jf9-master-0 0/1 Completed 0 77m
openshift-machine-api clusterapi-manager-controllers-765c4ff8cc-zfvpp 4/4 Running 0 81m
openshift-machine-api machine-api-operator-7b76fdd588-255b5 1/1 Running 0 84m
openshift-machine-config-operator machine-config-controller-5757878458-x62jv 1/1 Running 1 81m
openshift-machine-config-operator machine-config-daemon-kdd8z 1/1 Running 0 79m
openshift-machine-config-operator machine-config-daemon-n48sc 1/1 Running 0 73m
openshift-machine-config-operator machine-config-operator-7f6dcc4ccd-7tk7d 1/1 Running 0 79m
openshift-machine-config-operator machine-config-server-2zb9r 1/1 Running 0 80m
openshift-multus multus-qcbn5 1/1 Running 0 75m
openshift-multus multus-zqh4c 1/1 Running 0 93m
openshift-network-operator network-operator-669bbb6f55-bgkjw 1/1 Running 0 93m
openshift-operator-lifecycle-manager catalog-operator-8f5b976df-pwj7n 0/1 Pending 0 70m
openshift-operator-lifecycle-manager olm-operator-6fbc89557f-rzwb5 0/1 Pending 0 70m
openshift-sdn ovs-jr5th 1/1 Running 0 93m
openshift-sdn ovs-mwvz7 1/1 Running 0 75m
openshift-sdn sdn-controller-cl74v 1/1 Running 2 77m
openshift-sdn sdn-r5lwl 1/1 Running 1 93m
openshift-sdn sdn-rq6sq 1/1 Running 0 75m
openshift-service-ca-operator openshift-service-ca-operator-79cd74fbb-pj5lq 1/1 Running 6 91m
openshift-service-ca apiservice-cabundle-injector-f6f7f9967-q7bg4 1/1 Running 5 91m
openshift-service-ca configmap-cabundle-injector-bfd95-dmpxq 1/1 Running 4 91m
openshift-service-ca service-serving-cert-signer-6778cd64f6-k77h2 1/1 Running 4 91m
And using this command:
$ oc get pods --all-namespaces --no-headers | egrep -v 'Running|Completed' | awk '{ print $1 " " $2 " " $4 }' | while read ns pod status; do echo -e "\n\n$ns: $pod - $status"; oc describe -n $ns pod/$pod; done
the output seems to show that there is not enough memory:
openshift-kube-scheduler-operator: openshift-kube-scheduler-operator-85cd8b7969-zzx7k - Pending
Name: openshift-kube-scheduler-operator-85cd8b7969-zzx7k
Namespace: openshift-kube-scheduler-operator
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: <none>
Labels: app=openshift-kube-scheduler-operator
pod-template-hash=85cd8b7969
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/openshift-kube-scheduler-operator-85cd8b7969
Containers:
kube-scheduler-operator-container:
Image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af
Port: <none>
Host Port: <none>
Command:
cluster-kube-scheduler-operator
operator
Args:
--config=/var/run/configmaps/config/config.yaml
-v=4
Requests:
memory: 50Mi
Environment:
IMAGE: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:74280ea831ae49ae162e812dba523524b0be26ae82950e88115925c6c2a6d48b
OPERATOR_IMAGE: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af
OPERATOR_IMAGE_VERSION: 4.0.0-0.alpha-2019-03-13-010143
POD_NAME: openshift-kube-scheduler-operator-85cd8b7969-zzx7k (v1:metadata.name)
Mounts:
/var/run/configmaps/config from config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from openshift-kube-scheduler-operator-token-62jd9 (ro)
/var/run/secrets/serving-cert from serving-cert (rw)
Conditions:
Type Status
PodScheduled False
Volumes:
serving-cert:
Type: Secret (a volume populated by a Secret)
SecretName: kube-scheduler-operator-serving-cert
Optional: true
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: openshift-kube-scheduler-operator-config
Optional: false
openshift-kube-scheduler-operator-token-62jd9:
Type: Secret (a volume populated by a Secret)
SecretName: openshift-kube-scheduler-operator-token-62jd9
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/master=
Tolerations:
node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1h (x2 over 1h) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
Warning FailedScheduling 1h (x3 over 1h) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
Warning FailedScheduling 1h (x6 over 1h) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
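For a quicker overview than describing each pod, the unhealthy states can also be tallied from the pod listing; a sketch, where the two sample rows stand in for the full `oc get pods --all-namespaces --no-headers` output above:

```shell
# Count pods that are neither Running nor Completed, grouped by status.
# Column 4 is STATUS, as in the listing above.
cat > /tmp/pods.txt <<'EOF'
openshift-kube-apiserver installer-1-test-27jf9-master-0 0/1 OOMKilled 0 90m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-zzx7k 0/1 Pending 0 75m
EOF
awk '$4 != "Running" && $4 != "Completed" { count[$4]++ } END { for (s in count) print s, count[s] }' /tmp/pods.txt | sort
```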
Then, using 7168MB, it failed differently:
$ env TF_VAR_libvirt_master_memory=7168 TF_VAR_libvirt_master_vcpu=4 ./bin/openshift-install create cluster
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
FATAL waiting for Kubernetes API: context deadline exceeded
$ tail -f .openshift_install.log
time="2019-03-13T07:43:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:43:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:44:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:44:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:45:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:45:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:46:23Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:46:53Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:47:23Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:47:47Z" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
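As a back-of-the-envelope budget for why large guest sizes are tight on a 16GB host (whether the "Cannot allocate memory" actually fires depends on how much is free at launch time, e.g. per `free -h`; the bootstrap size below is an assumed value for illustration, not measured here):

```shell
# Rough guest-RAM budget for the 16 GiB host from this report.
host_mib=16384        # total host RAM
master_mib=8192       # TF_VAR_libvirt_master_memory
bootstrap_mib=2048    # assumed bootstrap VM size (illustrative)
echo "left for host OS, libvirt and page cache: $(( host_mib - master_mib - bootstrap_mib )) MiB"
```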
I'm running a bigger VM (20GB) and am able to get further, but now I'm stuck on https://github.com/openshift/installer/issues/1406. I think we need at least 16GB to get the thing running properly.
Cannot reproduce this issue with the latest master.
Version
Platform (aws|libvirt|openstack):
libvirt
What happened?
In .openshift_install.log:
What did you expect to happen?
The cluster to be deployed.
How to reproduce it (as minimally and precisely as possible)?
I used this playbook to set up the hypervisor on a fedora-29 instance:
Then run the install command as described in: https://github.com/openshift/installer/pull/1217
Anything else we need to know?
It seems those pods failed to start because of "failed to tryAcquireOrRenew context deadline exceeded", resulting in "leaderelection.go:65 leaderelection lost":
Also, the openshift-apiserver-operator failed with: CNI request failed with status 400: 'failed to find netid for namespace: openshift-apiserver-operator, netnamespaces.network.openshift.io "openshift-apiserver-operator" not found'
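For what it's worth, the 400 error itself names the missing object, so the quoted message can be turned into the `oc get` check for it; a sketch, where the sed pattern is an assumption about the message shape:

```shell
# Derive the "oc get" command for the missing NetNamespace from the
# CNI error string quoted above.
err='failed to find netid for namespace: openshift-apiserver-operator, netnamespaces.network.openshift.io "openshift-apiserver-operator" not found'
echo "$err" | sed -n 's/.*\(netnamespaces[^ ]*\) "\([^"]*\)" not found/oc get \1 \2/p'
```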