Closed: TristanCacqueray closed this issue 5 years ago
I can see that some of your pods are in the OOMKilled state, which means the provided resources are not enough.
You can use the following two environment variables to adjust the RAM and CPU (the defaults are shown) and then try again:
TF_VAR_libvirt_master_memory=4096
TF_VAR_libvirt_master_vcpu=2
@praveenkumar I was already using TF_VAR_libvirt_master_memory=8192 TF_VAR_libvirt_master_vcpu=4 (as suggested in https://github.com/openshift/installer/pull/1217). The host has 16GB of RAM and 8 CPUs.
@TristanCacqueray did you ever get past that issue? Better luck with a different version perhaps? Thanks.
@leseb no luck with the latest version: openshift-install unreleased-master-550-g507b62e7609fb54abfb4357395820b5fd8b6d635
First it failed with "cannot set up guest memory 'pc.ram': Cannot allocate memory" when using TF_VAR_libvirt_master_memory=8192 on a 16GB host. Using 4096 instead resulted in:
$ env TF_VAR_libvirt_master_memory=4096 TF_VAR_libvirt_master_vcpu=2 ./bin/openshift-install create cluster
INFO Consuming "Kubeconfig Admin Client" from target directory
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.12.4+7f96bae up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster to initialize...
FATAL failed to initialize the cluster: timed out waiting for the condition
$ tail .openshift_install.log
time="2019-03-13T04:46:42Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:47:13Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:48:23Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:50:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:51:12Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:54:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:55:27Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T04:58:57Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T05:00:57Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-03-13T05:03:58Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"
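As a small sketch, the operators still blocking initialization can be pulled out of the install log with grep; the heredoc below just reproduces two of the log lines above so the snippet is self-contained, and the pattern is an assumption about the message format:

```shell
# Extract which cluster operators have not yet reported success.
# In practice you would run the grep against .openshift_install.log itself.
cat > /tmp/install.log <<'EOF'
time="2019-03-13T04:50:27Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator operator-lifecycle-manager has not yet reported success"
time="2019-03-13T04:51:12Z" level=debug msg="Still waiting for the cluster to initialize..."
EOF
grep -o 'Cluster operator [a-z-]* has not yet reported success' /tmp/install.log | sort -u
```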
$ oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-member-test-27jf9-master-0 1/1 Running 0 93m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-2v86x 0/1 Preempting 0 79m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-ntdzt 0/1 Preempting 0 81m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-rkxds 0/1 Pending 0 72m
openshift-cloud-credential-operator cloud-credential-operator-86b4c8dbb8-zrhcj 0/1 Preempting 0 84m
openshift-cluster-machine-approver machine-approver-7bd85b5fd5-ztlvn 1/1 Running 0 91m
openshift-cluster-version cluster-version-operator-6ff79dc768-kzk26 1/1 Running 2 93m
openshift-dns-operator dns-operator-74444967b8-b4nk5 1/1 Running 0 92m
openshift-dns dns-default-6zw9c 2/2 Running 0 75m
openshift-dns dns-default-k6fm8 2/2 Running 0 92m
openshift-kube-apiserver-operator kube-apiserver-operator-5576dc5bcc-8rfh5 1/1 Running 4 79m
openshift-kube-apiserver installer-1-test-27jf9-master-0 0/1 OOMKilled 0 90m
openshift-kube-apiserver installer-4-test-27jf9-master-0 0/1 OOMKilled 0 84m
openshift-kube-apiserver installer-5-test-27jf9-master-0 0/1 OOMKilled 0 82m
openshift-kube-apiserver installer-6-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-apiserver installer-7-test-27jf9-master-0 0/1 OOMKilled 0 75m
openshift-kube-apiserver installer-8-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-apiserver installer-9-test-27jf9-master-0 0/1 Completed 0 70m
openshift-kube-apiserver kube-apiserver-test-27jf9-master-0 2/2 Running 0 70m
openshift-kube-apiserver revision-pruner-1-test-27jf9-master-0 0/1 Completed 0 89m
openshift-kube-apiserver revision-pruner-4-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-apiserver revision-pruner-5-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-apiserver revision-pruner-6-test-27jf9-master-0 0/1 OOMKilled 0 75m
openshift-kube-apiserver revision-pruner-7-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-apiserver revision-pruner-8-test-27jf9-master-0 0/1 Completed 0 70m
openshift-kube-apiserver revision-pruner-9-test-27jf9-master-0 0/1 OOMKilled 0 68m
openshift-kube-controller-manager-operator kube-controller-manager-operator-7db795976d-sdgvh 1/1 Running 6 87m
openshift-kube-controller-manager installer-1-test-27jf9-master-0 0/1 Completed 0 86m
openshift-kube-controller-manager installer-3-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-controller-manager installer-4-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-controller-manager installer-5-test-27jf9-master-0 0/1 Completed 0 77m
openshift-kube-controller-manager installer-6-test-27jf9-master-0 0/1 Completed 0 73m
openshift-kube-controller-manager installer-7-test-27jf9-master-0 0/1 Completed 0 66m
openshift-kube-controller-manager kube-controller-manager-test-27jf9-master-0 1/1 Running 2 65m
openshift-kube-controller-manager revision-pruner-1-test-27jf9-master-0 0/1 Completed 0 86m
openshift-kube-controller-manager revision-pruner-3-test-27jf9-master-0 0/1 Completed 0 80m
openshift-kube-controller-manager revision-pruner-4-test-27jf9-master-0 0/1 Completed 0 77m
openshift-kube-controller-manager revision-pruner-5-test-27jf9-master-0 0/1 Completed 0 76m
openshift-kube-controller-manager revision-pruner-6-test-27jf9-master-0 0/1 Completed 0 66m
openshift-kube-controller-manager revision-pruner-7-test-27jf9-master-0 0/1 Completed 0 65m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-5sl77 0/1 Preempting 0 88m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-zzx7k 0/1 Pending 0 75m
openshift-kube-scheduler installer-1-test-27jf9-master-0 0/1 Completed 0 87m
openshift-kube-scheduler installer-2-test-27jf9-master-0 0/1 Completed 0 82m
openshift-kube-scheduler installer-3-test-27jf9-master-0 0/1 Completed 0 78m
openshift-kube-scheduler openshift-kube-scheduler-test-27jf9-master-0 0/1 Preempting 0 77m
openshift-kube-scheduler revision-pruner-1-test-27jf9-master-0 0/1 OOMKilled 0 84m
openshift-kube-scheduler revision-pruner-2-test-27jf9-master-0 0/1 Completed 0 81m
openshift-kube-scheduler revision-pruner-3-test-27jf9-master-0 0/1 Completed 0 77m
openshift-machine-api clusterapi-manager-controllers-765c4ff8cc-zfvpp 4/4 Running 0 81m
openshift-machine-api machine-api-operator-7b76fdd588-255b5 1/1 Running 0 84m
openshift-machine-config-operator machine-config-controller-5757878458-x62jv 1/1 Running 1 81m
openshift-machine-config-operator machine-config-daemon-kdd8z 1/1 Running 0 79m
openshift-machine-config-operator machine-config-daemon-n48sc 1/1 Running 0 73m
openshift-machine-config-operator machine-config-operator-7f6dcc4ccd-7tk7d 1/1 Running 0 79m
openshift-machine-config-operator machine-config-server-2zb9r 1/1 Running 0 80m
openshift-multus multus-qcbn5 1/1 Running 0 75m
openshift-multus multus-zqh4c 1/1 Running 0 93m
openshift-network-operator network-operator-669bbb6f55-bgkjw 1/1 Running 0 93m
openshift-operator-lifecycle-manager catalog-operator-8f5b976df-pwj7n 0/1 Pending 0 70m
openshift-operator-lifecycle-manager olm-operator-6fbc89557f-rzwb5 0/1 Pending 0 70m
openshift-sdn ovs-jr5th 1/1 Running 0 93m
openshift-sdn ovs-mwvz7 1/1 Running 0 75m
openshift-sdn sdn-controller-cl74v 1/1 Running 2 77m
openshift-sdn sdn-r5lwl 1/1 Running 1 93m
openshift-sdn sdn-rq6sq 1/1 Running 0 75m
openshift-service-ca-operator openshift-service-ca-operator-79cd74fbb-pj5lq 1/1 Running 6 91m
openshift-service-ca apiservice-cabundle-injector-f6f7f9967-q7bg4 1/1 Running 5 91m
openshift-service-ca configmap-cabundle-injector-bfd95-dmpxq 1/1 Running 4 91m
openshift-service-ca service-serving-cert-signer-6778cd64f6-k77h2 1/1 Running 4 91m
And using this command:
$ oc get pods --all-namespaces --no-headers | egrep -v 'Running|Completed' | awk '{ print $1 " " $2 " " $4 }' | while read ns pod status; do echo -e "\n\n$ns: $pod - $status"; oc describe -n $ns pod/$pod; done
the output seems to show that there is not enough memory:
openshift-kube-scheduler-operator: openshift-kube-scheduler-operator-85cd8b7969-zzx7k - Pending
Name: openshift-kube-scheduler-operator-85cd8b7969-zzx7k
Namespace: openshift-kube-scheduler-operator
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: <none>
Labels: app=openshift-kube-scheduler-operator
pod-template-hash=85cd8b7969
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/openshift-kube-scheduler-operator-85cd8b7969
Containers:
kube-scheduler-operator-container:
Image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af
Port: <none>
Host Port: <none>
Command:
cluster-kube-scheduler-operator
operator
Args:
--config=/var/run/configmaps/config/config.yaml
-v=4
Requests:
memory: 50Mi
Environment:
IMAGE: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:74280ea831ae49ae162e812dba523524b0be26ae82950e88115925c6c2a6d48b
OPERATOR_IMAGE: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-13-010143@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af
OPERATOR_IMAGE_VERSION: 4.0.0-0.alpha-2019-03-13-010143
POD_NAME: openshift-kube-scheduler-operator-85cd8b7969-zzx7k (v1:metadata.name)
Mounts:
/var/run/configmaps/config from config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from openshift-kube-scheduler-operator-token-62jd9 (ro)
/var/run/secrets/serving-cert from serving-cert (rw)
Conditions:
Type Status
PodScheduled False
Volumes:
serving-cert:
Type: Secret (a volume populated by a Secret)
SecretName: kube-scheduler-operator-serving-cert
Optional: true
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: openshift-kube-scheduler-operator-config
Optional: false
openshift-kube-scheduler-operator-token-62jd9:
Type: Secret (a volume populated by a Secret)
SecretName: openshift-kube-scheduler-operator-token-62jd9
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/master=
Tolerations:
node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1h (x2 over 1h) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
Warning FailedScheduling 1h (x3 over 1h) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
Warning FailedScheduling 1h (x6 over 1h) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
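For a quicker overview than describing each pod, the unhealthy states can also be tallied from the pod listing; a sketch, where the two sample rows stand in for the full `oc get pods --all-namespaces --no-headers` output above:

```shell
# Count pods that are neither Running nor Completed, grouped by status.
# Column 4 is STATUS, as in the listing above.
cat > /tmp/pods.txt <<'EOF'
openshift-kube-apiserver installer-1-test-27jf9-master-0 0/1 OOMKilled 0 90m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cd8b7969-zzx7k 0/1 Pending 0 75m
EOF
awk '$4 != "Running" && $4 != "Completed" { count[$4]++ } END { for (s in count) print s, count[s] }' /tmp/pods.txt | sort
```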
Then, using 7168MB, it failed differently:
$ env TF_VAR_libvirt_master_memory=7168 TF_VAR_libvirt_master_vcpu=4 ./bin/openshift-install create cluster
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
FATAL waiting for Kubernetes API: context deadline exceeded
$ tail -f .openshift_install.log
time="2019-03-13T07:43:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:43:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:44:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:44:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:45:22Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:45:52Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:46:23Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:46:53Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: connection refused"
time="2019-03-13T07:47:23Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
time="2019-03-13T07:47:47Z" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
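As a back-of-the-envelope budget for why large guest sizes are tight on a 16GB host (whether the "Cannot allocate memory" actually fires depends on how much is free at launch time, e.g. per `free -h`; the bootstrap size below is an assumed value for illustration, not measured here):

```shell
# Rough guest-RAM budget for the 16 GiB host from this report.
host_mib=16384        # total host RAM
master_mib=8192       # TF_VAR_libvirt_master_memory
bootstrap_mib=2048    # assumed bootstrap VM size (illustrative)
echo "left for host OS, libvirt and page cache: $(( host_mib - master_mib - bootstrap_mib )) MiB"
```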
I'm running a bigger VM (20GB) and am able to get further, but now I'm stuck on https://github.com/openshift/installer/issues/1406. I think we need at least 16GB to get the thing running properly.
Cannot reproduce this issue with the latest master.
Version
Platform (aws|libvirt|openstack):
libvirt
What happened?
In .openshift_install.log:
What did you expect to happen?
The cluster to be deployed.
How to reproduce it (as minimally and precisely as possible)?
I used this playbook to set up the hypervisor on a fedora-29 instance:
Then run the install command as described in: https://github.com/openshift/installer/pull/1217
Anything else we need to know?
It seems those pods failed to start because of "failed to tryAcquireOrRenew context deadline exceeded", resulting in "leaderelection.go:65 leaderelection lost":
Also, the openshift-apiserver-operator failed with: CNI request failed with status 400: 'failed to find netid for namespace: openshift-apiserver-operator, netnamespaces.network.openshift.io "openshift-apiserver-operator" not found'
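For what it's worth, the 400 error itself names the missing object, so the quoted message can be turned into the `oc get` check for it; a sketch, where the sed pattern is an assumption about the message shape:

```shell
# Derive the "oc get" command for the missing NetNamespace from the
# CNI error string quoted above.
err='failed to find netid for namespace: openshift-apiserver-operator, netnamespaces.network.openshift.io "openshift-apiserver-operator" not found'
echo "$err" | sed -n 's/.*\(netnamespaces[^ ]*\) "\([^"]*\)" not found/oc get \1 \2/p'
```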