Searched the masters and found:
$ sudo oc get pods --all-namespaces --config=/tmp/openshift-cluster-monitoring-ansible-HpkEk0/admin.kubeconfig
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-monitoring cluster-monitoring-operator-6465f8fbc7-ng4rv 0/1 ImagePullBackOff 0 5h
A few seconds later:
openshift-monitoring cluster-monitoring-operator-6465f8fbc7-ng4rv 0/1 ErrImagePull 0 5h
Doing an oc describe pod on it:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 1h (x816 over 5h) kubelet, infra-node-0.os.lab.net Back-off pulling image "quay.io/coreos/cluster-monitoring-operator:v0.1.1"
Normal Pulling 56m (x46 over 5h) kubelet, infra-node-0.os.lab.net pulling image "quay.io/coreos/cluster-monitoring-operator:v0.1.1"
Warning Failed 6m (x1063 over 5h) kubelet, infra-node-0.os.lab.net Error: ImagePullBackOff
Warning Failed 22s (x54 over 5h) kubelet, infra-node-0.os.lab.net Failed to pull image "quay.io/coreos/cluster-monitoring-operator:v0.1.1": rpc error: code = Canceled desc = context canceled
How can I restart it, pull it manually, or otherwise fix this?
I'm behind a fairly slow proxy, so if this is related to timeouts or something similar, could I extend them?
A manual docker pull works fine, but it takes a while.
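For reference, a sketch of the manual route (names taken from the events above): pull the image on the node that is failing, then delete the stuck pod so its deployment recreates it from the local cache.

$ docker pull quay.io/coreos/cluster-monitoring-operator:v0.1.1    # run on infra-node-0.os.lab.net
$ oc delete pod cluster-monitoring-operator-6465f8fbc7-ng4rv -n openshift-monitoring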
@frippe75 I'm experiencing the same issue here.
My run fails at the "Wait for the ServiceMonitor CRD to be created" task:
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
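To check whether the CRD was ever registered, something like this helps (assuming cluster-admin access):

$ oc get crd servicemonitors.monitoring.coreos.com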
I'm still poking around to see if it's my configuration or not.
If I add:
openshift_cluster_monitoring_operator_install=false
to my inventory, it will skip this step, but that isn't ideal -- additionally, if I do skip this install, the Web Console will fail to install.
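For reference, the inventory change is roughly this (a sketch; the [OSEv3:vars] section name follows the standard openshift-ansible inventory layout):

[OSEv3:vars]
# Skips the cluster monitoring operator install; works around the failing task,
# but as noted above the Web Console install then fails.
openshift_cluster_monitoring_operator_install=false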
Some additional information:
In my environment, running oc get pods --all-namespaces returns:
NAMESPACE NAME READY STATUS RESTARTS AGE
default docker-registry-1-deploy 0/1 Pending 0 2m
default registry-console-1-deploy 0/1 Pending 0 2m
default router-1-deploy 0/1 Pending 0 3m
kube-system master-api-master.example.com 1/1 Running 0 7m
kube-system master-controllers-master.example.com 1/1 Running 0 7m
kube-system master-etcd-master.example.com 1/1 Running 0 7m
openshift-monitoring cluster-monitoring-operator-6465f8fbc7-hkxb9 0/1 Pending 0 2m
openshift-node sync-s5xsp 1/1 Running 0 5m
openshift-node sync-sws9w 1/1 Running 0 4m
openshift-node sync-w5rbt 1/1 Running 0 4m
openshift-sdn ovs-49pjp 1/1 Running 0 4m
openshift-sdn ovs-7m2vk 1/1 Running 0 4m
openshift-sdn ovs-t7zfc 1/1 Running 0 4m
openshift-sdn sdn-rlbk8 0/1 CrashLoopBackOff 5 4m
openshift-sdn sdn-wf7lf 0/1 CrashLoopBackOff 5 4m
openshift-sdn sdn-z27xl 0/1 CrashLoopBackOff 7 4m
May be a red herring, but my openshift-sdn nodes are also in a CrashLoopBackOff state. I'm not sure if this is causing the ServiceMonitor CRD to fail, or if there's another root cause here.
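The usual first step for the crash-looping SDN pods is to grab the logs of the last failed container, e.g.:

$ oc logs sdn-rlbk8 -n openshift-sdn --previous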
Is this occurring only in version 3.11?
I will try an install with openshift_release="3.10".
How can I restart it, pull it manually, or otherwise fix this?
You should re-pull it manually and rerun the playbook.
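Concretely, that is something like the following (the playbook path assumes a standard openshift-ansible checkout; adjust the inventory path to yours):

$ docker pull quay.io/coreos/cluster-monitoring-operator:v0.1.1    # on each node that failed the pull
$ ansible-playbook -i /path/to/inventory playbooks/openshift-monitoring/config.yml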
May be a red herring, but my openshift-sdn nodes are also in a CrashLoopBackOff state
Different problem; the SDN is required for the cluster to function.
None of the problems are openshift-ansible issues, closing this.
Please avoid commenting with "I have the same problem too" - in most cases these are different problems and have different solutions, open a new issue instead.
@DizzyThermal Is this problem resolved?
How can I restart it, pull it manually, or otherwise fix this?
You should re-pull it manually and rerun the playbook. None of the problems are openshift-ansible issues, closing this.
I tried to re-deploy and hit the same issue at the same step. It actually happens 100 times out of 100. So is the wait period for this pull shorter than for other images, or is it simply due to the image size? I will try to find the answer myself, but a playbook that fails consistently and for multiple users is something that could be solved and considered an issue. I will see if I can address it, but simply closing it without discussing or reasoning about it seems forced. Sorry, I don't want this to come across in a rude way.
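On the wait-period question: the kubelet has an image-pull-progress-deadline setting and cancels a pull when no progress is made within it, which would line up with the "rpc error: code = Canceled desc = context canceled" messages above. A sketch of raising it via kubeletArguments in node-config.yaml follows; this is an assumption about the right knob for a slow proxy, not a verified fix:

kubeletArguments:
  # If no pull progress is made within this window, the kubelet cancels the pull.
  # The default is 1m, which a slow proxy can easily exceed for large images.
  image-pull-progress-deadline:
  - "30m"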
Hi,
I'm having the same issue upgrading from 3.10 to 3.11
Same here with 3.11.
Description
Deploying using the OpenStack playbooks. I worked through some issues with DNS, but the run fails quite late in the process.