Closed: toni-moreno closed this issue 6 years ago.
Could you post the oc describe pod/pod-... output for some of the pods that failed scheduling? I see the partial error 0/4 nodes are available: 4 MatchNodeSelector
. It sounds like a node labeling issue, where some nodes should be marked as compute nodes. IIRC a default node selector was introduced in 3.9, and I'm not sure whether that is well documented yet.
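For reference, something like this should find the stuck pods and dump their scheduling events (here <project> and <pod-name> are placeholders, not actual names from your cluster):
# list pods stuck in Pending in your project
oc get pods -n <project> | grep Pending
# show the scheduler events for one of them
oc describe pod/<pod-name> -n <project>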
Hi @akostadinov, here is the output for the two pending pods:
[root@openshift01 localvolumes]# oc get all
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-ops-view 1 1 1 0 11h
deploy/kube-ops-view-redis 1 1 1 0 11h
NAME DESIRED CURRENT READY AGE
rs/kube-ops-view-758bf655f4 1 1 0 11h
rs/kube-ops-view-redis-7cd4b9cccc 1 1 0 11h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
routes/kube-ops-view kube-ops-view-ocp-ops-view.apps.tonimoreno.org kube-ops-view 8080 None
NAME READY STATUS RESTARTS AGE
po/kube-ops-view-758bf655f4-9s8vf 0/1 Pending 0 11h
po/kube-ops-view-redis-7cd4b9cccc-4gp8l 0/1 Pending 0 11h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-ops-view ClusterIP 172.30.48.133 <none> 8080/TCP 11h
svc/kube-ops-view-redis ClusterIP 172.30.22.19 <none> 6379/TCP 11h
[root@openshift01 localvolumes]# oc describe po/kube-ops-view-758bf655f4-9s8vf
Name: kube-ops-view-758bf655f4-9s8vf
Namespace: ocp-ops-view
Node: <none>
Labels: application=kube-ops-view
pod-template-hash=3146921190
version=v0.0.1
Annotations: openshift.io/scc=anyuid
Status: Pending
IP:
Controlled By: ReplicaSet/kube-ops-view-758bf655f4
Containers:
service:
Image: raffaelespazzoli/ocp-ops-view:latest
Port: 8080/TCP
Args:
--redis-url=redis://kube-ops-view-redis:6379
Limits:
cpu: 200m
memory: 100Mi
Requests:
cpu: 50m
memory: 50Mi
Readiness: http-get http://:8080/health delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-ops-view-token-kp5c2 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-ops-view-token-kp5c2:
Type: Secret (a volume populated by a Secret)
SecretName: kube-ops-view-token-kp5c2
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/compute=true
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m (x1512 over 7h) default-scheduler 0/4 nodes are available: 2 NodeNotReady, 2 NodeOutOfDisk, 4 MatchNodeSelector.
Warning FailedScheduling 3m (x377 over 10h) default-scheduler 0/4 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk, 4 MatchNodeSelector.
[root@openshift01 localvolumes]# oc describe po/kube-ops-view-redis-7cd4b9cccc-4gp8l
Name: kube-ops-view-redis-7cd4b9cccc-4gp8l
Namespace: ocp-ops-view
Node: <none>
Labels: application=kube-ops-view-redis
pod-template-hash=3780657777
version=v0.0.1
Annotations: openshift.io/scc=anyuid
Status: Pending
IP:
Controlled By: ReplicaSet/kube-ops-view-redis-7cd4b9cccc
Containers:
redis:
Image: redis:3.2-alpine
Port: 6379/TCP
Limits:
cpu: 200m
memory: 100Mi
Requests:
cpu: 50m
memory: 50Mi
Readiness: tcp-socket :6379 delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-ops-view-token-kp5c2 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-ops-view-token-kp5c2:
Type: Secret (a volume populated by a Secret)
SecretName: kube-ops-view-token-kp5c2
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/compute=true
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 8m (x1512 over 7h) default-scheduler 0/4 nodes are available: 2 NodeNotReady, 2 NodeOutOfDisk, 4 MatchNodeSelector.
Warning FailedScheduling 3m (x377 over 10h) default-scheduler 0/4 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk, 4 MatchNodeSelector.
Here is the node info, as of right now:
[root@openshift01 localvolumes]# oc get nodes
NAME STATUS ROLES AGE VERSION
openshift01 Ready master 12h v1.9.1+a0ce1bc657
openshift05 Ready <none> 11h v1.9.1+a0ce1bc657
openshift06 Ready <none> 11h v1.9.1+a0ce1bc657
openshift07 Ready <none> 11h v1.9.1+a0ce1bc657
OK, so in the pods you can see Node-Selectors: node-role.kubernetes.io/compute=true
. You need to have this label set on the nodes that you want to use for compute, i.e. oc edit node openshift05
and insert the label node-role.kubernetes.io/compute: "true"
. That should get your pods scheduled.
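If you prefer something non-interactive, I believe oc label does the same thing; the node names below are simply the three non-master nodes from your oc get nodes output:
# mark the three non-master nodes as compute nodes
oc label node openshift05 node-role.kubernetes.io/compute=true
oc label node openshift06 node-role.kubernetes.io/compute=true
oc label node openshift07 node-role.kubernetes.io/compute=true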
Another approach would be to remove the default node selector from master-config.yaml
. But I prefer setting the necessary labels.
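If you do go that other route, my understanding is that the default selector lives under projectConfig in /etc/origin/master/master-config.yaml, roughly like this (a sketch only; you would need to restart the master API/controllers services afterwards, and the exact service names depend on how the master was installed):
# /etc/origin/master/master-config.yaml (excerpt)
projectConfig:
  # an empty string disables the cluster-wide default node selector
  defaultNodeSelector: ""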
P.S. I'm somewhat worried about 1 NodeNotReady, 1 NodeOutOfDisk
. oc get nodes
doesn't show anything, but it is worth checking oc describe nodes
as well as looking at the node logs to check for anything bad.
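Something along these lines should surface any node problems (the node name is just an example from your cluster; origin-node as the service name assumes an RPM-based Origin install):
# check node conditions (Ready, OutOfDisk, MemoryPressure, DiskPressure, ...)
oc describe node openshift05
# node service logs on the host itself
journalctl -u origin-node --no-pager --since "1 hour ago"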
Hi @akostadinov, I've edited the nodes and added the new label node-role.kubernetes.io/compute: "true", and now everything is working fine!!!!
[root@openshift01 installcentos]# oc get all
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-ops-view 1 1 1 1 4m
deploy/kube-ops-view-redis 1 1 1 1 4m
NAME DESIRED CURRENT READY AGE
rs/kube-ops-view-758bf655f4 1 1 1 4m
rs/kube-ops-view-redis-7cd4b9cccc 1 1 1 4m
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
routes/kube-ops-view kube-ops-view-ocp-ops-view.apps.myorg.org kube-ops-view 8080 None
NAME READY STATUS RESTARTS AGE
po/kube-ops-view-758bf655f4-9sff8 1/1 Running 0 4m
po/kube-ops-view-redis-7cd4b9cccc-7c9l5 1/1 Running 0 4m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-ops-view ClusterIP 172.30.58.199 <none> 8080/TCP 4m
svc/kube-ops-view-redis ClusterIP 172.30.135.103 <none> 6379/TCP 4m
Somewhere in the Origin docs it should be specified that this label is mandatory to get things running.
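(For what it's worth, I think the label can also be set at install time through the Ansible inventory via openshift_node_labels; the snippet below is only an illustration, not my actual inventory file.)
# inventory.ini (illustrative snippet only)
[nodes]
openshift05 openshift_node_labels="{'region': 'primary', 'node-role.kubernetes.io/compute': 'true'}"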
Don't worry about NodeNotReady; sometimes I run node crash simulations (to test service recovery).
Thank you a lot. Now I'm going to get back to the local volume issue.
@akostadinov sorry for this question... but while editing /etc/origin/node/node-config.yaml I've seen that I don't have the new label inside the node config:
....
kind: NodeConfig
kubeletArguments:
  node-labels:
  - region=infra
  - nodetype=default
  - zone=default
  pods-per-core:
  - '50'
....
But the label does show up on the live node object:
[root@openshift01 installcentos]# oc describe node openshift01
Name: openshift01
Roles: compute,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=openshift01
node-role.kubernetes.io/compute=true
node-role.kubernetes.io/master=true
nodetype=default
region=infra
zone=default
Should I assume that I will lose the label on the next node restart?
wrt default labeling, this is documented in the quick install guide, which actually points at the release notes.
I am wondering whether you followed the guide and the end result is still missing the labels. In that case something needs to be fixed.
wrt your question, I believe that restarting the node shouldn't lose your label change. But if you remove the node from the API with oc delete node mynode
and then start the node again, it will be recreated without your modifications. I don't think oc edit
is supposed to change node-config.yaml
. But if you believe this is necessary, feel free to file an RFE so we can see what the team's opinion about it is.
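If you want the label to survive the node being re-registered, one option (a sketch based on the node-config excerpt you posted, so adjust it to your actual file) is to also add it to the kubelet node-labels and restart the node service:
# /etc/origin/node/node-config.yaml (excerpt)
kubeletArguments:
  node-labels:
  - region=infra
  - nodetype=default
  - zone=default
  - node-role.kubernetes.io/compute=true
  pods-per-core:
  - '50'
Then systemctl restart origin-node (the service name assumes an RPM-based Origin install).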
Life SAVER
I've installed OpenShift Origin 3.9 OK with Ansible on 4 nodes (1 master + 3 nodes with GlusterFS, also with the registry on Gluster) on CentOS 7.4.1708 (Core).
Here is the inventory.ini file.
And installed with these two commands.
With this result
Then I installed something to test (kube-ops-view, for example)
With this result
Version
Steps To Reproduce
Install 4 nodes with the previous inventory file and execute Ansible.
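(For reference, the usual OpenShift Origin 3.9 advanced-install invocation looks roughly like the two commands below; the exact commands used for this report were not captured here, and the inventory path is an assumption.)
# assuming the standard openshift-ansible 3.9 layout
ansible-playbook -i inventory.ini /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
ansible-playbook -i inventory.ini /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml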
Current Result
I cannot use the cluster.
Expected Result