Closed anandrkskd closed 3 years ago
Warning FailedScheduling 22m default-scheduler 0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 20m default-scheduler 0/9 nodes are available: 9 node(s) had volume node affinity conflict.
This suggests something is wrong with the cluster storage setup.
What storage classes are configured on the cluster?
What happens if you try to create a simple deployment with a PVC like this?
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: test-pvc
      containers:
      - image: busybox
        name: busybox
        command:
        - "sleep"
        - "infinity"
        resources: {}
        volumeMounts:
        - name: data
          mountPath: /data
and then getting info about it:
kubectl describe deployments.apps test
kubectl describe pvc test-pvc
What storage classes are configured on the cluster?
By default it is ibmc-vpc-block-10iops-tier; I think for storage on IBM Cloud we are using Block Storage for VPC.
What happens if you try to create a simple deployment with a PVC like this?
The simple deployment with PVC that you shared works fine; the pods were able to spin up.
kubectl describe deployments.apps test
Name: test
Namespace: default
CreationTimestamp: Wed, 28 Jul 2021 09:58:47 +0530
Labels: app=test
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=test
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=test
Containers:
busybox:
Image: busybox
Port: <none>
Host Port: <none>
Command:
sleep
infinity
Environment: <none>
Mounts:
/data from data (rw)
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: test-pvc
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: test-9d46557cd (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m50s deployment-controller Scaled up replica set test-9d46557cd to 1
kubectl describe pvc test-pvc
Name: test-pvc
Namespace: default
StorageClass: ibmc-vpc-block-10iops-tier
Status: Bound
Volume: pvc-74e4a5e0-1ab2-4c22-80f0-a07bc199c727
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: vpc.block.csi.ibm.io
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: test-9d46557cd-kp7wn
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 92s vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-0_cf26be28-fd68-4e3c-9ca6-2bd0e898563b External provisioner is provisioning volume for claim "default/test-pvc"
Normal ExternalProvisioning 19s (x6 over 92s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "vpc.block.csi.ibm.io" or manually created by system administrator
Normal ProvisioningSucceeded 13s vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-0_cf26be28-fd68-4e3c-9ca6-2bd0e898563b Successfully provisioned volume pvc-74e4a5e0-1ab2-4c22-80f0-a07bc199c727
Whenever we use two PVs that were created in two different zones, the pods fail with the error volume node affinity conflict and fail to spin up.
For example:
-> oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0f9b8ffb-4661-42f7-a372-3a41e148669e 10Gi RWO Delete Bound cmd-devfile-storage-test228ubx/secondvol-awhmll-app ibmc-vpc-block-10iops-tier 14h
pvc-22e0a37c-4f2b-4abc-a4cb-617b1f1eca14 10Gi RWO Delete Bound storage-test/firstvol-test-devfile-app ibmc-vpc-block-10iops-tier 19h
pvc-3092082f-2b76-4e01-b433-af74815363ea 10Gi RWO Delete Bound cmd-devfile-storage-test228pxs/secondvol-xadwvi-app ibmc-vpc-block-10iops-tier 14h
pvc-3c236748-4c64-45e2-8fe8-ebdff6431b54 10Gi RWO Delete Bound storage-test/secondvol-test-devfile-app ibmc-vpc-block-10iops-tier 19h
pvc-74e4a5e0-1ab2-4c22-80f0-a07bc199c727 10Gi RWO Delete Bound default/test-pvc ibmc-vpc-block-10iops-tier 15m
pvc-94beff15-e0f9-40b3-b52a-cc17e8a2c6c1 10Gi RWO Delete Bound cmd-devfile-storage-test228ubx/firstvol-awhmll-app ibmc-vpc-block-10iops-tier 14h
pvc-a63a6c91-35c5-43fe-9e70-c4166ded7ac6 10Gi RWO Delete Bound cmd-devfile-storage-test228pxs/firstvol-xadwvi-app ibmc-vpc-block-10iops-tier 14h
-> oc get pv pvc-0f9b8ffb-4661-42f7-a372-3a41e148669e -o yaml
apiVersion: v1
kind: PersistentVolume
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-de
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-de-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ibmc-vpc-block-10iops-tier
  volumeMode: Filesystem
status:
  phase: Bound
-> oc get pv pvc-94beff15-e0f9-40b3-b52a-cc17e8a2c6c1 -o yaml
apiVersion: v1
kind: PersistentVolume
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-de
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-de-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ibmc-vpc-block-10iops-tier
  volumeMode: Filesystem
status:
  phase: Bound
and for the failing case it is:
-> oc get pv pvc-3092082f-2b76-4e01-b433-af74815363ea -o yaml
apiVersion: v1
kind: PersistentVolume
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-de
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-de-3
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ibmc-vpc-block-10iops-tier
  volumeMode: Filesystem
status:
  phase: Bound
-> oc get pv pvc-a63a6c91-35c5-43fe-9e70-c4166ded7ac6 -o yaml
apiVersion: v1
kind: PersistentVolume
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-de
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-de-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ibmc-vpc-block-10iops-tier
  volumeMode: Filesystem
status:
  phase: Bound
Whenever we use two PVs that were created in two different zones, the pods fail with the error volume node affinity conflict and fail to spin up.
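The failure mode above can be sketched offline. This is a minimal, hypothetical illustration (the `pv_zone`/`zone_conflicts` helpers and the `sample` data are mine, mirroring the `oc get pv -o yaml` output above; in practice you would feed it the `items` list from `kubectl get pv -o json`): each PV provisioned by the VPC Block CSI driver is pinned to one zone via `spec.nodeAffinity`, so a pod mounting two PVs pinned to different zones can never satisfy both node constraints.

```python
from collections import defaultdict

ZONE_KEY = "failure-domain.beta.kubernetes.io/zone"

def pv_zone(pv):
    """Return the zone a PV is pinned to via spec.nodeAffinity, or None."""
    terms = (pv.get("spec", {})
               .get("nodeAffinity", {})
               .get("required", {})
               .get("nodeSelectorTerms", []))
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == ZONE_KEY and expr.get("values"):
                return expr["values"][0]
    return None

def zone_conflicts(pvs):
    """Group PVs by the namespace of their bound claim and report
    namespaces whose PVs are pinned to more than one zone -- a pod
    mounting both volumes cannot be scheduled to any node."""
    by_ns = defaultdict(set)
    for pv in pvs:
        by_ns[pv["spec"]["claimRef"]["namespace"]].add(pv_zone(pv))
    return {ns: zones for ns, zones in by_ns.items() if len(zones) > 1}

# Hypothetical sample mirroring the failing case from the thread:
# two PVs bound in the same namespace but pinned to eu-de-3 and eu-de-1.
sample = [
    {"spec": {"claimRef": {"namespace": "cmd-devfile-storage-test228pxs"},
              "nodeAffinity": {"required": {"nodeSelectorTerms": [
                  {"matchExpressions": [
                      {"key": ZONE_KEY, "values": ["eu-de-3"]}]}]}}}},
    {"spec": {"claimRef": {"namespace": "cmd-devfile-storage-test228pxs"},
              "nodeAffinity": {"required": {"nodeSelectorTerms": [
                  {"matchExpressions": [
                      {"key": ZONE_KEY, "values": ["eu-de-1"]}]}]}}}},
]

print(zone_conflicts(sample))
```

Any namespace reported here is exactly the situation in the test runs above: the scheduler rejects every node with "volume node affinity conflict" because no single node sits in both zones.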
This looks like an infra issue.
I was thinking that if we create the cluster in one zone only, we can avoid this problem; I guess there should be an option to select only one zone when creating a cluster.
@rnapoles-rh does IBM Cloud give us such an option while creating a cluster?
I was thinking that if we create the cluster in one zone only, we can avoid this problem
This could be the solution that helps us fix it; I'm saying this based on this answer on Stack Overflow. We don't need to spread our cluster among zones for CI, do we? It's not a long-running cluster anyway, right?
We don't need to spread our cluster among zones for CI, do we?
No, but I am currently not sure whether IBM Cloud has an option to select a zone before creating a cluster.
It's not a long-running cluster anyway, right?
No, it will be a long-running cluster, just like the cluster we have on PSI.
I was thinking that if we create the cluster in one zone only, we can avoid this problem; I guess there should be an option to select only one zone when creating a cluster.
@rnapoles-rh does IBM Cloud give us such an option while creating a cluster?
@anandrkskd yes, we can have clusters using only one zone instead of three. The cluster named devtools-1-4vcpu-16gb-3w was provisioned using only one zone (eu-de-1). You should be able to see it in the IBM Cloud web UI under OpenShift clusters.
@rnapoles-rh 10 consecutive runs of the test for the above situation passed successfully on the cluster with only one zone.
@rnapoles-rh 10 consecutive runs of the test for the above situation passed successfully on the cluster with only one zone.
@anandrkskd This is perfect, can we close this issue now?
@rnapoles-rh yes, we can close this issue.
@rnapoles-rh yes, we can close this issue.
Maybe close it as well?
/remove-triage ready
/triage support
/close
@dharmit: Closing this issue.
/kind failing-test
What versions of software are you using?
Operating System: All supported
Output of odo version: odo v2.2.3 (1579dd5be)
How did you run odo exactly?
Actual behavior
Expected behavior
odo push should pass
Any logs, error output, etc?
Acceptance Criteria