Closed: Darwiner closed this issue 4 years ago.
Your paste.centos.org links will be deleted 24h after posting, so I moved them to an unlisted pastebin:
test-cluster.yaml
- https://pastebin.com/9B4aherN
kubectl describe pod test-cluster-us-east-1-us-east-1a-0
- https://pastebin.com/TPVsRBUR
kubectl get pod test-cluster-us-east-1-us-east-1a-0 -o yaml
- https://pastebin.com/fEvf1sWf
Can you also provide details of the us-east-1a
node? Maybe there are taints on it.
Please keep them in the GH issue, not in external links.
Here's a kubectl describe node
on one of the 3 nodes that were brought up in the scylla
instancegroup.
Name: ip-10-10-0-48.ec2.internal
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=c5.large
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1a
kops.k8s.io/instancegroup=scylla
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-10-0-48.ec2.internal
kubernetes.io/os=linux
kubernetes.io/role=node
node-role.kubernetes.io/node=
Annotations: node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.10.0.48/24
projectcalico.org/IPv4IPIPTunnelAddr: 100.122.47.0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 04 Aug 2020 14:37:53 -0400
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 04 Aug 2020 14:38:04 -0400 Tue, 04 Aug 2020 14:38:04 -0400 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 05 Aug 2020 08:50:05 -0400 Tue, 04 Aug 2020 14:37:53 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 05 Aug 2020 08:50:05 -0400 Tue, 04 Aug 2020 14:37:53 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 05 Aug 2020 08:50:05 -0400 Tue, 04 Aug 2020 14:37:53 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Aug 2020 08:50:05 -0400 Tue, 04 Aug 2020 14:38:03 -0400 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.10.0.48
ExternalIP: 54.159.75.66
Hostname: ip-10-10-0-48.ec2.internal
InternalDNS: ip-10-10-0-48.ec2.internal
ExternalDNS: ec2-54-159-75-66.compute-1.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 125753328Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3805088Ki
pods: 110
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 115894266893
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3702688Ki
pods: 110
System Info:
Machine ID: ec2c5ed4075aecd4fc44ed2dcd0fcce2
System UUID: EC2C5ED4-075A-ECD4-FC44-ED2DCD0FCCE2
Boot ID: 772a4ef2-4c94-438a-90ef-e555df733abe
Kernel Version: 4.9.0-13-amd64
OS Image: Debian GNU/Linux 9 (stretch)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.3
Kubelet Version: v1.15.9
Kube-Proxy Version: v1.15.9
PodCIDR: 100.96.28.0/24
ProviderID: aws:///us-east-1a/i-0c76c46574370be51
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-fnqfr 100m (5%) 0 (0%) 0 (0%) 0 (0%) 18h
kube-system k8s-base-kube2iam-n5tgx 50m (2%) 50m (2%) 50Mi (1%) 100Mi (2%) 18h
kube-system kube-proxy-ip-10-10-0-48.ec2.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 18h
sre-monitoring prometheus-operator-prometheus-node-exporter-6zj55 0 (0%) 0 (0%) 0 (0%) 0 (0%) 18h
vlad-rundeck rundeck-6467fccd9d-8rhwd 500m (25%) 1 (50%) 512Mi (14%) 1Gi (28%) 25m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 1050m (52%)
memory 562Mi (15%) 1124Mi (31%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
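The "Allocated resources" numbers above already hint at the scheduling problem. A quick sanity check of the remaining schedulable CPU, using the values from this describe output (the 2-CPU requirement of the Scylla pod comes up later in the thread):

```python
# Sanity check: how much CPU remains schedulable on this node.
# Values taken from the `kubectl describe node` output above.
allocatable_cpu_m = 2000   # c5.large: 2 CPUs, in millicores
requested_cpu_m = 750      # current CPU requests on the node (750m)

remaining_m = allocatable_cpu_m - requested_cpu_m
print(remaining_m)         # 1250 (i.e. 1250m left)

# A pod requesting 2 full CPUs (2000m) cannot fit:
print(2000 <= remaining_m) # False
```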
PS. Someone might want to change the bug report template, as it mentions using a pastebin for config files.
Done
@Darwiner Your Scylla cluster requires at least 2 CPUs (resources.requests), and the scylla
instancegroup uses c5.large,
which has only 2 CPUs. The nodes are already being used by other pods:
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 1050m (52%)
memory 562Mi (15%) 1124Mi (31%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
With 750m of the node's 2 CPUs already requested, less than the required 2 CPUs remain allocatable. You can either upgrade the nodes to a bigger instance type, or lower the resource requests in the Scylla cluster definition.
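For the second option, the change would look roughly like this in the cluster definition. This is a sketch only: the field layout mirrors the operator's example cluster manifests, and names like `test-cluster` / `us-east-1a` are taken from this thread; adjust to match the actual test-cluster.yaml and operator version.

```yaml
# Sketch: lowering per-member resources so a pod fits on a c5.large
# that already has 750m CPU requested. Exact CRD paths depend on the
# scylla-operator version in use.
apiVersion: scylla.scylladb.com/v1alpha1
kind: Cluster
metadata:
  name: test-cluster
spec:
  datacenter:
    name: us-east-1
    racks:
      - name: us-east-1a
        members: 1
        resources:
          requests:
            cpu: 1        # was 2; must fit the node's remaining ~1250m
            memory: 2Gi
          limits:
            cpu: 1
            memory: 2Gi
```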
@zimnx Ugh, thanks. That seems to have been the issue, lack of allocatable resources...
I've now changed the 3 nodes created via the "scylla" instancegroup to c5.xlarge
instead, and also added a taint of dedicated=scylla:NoSchedule
to the nodes, plus a matching toleration in the cluster definition. That should keep the scheduler from placing pods onto these nodes unless they have the toleration in place.
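For reference, the taint described above can be applied with `kubectl taint nodes <node-name> dedicated=scylla:NoSchedule`, and the matching toleration looks like the snippet below. Where exactly the toleration lives in the Scylla cluster spec depends on the operator's placement API; the surrounding `placement:` key here is illustrative.

```yaml
# Toleration matching the dedicated=scylla:NoSchedule taint.
# The enclosing path is a sketch; in the scylla-operator CRD this
# typically sits under the cluster/rack placement section.
placement:
  tolerations:
    - key: dedicated
      operator: Equal
      value: scylla
      effect: NoSchedule
```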
Cluster is up and running now.
Thanks!
Describe the bug
After defining the cluster to be created, the pods do not get matched to the specified instancegroup and do not start, as they do not seem to find the target nodes.
None of the created nodes in the scylla instancegroup in question is running anything else, and each has enough resources (CPU/mem) available.
To Reproduce
kubectl create -f examples/generic/test-cluster.yaml
Expected behavior
A pod should be running on 3 different nodes, each node located in the same region but in a different AZ.
Config Files
test-cluster.yaml
https://paste.centos.org/view/ff600bf8
Logs
https://paste.centos.org/view/1964aebd
https://paste.centos.org/view/5a53c45e
Environment: