Setting the volume size directly on the managed node group only works when using the default launch template created by the EKS managed node group service. You can see here how to enable use of the default template.
However, that means you lose the ability to set other configurations such as bootstrap_extra_args,
which require a custom launch template (the module defaults to creating a custom launch template for this reason, hence why the volume setting has no effect).
To change volume settings you will have to use the block_device_mappings variable: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L277-L289
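For illustration, a minimal sketch along the lines of the linked example (the node group name and the volume values are assumed, not taken from this thread):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  # ... cluster_name, cluster_version, vpc_id, subnet_ids, etc.

  eks_managed_node_groups = {
    example = {
      # overrides the root volume through the module's custom launch template,
      # instead of the disk_size shortcut
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 75
            volume_type           = "gp3"
            delete_on_termination = true
          }
        }
      }
    }
  }
}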
I see. I only use bootstrap_extra_args for setting node labels, like this:
[settings.kubernetes.node-labels]
"node.kubernetes.io/lifecycle" = "spot"
Is there another way to do this so I don't need to use bootstrap_extra_args?
The EKS managed node group service does this automatically when using SPOT.
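For node labels specifically, a hedged alternative (assuming the node group's native labels argument, which the module passes to the EKS API rather than to bootstrap flags, covers your use case) so bootstrap_extra_args is not needed at all:

eks_managed_node_groups = {
  spot = {
    capacity_type = "SPOT"

    # applied by the EKS API itself, no bootstrap flags required
    labels = {
      "node.kubernetes.io/lifecycle" = "spot"
    }
  }
}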
In my tests, vpc_security_group_ids also stopped working. Am I correct to assume that it also needs create_launch_template to be true?
@auyer could you post a full reproduction of your code - it's difficult to tell what you have and what you are trying to achieve.
@auyer the disk_size variable only takes effect when create_launch_template is false, and the default for it is true.
I added these blocks in eks_managed_node_group_defaults to handle the root disk size. Some AMIs change the root device_name, but you can check the pattern in the AWS docs here.
eks_managed_node_group_defaults = {
  block_device_mappings = {
    root = {
      device_name = "/dev/xvda"
      ebs = {
        volume_size = 64
      }
    }
  }
  instance_types = ["t3.xlarge", "t3a.xlarge", "m6i.xlarge", "m6a.xlarge"]
}
Closing this issue out for now - please refer to the examples/, as they show a wide array of usage configurations, as well as the documentation, which provides a wealth of resources.
Hi. I am still having issues with this.
I'm using the same setup as in the original question, with create_launch_template = true. I've added the block_device_mappings, but my issues persist.
These are the default values for my nodes:
eks_managed_node_group_defaults = {
  create_launch_template                 = true
  ami_type                               = "BOTTLEROCKET_x86_64"
  platform                               = "bottlerocket"
  disk_size                              = 128
  instance_types                         = local.instance_type
  iam_role_additional_policies           = var.iam_role_additional_policies
  vpc_security_group_ids                 = var.additional_security_group_ids
  create_security_group                  = true
  min_size                               = local.min_size
  max_size                               = local.max_size
  desired_size                           = local.desired_size
  iam_role_attach_cni_policy             = true
  attach_cluster_primary_security_group  = true

  block_device_mappings = {
    xvda = {
      device_name = "/dev/xvda"
      ebs = {
        volume_size           = 30
        volume_type           = "gp3"
        iops                  = 3000
        throughput            = 150
        delete_on_termination = true
      }
    }
  }

  tags = merge(var.default_tags, {
    "k8s.io/cluster-autoscaler/enabled"               = "true"
    "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
  })
}
Checking one of the nodes with kubectl describe nodes:
Name: [redacted].ec2.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=r5.2xlarge
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=SPOT
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-[redacted].ec2.internal
kubernetes.io/os=linux
...
Annotations: csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-04dbe8393b8e2ecf6"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 04 May 2022 11:36:57 -0300
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-[redacted].ec2.internal
AcquireTime: <unset>
RenewTime: Wed, 04 May 2022 12:05:42 -0300
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 04 May 2022 12:05:18 -0300 Wed, 04 May 2022 11:36:56 -0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 04 May 2022 12:05:18 -0300 Wed, 04 May 2022 12:05:18 -0300 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 04 May 2022 12:05:18 -0300 Wed, 04 May 2022 11:36:56 -0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 04 May 2022 12:05:18 -0300 Wed, 04 May 2022 11:37:27 -0300 KubeletReady kubelet is posting ready status
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 8
ephemeral-storage: 20624592Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65030940Ki
pods: 58
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 7910m
ephemeral-storage: 17933882132
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 64014108Ki
pods: 58
System Info:
Machine ID: ec246d81b0f4e2cf2ed28ac91ab3f515
System UUID: ec246d81-b0f4-e2cf-2ed2-8ac91ab3f515
Boot ID: 0d655ff9-5720-417f-a3fb-279b6a73ed68
Kernel Version: 5.10.102
OS Image: Bottlerocket OS 1.7.0 (aws-k8s-1.22)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.5.11+bottlerocket
Kubelet Version: v1.22.6-eks-b18cdc9
Kube-Proxy Version: v1.22.6-eks-b18cdc9
ProviderID: aws:///us-east-1a/i-0[redacted]
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system aws-node-9z2vz 25m (0%) 0 (0%) 0 (0%) 0 (0%) 28m
kube-system aws-node-termination-handler-ksklw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
kube-system ebs-csi-node-ltclq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
kube-system kube-proxy-5qbnq 100m (1%) 0 (0%) 0 (0%) 0 (0%) 28m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 125m (1%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 28m kubelet Starting kubelet.
Warning InvalidDiskCapacity 28m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 28m (x2 over 28m) kubelet Node ip-[redacted].ec2.internal status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 28m (x2 over 28m) kubelet Node ip-[redacted].ec2.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 28m kubelet Updated Node Allocatable limit across pods
Normal Starting 28m kube-proxy Starting kube-proxy.
Normal NodeReady 28m kubelet Node ip-[redacted].ec2.internal status is now: NodeReady
Normal NodeHasNoDiskPressure 6m49s (x3 over 28m) kubelet Node ip-[redacted].ec2.internal status is now: NodeHasNoDiskPressure
Warning EvictionThresholdMet 37s (x2 over 12m) kubelet Attempting to reclaim ephemeral-storage
Normal NodeHasDiskPressure 28s (x2 over 11m) kubelet Node ip-[redacted].ec2.internal status is now: NodeHasDiskPressure
You're setting a volume size of 30GB and your output is showing attachable-volumes-aws-ebs: 25 - what is the issue?
The issue I'm referring to is DiskPressure. The metrics endpoint tells me I have high fsUsage (and I'm guessing it's related). Screenshot from the K8s Lens app:
Maybe I got it wrong, but earlier today I got lots of evicted pods due to high DiskPressure, and this got me worried.
I added a second disk, /dev/xvdb, as a test.
xvdb = {
  device_name = "/dev/xvdb"
  ebs = {
    volume_size           = 60
    volume_type           = "gp3"
    iops                  = 3000
    throughput            = 150
    delete_on_termination = true
  }
}
If I get the same issue again, I'll post it here.
Can we reopen this issue? For Bottlerocket MNGs, the data volume is /dev/xvdb, and so the following BDM gets applied to the OS partition (which is read-only and does not need to be large): https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/aws-eks-managed-node-groups/locals.tf#L60
In particular, this leaves the data volume unencrypted by default, which is probably not intended. It's also small, depending on the workloads.
Changing the above device name to /dev/xvdb gives the expected result:
$ kubectl describe nodes ...
[...]
Capacity:
ephemeral-storage: 82547144Ki
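For reference, a minimal sketch of a Bottlerocket-oriented block_device_mappings that sizes and encrypts the data volume as well (the volume sizes here are illustrative, not recommendations from this thread):

block_device_mappings = {
  # Bottlerocket OS volume - small and effectively read-only
  xvda = {
    device_name = "/dev/xvda"
    ebs = {
      volume_size           = 4
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }
  # Bottlerocket data volume - images, container storage, ephemeral storage
  xvdb = {
    device_name = "/dev/xvdb"
    ebs = {
      volume_size           = 80
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }
}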
Description
Ephemeral storage/node storage is not following the configuration set in the module.
Versions
Module version [Required]: v18.20.1
Terraform version: Terraform v1.1.7 on linux_amd64
Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v4.8.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
Reproduction Code [Required]
Steps to reproduce the behavior:
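A minimal configuration along these lines reproduces it (a hedged sketch based on the versions above; the cluster name and the VPC/subnet inputs are placeholders, not from the thread):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.20.1"

  cluster_name    = "repro-cluster"  # placeholder
  cluster_version = "1.22"
  vpc_id          = var.vpc_id       # placeholder inputs
  subnet_ids      = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type  = "BOTTLEROCKET_x86_64"
    platform  = "bottlerocket"
    disk_size = 128  # expected to size the node disk, but has no effect with the module's custom launch template
  }

  eks_managed_node_groups = {
    default = {
      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }
}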
Expected behavior
disk_size is being set in the eks_managed_node_group_defaults block and in every node_group created. Nodes should have the specified disk size.
Actual behavior:
Storage does not follow any of the configured values.
Additional context
This is using the Bottlerocket AMI. My actual structure abstracts this repo a bit, and the whole eks_managed_node_groups dict comes from a variable. To reproduce this, create a variable and set eks_managed_node_groups = var.node_groups. But this should not affect the issue.