trm109 / amazon-eks-ami-sysctl-tests

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

Testing! #6

Open · trm109 opened this issue 3 weeks ago

trm109 commented 3 weeks ago

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

trm109 commented 3 weeks ago

/summarize awslabs amazon-eks-ami 1145

github-actions[bot] commented 3 weeks ago

Here's a summary of the GitHub issue reply chain:

Issue: Kubernetes nodes on EKS clusters become unresponsive and lose readiness when applications consume too much memory, causing extended downtime. This doesn't occur on GKE clusters.

Key points:

  1. The problem is traced to insufficient memory reservation for kubelet, set by the EKS bootstrap script.
  2. Current EKS memory reservation (574Mi) is much lower than GKE's (1.8Gi).
  3. Increasing kubelet's reserved memory to 1Gi or 1.8Gi resolves the issue.
  4. The problem affects various instance types, including t3.medium and m5.8xlarge.
  5. It occurs with both containerd and docker-based AMIs.

Proposed solutions and workarounds:

  1. Revise kubeReserved values, particularly for memory.
  2. Consider reverting to GKE values as a temporary fix.
  3. Implement custom node logic that limits the pod count and uses GKE's memory-reservation calculation.
  4. Ensure correct configuration of pod memory limits and requests.
  5. Explore per-pod kube reserved calculation.
  6. Consider implementing a runtime class with pod overheads.

Conclusion: While there's no official fix yet, AWS acknowledges the need to revise kubeReserved values. Users are advised to implement workarounds such as increasing reserved memory, properly configuring pod resources, and considering custom node logic. The issue remains open, with ongoing discussions about the best approach to resolve it permanently.
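
As a concrete illustration of the "increase reserved memory" workaround above, here is a minimal user-data sketch. It assumes the stock /etc/eks/bootstrap.sh entry point shipped with amazon-eks-ami; the cluster name my-cluster is a placeholder, and the tier percentages are GKE's publicly documented reservation formula rather than values taken from the issue itself.

```bash
#!/usr/bin/env bash
# Sketch only: raise kubelet's kube-reserved memory at bootstrap time.
set -euo pipefail

# GKE-style reservation for a node with $1 MiB of memory, using GKE's
# documented tiers: 25% of the first 4 GiB, 20% of the next 4 GiB,
# 10% of the next 8 GiB, 6% of the next 112 GiB, 2% of the remainder.
gke_memory_reservation_mib() {
  local total=$1 reserved=0 chunk i
  local -a caps=(4096 4096 8192 114688)
  local -a pcts=(25 20 10 6)
  for i in "${!caps[@]}"; do
    chunk=$(( total < caps[i] ? total : caps[i] ))
    reserved=$(( reserved + chunk * pcts[i] / 100 ))
    total=$(( total - chunk ))
    if (( total <= 0 )); then
      echo "$reserved"
      return 0
    fi
  done
  echo $(( reserved + total * 2 / 100 ))
}

total_mib=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))
reserve_mib=$(gke_memory_reservation_mib "$total_mib")

# Pass the larger reservation through to kubelet. Issue 1145 reports that
# raising this from the default ~574Mi keeps nodes Ready under memory pressure.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args "--kube-reserved=memory=${reserve_mib}Mi"
```

On an 8 GiB node this works out to roughly 1.8Gi, matching the GKE figure quoted above; hard-coding `--kube-reserved=memory=1Gi` is the simpler variant mentioned in the summary.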

trm109 commented 3 weeks ago

/summarize awslabs amazon-eks-ami 990

github-actions[bot] commented 3 weeks ago

Here's a summary of the GitHub issue discussion:

Issue: Nodes occasionally fail to boot and get stuck in NotReady state.

Key details:

Likely root cause:

Potential factors:

Proposed solutions:

Next steps:

The issue remains open pending further investigation and fixes.
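
As a generic triage step (not something prescribed in the issue), nodes that are stuck in NotReady can be listed and inspected with standard kubectl commands; `<node-name>` below is a placeholder:

```bash
# List nodes whose STATUS column is not exactly "Ready".
kubectl get nodes --no-headers | awk '$2 != "Ready"'

# Inspect a stuck node; the Conditions and Events sections usually show
# why kubelet never reported readiness.
kubectl describe node <node-name>
```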

trm109 commented 2 weeks ago

/summarize abc=1 xyz 120

trm109 commented 2 weeks ago

/summarize abc=1 xyz 120

trm109 commented 2 weeks ago

/summarize +owner testA +repo testB +issue 10

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1

trm109 commented 2 weeks ago

/summarize awslabs aws-shell 1