kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Coreos instances on AWS not responding #5780

Closed oded-dd closed 6 years ago

oded-dd commented 6 years ago

We are running CoreOS 1800.7.0 on AWS, deployed with kops.

At least 5 instances stop responding every day (the instance is frozen and does not answer SSH). This leaves the containers on that instance failing with status ContainerCreating until we stop the instance.

Our kops version is 1.10.3, and the instances are AWS 5-series types (m5, r5, c5).

rauno56 commented 6 years ago

Are you positive that those instances are not running out of memory? The symptoms sound the same.

We've been using CoreOS images for a couple of weeks now without problems.

tavisma commented 6 years ago

We are also using CoreOS without issue (1800.7.0 and now 1855.4.0), though we are on Kubernetes 1.10.7.

oded-dd commented 6 years ago

We were experiencing the issue with 1800.7.0, but after upgrading to 1855.4.0 it seems to be resolved. We don't have an OOM issue: the nodes are monitored, and nothing of that kind triggered an alert.
