Closed · dmcnaught · closed 3 years ago
Instancegroups:

Masters:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2017-11-03T00:03:39Z"
  generation: 7
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-us-east-1b
spec:
  image: kope.io/k8s-1.17-debian-stretch-amd64-hvm-ebs-2020-01-17
  machineType: c3.xlarge
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-1b
```

Nodes:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-01-07T22:32:30Z"
  generation: 10
  labels:
    kops.k8s.io/cluster: <redacted>
  name: c4xlarge-wellpass
spec:
  image: kope.io/k8s-1.17-debian-stretch-amd64-hvm-ebs-2020-01-17
  machineType: c4.xlarge
  maxSize: 30
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: c4xlarge-wellpass
    wellpass: "true"
  role: Node
  subnets:
  - us-east-1b
  - us-east-1c
  - us-east-1d
  - us-east-1e
```
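For reference, manifests like the above can be dumped with kops itself; a minimal sketch, assuming `<cluster-name>` stands in for the redacted cluster name:

```sh
# Dump all instance group manifests for the cluster as YAML.
# <cluster-name> is a placeholder for the redacted cluster name.
kops get instancegroups --name <cluster-name> -o yaml
```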
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Do you still experience this problem?
Adding more memory to the masters seems to be working, thanks @olemarkus.
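For anyone hitting the same symptoms, a minimal sketch of giving the masters more memory by moving the master instance group to a larger instance type; the instance type, cluster name, and state store below are illustrative placeholders, not values from this issue:

```sh
# Placeholders: <cluster-name>, s3://<state-store>; c3.2xlarge is only an
# example of a larger instance type than the c3.xlarge shown above.
kops edit instancegroup master-us-east-1b --name <cluster-name> --state s3://<state-store>
# In the editor, change spec.machineType (e.g. c3.xlarge -> c3.2xlarge), then apply:
kops update cluster --name <cluster-name> --state s3://<state-store> --yes
kops rolling-update cluster --name <cluster-name> --state s3://<state-store> --yes
```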
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-contributor-experience at kubernetes/community.
/close
@fejta-bot: Closing this issue.
1. What kops version are you running? The command `kops version` will display this information.
   1.17.2
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
   1.17.13
3. What cloud provider are you using?
   AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
   It has been happening for a month or so. Currently the cluster works for a week before getting the error and becoming unavailable. I've been able to get the cluster back up by running:
   Thread in Slack: https://kubernetes.slack.com/archives/C3QUFP0QM/p1603730470066300
5. What happened after the commands executed?
   N/A
6. What did you expect to happen?
   N/A
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with the most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist, and provide the gist link here (a hedged example follows this list).
9. Anything else we need to know?
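For item 8, a minimal sketch of collecting verbose output; the specific command and the `<cluster-name>` placeholder are illustrative, so substitute whichever kops command reproduces the failure:

```sh
# -v 10 raises kops' log verbosity; <cluster-name> is a placeholder.
kops validate cluster --name <cluster-name> -v 10 2>&1 | tee kops-validate.log
```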