Closed — paalkr closed this issue 5 years ago
@paalkr Hi, thanks for trying kube-aws!
@paalkr Hi, thanks for trying kube-aws!
AFAIK, these pods are scheduled to worker nodes by default in Kubernetes. I partly agree with you, though - in fact, I'm running tiller on controller nodes myself.
However, as of today, I personally believe that running pods like kube-dns on controller nodes wouldn't be a good idea.
Controller nodes can't be auto-scaled easily due to the --apiserver-count param required by the apiserver. If you've scaled out your worker nodes considerably, kube-dns, auto-scaled by kube-dns-autoscaler, would easily outgrow the controller nodes. #499 is a related issue about the --apiserver-count param.
Also, you can definitely run any "system" pod on controller nodes by adding appropriate tolerations to tolerate taints associated only to controller nodes.
In k8s 1.6, the toleration looks like this: https://github.com/kubernetes-incubator/kube-aws/blob/master/core/controlplane/config/templates/cloud-config-controller#L713-L717
In k8s 1.5, it goes in the annotations field instead.
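For reference, in 1.5 that annotation form carries the tolerations as a JSON array. A minimal sketch (the annotation key is the 1.5 alpha feature's; the toleration values here mirror the kube-aws controller taint discussed below):

```yaml
# k8s 1.5: tolerations expressed as an alpha annotation on the pod template
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/tolerations: |
      [{"key": "node.alpha.kubernetes.io/role", "operator": "Equal",
        "value": "master", "effect": "NoSchedule"}]
```

In 1.6 the same content moved to the first-class `spec.tolerations` field shown in the manifests later in this thread.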
Hi
Thanks for the feedback. I see that scaling the DNS service beyond the number of control-plane nodes can be necessary when scaling out a cluster to a certain level. My main concern is heapster, dashboard, and rescheduler, which by default run in only one pod each and are not scaled horizontally.
I will try modifying the controller user data to add the toleration for heapster and see how that works. I guess this will do the trick?
- path: /srv/kubernetes/manifests/heapster-de.yaml
  content: |
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: heapster-v1.3.0
      namespace: kube-system
      labels:
        k8s-app: heapster
        kubernetes.io/cluster-service: "true"
        version: v1.3.0
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: heapster
          version: v1.3.0
      template:
        metadata:
          labels:
            k8s-app: heapster
            version: v1.3.0
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          tolerations:
          - key: "CriticalAddonsOnly"
            operator: "Exists"
          - key: "node.alpha.kubernetes.io/role"
            operator: "Equal"
            value: "master"
            effect: "NoSchedule"
          containers:
          - image: gcr.io/google_containers/heapster:v1.3.0
            name: heapster
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8082
                scheme: HTTP
              initialDelaySeconds: 180
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 80m
                memory: 200Mi
              requests:
                cpu: 80m
                memory: 200Mi
            command:
            - /heapster
            - --source=kubernetes.summary_api:''
          - image: gcr.io/google_containers/addon-resizer:1.6
            name: heapster-nanny
            resources:
              limits:
                cpu: 50m
                memory: 90Mi
              requests:
                cpu: 50m
                memory: 90Mi
            env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=4m
            - --memory=200Mi
            - --extra-memory=4Mi
            - --threshold=5
            - --deployment=heapster-v1.3.0
            - --container=heapster
            - --poll-period=300000
            - --estimator=exponential
@paalkr LGTM 👍
Let me also add that you should ensure that system pods scheduled to controller nodes have pod anti-affinity across nodes, so that a rolling update of controller nodes won't take down your system services.
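A minimal sketch of such an anti-affinity (k8s 1.6 field names; the `k8s-app: heapster` label key is just an example, use whatever label your pods carry):

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - heapster
        topologyKey: kubernetes.io/hostname
```

Note that with `requiredDuringScheduling...` a replica stays Pending if no second eligible node exists; `preferredDuringSchedulingIgnoredDuringExecution` is the softer variant.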
Thanks, good point. I'll add anti-affinity to the config as well.
Will heapster, dashboard and rescheduler run nicely when scaled to two pods each, or will they make trouble for each other?
When I originally added the rescheduler it didn't play nicely with more than one replica. That may be fixed now that we've unblocked port 443 between controllers. I can align it with the scheduler as part of the existing rescheduler issue.
Thanks
@mumoshu, even without anti-affinity and two replicas, a rolling update of the controller nodes won't be any worse for the system than the current situation, right? It's like when the worker node that happens to run heapster, rescheduler, or dashboard is terminated by AWS - Kubernetes will just move the pods to the controller node that isn't being updated, right?
Will this anti-affinity and scaling setting work for heapster?
EDIT: fixed typo, added missing spec:affinity:. EDIT2: moved affinity into the correct location, spec > template > spec > affinity.
- path: /srv/kubernetes/manifests/heapster-de.yaml
  content: |
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: heapster-v1.3.0
      namespace: kube-system
      labels:
        k8s-app: heapster
        kubernetes.io/cluster-service: "true"
        version: v1.3.0
    spec:
      replicas: 2
      selector:
        matchLabels:
          k8s-app: heapster
          version: v1.3.0
      template:
        metadata:
          labels:
            k8s-app: heapster
            version: v1.3.0
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          tolerations:
          - key: "CriticalAddonsOnly"
            operator: "Exists"
          - key: "node.alpha.kubernetes.io/role"
            operator: "Equal"
            value: "master"
            effect: "NoSchedule"
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: k8s-app
                    operator: In
                    values:
                    - heapster
                topologyKey: kubernetes.io/hostname
          containers:
          - image: gcr.io/google_containers/heapster:v1.3.0
            name: heapster
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8082
                scheme: HTTP
              initialDelaySeconds: 180
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 80m
                memory: 200Mi
              requests:
                cpu: 80m
                memory: 200Mi
            command:
            - /heapster
            - --source=kubernetes.summary_api:''
          - image: gcr.io/google_containers/addon-resizer:1.6
            name: heapster-nanny
            resources:
              limits:
                cpu: 50m
                memory: 90Mi
              requests:
                cpu: 50m
                memory: 90Mi
            env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=4m
            - --memory=200Mi
            - --extra-memory=4Mi
            - --threshold=5
            - --deployment=heapster-v1.3.0
            - --container=heapster
            - --poll-period=300000
            - --estimator=exponential
@paalkr Sorry, my explanation wasn't complete.
I see. I tried to do an in-place update of the heapster deployment with
kubectl replace -n kube-system -f heapster-deployment.yaml
but the heapster pods (now two of them) still run on the worker nodes.
Content of heapster-deployment.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.3.0
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
    version: v1.3.0
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: heapster
      version: v1.3.0
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v1.3.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "node.alpha.kubernetes.io/role"
        operator: "Equal"
        value: "master"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - heapster
            topologyKey: kubernetes.io/hostname
      containers:
      - image: gcr.io/google_containers/heapster:v1.3.0
        name: heapster
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 180
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 80m
            memory: 200Mi
          requests:
            cpu: 80m
            memory: 200Mi
        command:
        - /heapster
        - --source=kubernetes.summary_api:''
      - image: gcr.io/google_containers/addon-resizer:1.6
        name: heapster-nanny
        resources:
          limits:
            cpu: 50m
            memory: 90Mi
          requests:
            cpu: 50m
            memory: 90Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - /pod_nanny
        - --cpu=80m
        - --extra-cpu=4m
        - --memory=200Mi
        - --extra-memory=4Mi
        - --threshold=5
        - --deployment=heapster-v1.3.0
        - --container=heapster
        - --poll-period=300000
        - --estimator=exponential
Sorry, didn't mean to close the issue ;) I just hit the wrong comment button.
@mumoshu , adding the proper toleration to the heapster pod spec will only allow the pod to run on the controller, but not force the pod to run only on the controller. For this I have to add a nodeAffinity to attract the pod to any of the controller nodes.
I really can't get this right...
I thought I had created the correct affinity and tolerations now, but the heapster pods always launch on the worker nodes. How can I debug this further?
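A generic way to debug scheduling decisions (plain kubectl, nothing kube-aws specific; the `k8s-app=heapster` label is taken from the manifests in this thread) is to compare the scheduler's events on the pods against the labels and taints actually present on the nodes:

```shell
# Where did the pods actually land?
kubectl get pods -n kube-system -l k8s-app=heapster -o wide

# Scheduler events; FailedScheduling messages name the predicate that failed
kubectl describe pod -n kube-system -l k8s-app=heapster

# Labels and taints that the affinity/tolerations must match
kubectl get nodes --show-labels
kubectl describe nodes | grep -A 1 Taints
```

If the pods schedule without any FailedScheduling events, the affinity simply didn't constrain anything, which usually means the label key in the manifest doesn't exist on any node.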
My heapster deployment definition:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.3.0
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
    version: v1.3.0
spec:
  replicas: 2
#  selector:
#    matchLabels:
#      k8s-app: heapster
#      version: v1.3.0
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v1.3.0
#      annotations:
#        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "node.alpha.kubernetes.io/role"
        operator: "Equal"
        value: "master"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - heapster
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-aws.coreos.com/role
                operator: NotIn
                values:
                - worker
      containers:
      - image: gcr.io/google_containers/heapster:v1.3.0
        name: heapster
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 180
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 80m
            memory: 200Mi
          requests:
            cpu: 80m
            memory: 200Mi
        command:
        - /heapster
        - --source=kubernetes.summary_api:''
      - image: gcr.io/google_containers/addon-resizer:1.6
        name: heapster-nanny
        resources:
          limits:
            cpu: 50m
            memory: 90Mi
          requests:
            cpu: 50m
            memory: 90Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - /pod_nanny
        - --cpu=80m
        - --extra-cpu=4m
        - --memory=200Mi
        - --extra-memory=4Mi
        - --threshold=5
        - --deployment=heapster-v1.3.0
        - --container=heapster
        - --poll-period=300000
        - --estimator=exponential
A controller node spec
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: 2017-04-19T22:36:35Z
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t2.medium
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: eu-west-1
      failure-domain.beta.kubernetes.io/zone: eu-west-1b
      kubernetes.io/hostname: ip-10-1-44-191.eu-west-1.compute.internal
    name: ip-10-1-44-191.eu-west-1.compute.internal
    namespace: ""
    resourceVersion: "126535"
    selfLink: /api/v1/nodesip-10-1-44-191.eu-west-1.compute.internal
    uid: a4cdc10a-2550-11e7-a0cd-02d5584ffffb
  spec:
    externalID: i-02756bb5346d3299d
    providerID: aws:///eu-west-1b/i-02756bb5346d3299d
    taints:
    - effect: NoSchedule
      key: node.alpha.kubernetes.io/role
      timeAdded: null
      value: master
  status:
    addresses:
    - address: 10.1.44.191
      type: InternalIP
    - address: 10.1.44.191
      type: LegacyHostIP
    - address: ip-10-1-44-191.eu-west-1.compute.internal
      type: InternalDNS
    - address: ip-10-1-44-191.eu-west-1.compute.internal
      type: Hostname
    allocatable:
      cpu: "2"
      memory: 3947136Ki
      pods: "110"
    capacity:
      cpu: "2"
      memory: 4049536Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: 2017-04-20T18:10:11Z
      lastTransitionTime: 2017-04-19T22:36:35Z
      message: kubelet has sufficient disk space available
      reason: KubeletHasSufficientDisk
      status: "False"
      type: OutOfDisk
    - lastHeartbeatTime: 2017-04-20T18:10:11Z
      lastTransitionTime: 2017-04-19T22:36:35Z
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: 2017-04-20T18:10:11Z
      lastTransitionTime: 2017-04-19T22:36:35Z
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: 2017-04-20T18:10:11Z
      lastTransitionTime: 2017-04-19T22:36:35Z
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5
      - quay.io/coreos/hyperkube:v1.6.1_coreos.0
      sizeBytes: 664861472
    - names:
      - gcr.io/google_containers/heapster@sha256:3dff9b2425a196aa51df0cebde0f8b427388425ba84568721acf416fa003cd5c
      - gcr.io/google_containers/heapster:v1.3.0
      sizeBytes: 68105973
    - names:
      - gcr.io/google_containers/addon-resizer@sha256:ba506f5f21356331d92141ee48fc4945fd467ec6010364ae970342de5477272c
      - gcr.io/google_containers/addon-resizer:1.6
      sizeBytes: 48784610
    - names:
      - gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
      - gcr.io/google_containers/pause-amd64:3.0
      sizeBytes: 746888
    nodeInfo:
      architecture: amd64
      bootID: e66ecfc6-6231-41c0-9f5a-320feda7f400
      containerRuntimeVersion: docker://1.12.6
      kernelVersion: 4.9.16-coreos-r1
      kubeProxyVersion: v1.6.1+coreos.0
      kubeletVersion: v1.6.1+coreos.0
      machineID: 8e025a21a4254e11b028584d9d8b12c4
      operatingSystem: linux
      osImage: Container Linux by CoreOS 1298.7.0 (Ladybug)
      systemUUID: EC238E36-080F-BAFB-608E-8C11B6B2F37E
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      kube-aws.coreos.com/securitygroups: k8sprod-prerequisites-WorkerSecurityGroup-6O3GXX71Z193,k8sprod-Controlplane-1RJHQ7DSHBPTR-SecurityGroupWorker-FAESBHBU1F3T
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: 2017-04-20T10:32:34Z
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t2.large
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: eu-west-1
      failure-domain.beta.kubernetes.io/zone: eu-west-1b
      kube-aws.coreos.com/autoscalinggroup: k8sprod-T2largeB-1ANR4S4CPBPZ7-Workers-11MVW8HICN4JB
      kube-aws.coreos.com/launchconfiguration: k8sprod-T2largeB-1ANR4S4CPBPZ7-WorkersLC-169AP7153B6DN
      kube-aws.coreos.com/role: worker
      kubernetes.io/hostname: ip-10-1-44-236.eu-west-1.compute.internal
    name: ip-10-1-44-236.eu-west-1.compute.internal
    namespace: ""
    resourceVersion: "126539"
    selfLink: /api/v1/nodesip-10-1-44-236.eu-west-1.compute.internal
    uid: aaa61ee8-25b4-11e7-9b85-06433a3e9fe9
  spec:
    externalID: i-0c166cdaa4e7002e7
    providerID: aws:///eu-west-1b/i-0c166cdaa4e7002e7
  status:
    addresses:
    - address: 10.1.44.236
      type: InternalIP
    - address: 10.1.44.236
      type: LegacyHostIP
    - address: ip-10-1-44-236.eu-west-1.compute.internal
      type: InternalDNS
    - address: ip-10-1-44-236.eu-west-1.compute.internal
      type: Hostname
    allocatable:
      cpu: "2"
      memory: 8075900Ki
      pods: "110"
    capacity:
      cpu: "2"
      memory: 8178300Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: 2017-04-20T18:10:13Z
      lastTransitionTime: 2017-04-20T10:32:34Z
      message: kubelet has sufficient disk space available
      reason: KubeletHasSufficientDisk
      status: "False"
      type: OutOfDisk
    - lastHeartbeatTime: 2017-04-20T18:10:13Z
      lastTransitionTime: 2017-04-20T10:32:34Z
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: 2017-04-20T18:10:13Z
      lastTransitionTime: 2017-04-20T10:32:34Z
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: 2017-04-20T18:10:13Z
      lastTransitionTime: 2017-04-20T10:32:44Z
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - xxx.dkr.ecr.eu-west-1.amazonaws.com/ags/105@sha256:20f391bc99458c7bf926f2ecee5bda3db34f0781e45f2969730bd3bacf74cad2
      - 893008332793.dkr.ecr.eu-west-1.amazonaws.com/ags/105:GeomapAdmin_1.0.1
      sizeBytes: 7265879019
    - names:
      - quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5
      - quay.io/coreos/hyperkube:v1.6.1_coreos.0
      sizeBytes: 664861472
    - names:
      - gcr.io/google_containers/echoserver@sha256:5d99aa1120524c801bc8c1a7077e8f5ec122ba16b6dda1a5d3826057f67b9bcb
      - gcr.io/google_containers/echoserver:1.4
      sizeBytes: 140366210
    - names:
      - quay.io/coreos/awscli@sha256:712772e2329b24c203462a72f967a330621d2024b5a5a3545b0bb46dc12efd16
      - quay.io/coreos/awscli:master
      sizeBytes: 97498295
    - names:
      - gcr.io/google_containers/heapster@sha256:3dff9b2425a196aa51df0cebde0f8b427388425ba84568721acf416fa003cd5c
      - gcr.io/google_containers/heapster:v1.3.0
      sizeBytes: 68105973
    - names:
      - gcr.io/google_containers/addon-resizer@sha256:ba506f5f21356331d92141ee48fc4945fd467ec6010364ae970342de5477272c
      - gcr.io/google_containers/addon-resizer:1.6
      sizeBytes: 48784610
    - names:
      - gcr.io/google_containers/cluster-proportional-autoscaler-amd64@sha256:5a3bdd25a5b0f7f8f285e8ff8f4402cf86ddfdfa537e9f053c77c5f043821f70
      - gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.0.0
      sizeBytes: 48155586
    - names:
      - gcr.io/google_containers/defaultbackend@sha256:ee3aa1187023d0197e3277833f19d9ef7df26cee805fef32663e06c7412239f9
      - gcr.io/google_containers/defaultbackend:1.0
      sizeBytes: 7510068
    - names:
      - gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
      - gcr.io/google_containers/pause-amd64:3.0
      sizeBytes: 746888
    nodeInfo:
      architecture: amd64
      bootID: ab00ea02-217d-47b4-8f78-96f98a937717
      containerRuntimeVersion: docker://1.12.6
      kernelVersion: 4.9.16-coreos-r1
      kubeProxyVersion: v1.6.1+coreos.0
      kubeletVersion: v1.6.1+coreos.0
      machineID: 8e025a21a4254e11b028584d9d8b12c4
      operatingSystem: linux
      osImage: Container Linux by CoreOS 1298.7.0 (Ladybug)
      systemUUID: EC21F8F9-0AE4-8B0C-425B-5C910B0A5CBB
and a worker node spec
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      kube-aws.coreos.com/securitygroups: k8sprod-prerequisites-WorkerSecurityGroup-6O3GXX71Z193,k8sprod-Controlplane-1RJHQ7DSHBPTR-SecurityGroupWorker-FAESBHBU1F3T
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: 2017-04-20T08:21:16Z
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/instance-type: t2.large
      beta.kubernetes.io/os: linux
      failure-domain.beta.kubernetes.io/region: eu-west-1
      failure-domain.beta.kubernetes.io/zone: eu-west-1c
      kube-aws.coreos.com/autoscalinggroup: k8sprod-T2largeC-17Z2CTVC1TJGE-Workers-4H8RK5JLPXND
      kube-aws.coreos.com/launchconfiguration: k8sprod-T2largeC-17Z2CTVC1TJGE-WorkersLC-1F7XCJ5EQBU7X
      kube-aws.coreos.com/role: worker
      kubernetes.io/hostname: ip-10-1-45-239.eu-west-1.compute.internal
    name: ip-10-1-45-239.eu-west-1.compute.internal
    namespace: ""
    resourceVersion: "126528"
    selfLink: /api/v1/nodesip-10-1-45-239.eu-west-1.compute.internal
    uid: 531103c6-25a2-11e7-a0cd-02d5584ffffb
  spec:
    externalID: i-0aa0e5fad1ec73293
    providerID: aws:///eu-west-1c/i-0aa0e5fad1ec73293
  status:
    addresses:
    - address: 10.1.45.239
      type: InternalIP
    - address: 10.1.45.239
      type: LegacyHostIP
    - address: ip-10-1-45-239.eu-west-1.compute.internal
      type: InternalDNS
    - address: ip-10-1-45-239.eu-west-1.compute.internal
      type: Hostname
    allocatable:
      cpu: "2"
      memory: 8075900Ki
      pods: "110"
    capacity:
      cpu: "2"
      memory: 8178300Ki
      pods: "110"
    conditions:
    - lastHeartbeatTime: 2017-04-20T18:10:06Z
      lastTransitionTime: 2017-04-20T08:21:16Z
      message: kubelet has sufficient disk space available
      reason: KubeletHasSufficientDisk
      status: "False"
      type: OutOfDisk
    - lastHeartbeatTime: 2017-04-20T18:10:06Z
      lastTransitionTime: 2017-04-20T08:21:16Z
      message: kubelet has sufficient memory available
      reason: KubeletHasSufficientMemory
      status: "False"
      type: MemoryPressure
    - lastHeartbeatTime: 2017-04-20T18:10:06Z
      lastTransitionTime: 2017-04-20T08:21:16Z
      message: kubelet has no disk pressure
      reason: KubeletHasNoDiskPressure
      status: "False"
      type: DiskPressure
    - lastHeartbeatTime: 2017-04-20T18:10:06Z
      lastTransitionTime: 2017-04-20T08:21:26Z
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready
    daemonEndpoints:
      kubeletEndpoint:
        Port: 10250
    images:
    - names:
      - xxx.dkr.ecr.eu-west-1.amazonaws.com/ags/105@sha256:20f391bc99458c7bf926f2ecee5bda3db34f0781e45f2969730bd3bacf74cad2
      - 893008332793.dkr.ecr.eu-west-1.amazonaws.com/ags/105:GeomapAdmin_1.0.1
      sizeBytes: 7265879019
    - names:
      - quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5
      - quay.io/coreos/hyperkube:v1.6.1_coreos.0
      sizeBytes: 664861472
    - names:
      - gcr.io/google_containers/nginx-ingress-controller@sha256:995427304f514ac1b70b2c74ee3c6d4d4ea687fb2dc63a1816be15e41cf0e063
      - gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.3
      sizeBytes: 121204435
    - names:
      - quay.io/coreos/awscli@sha256:712772e2329b24c203462a72f967a330621d2024b5a5a3545b0bb46dc12efd16
      - quay.io/coreos/awscli:master
      sizeBytes: 97498295
    - names:
      - gcr.io/google_containers/heapster@sha256:3dff9b2425a196aa51df0cebde0f8b427388425ba84568721acf416fa003cd5c
      - gcr.io/google_containers/heapster:v1.3.0
      sizeBytes: 68105973
    - names:
      - gcr.io/google_containers/addon-resizer@sha256:ba506f5f21356331d92141ee48fc4945fd467ec6010364ae970342de5477272c
      - gcr.io/google_containers/addon-resizer:1.6
      sizeBytes: 48784610
    - names:
      - gcr.io/google_containers/kubedns-amd64@sha256:3d3d67f519300af646e00adcf860b2f380d35ed4364e550d74002dadace20ead
      - gcr.io/google_containers/kubedns-amd64:1.9
      sizeBytes: 46998769
    - names:
      - gcr.io/google_containers/dnsmasq-metrics-amd64@sha256:4063e37fd9b2fd91b7cc5392ed32b30b9c8162c4c7ad2787624306fc133e80a9
      - gcr.io/google_containers/dnsmasq-metrics-amd64:1.0
      sizeBytes: 13998769
    - names:
      - gcr.io/google_containers/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
      - gcr.io/google_containers/exechealthz-amd64:1.2
      sizeBytes: 8374840
    - names:
      - gcr.io/google_containers/defaultbackend@sha256:ee3aa1187023d0197e3277833f19d9ef7df26cee805fef32663e06c7412239f9
      - gcr.io/google_containers/defaultbackend:1.0
      sizeBytes: 7510068
    - names:
      - gcr.io/google_containers/kube-dnsmasq-amd64@sha256:a722df15c0cf87779aad8ba2468cf072dd208cb5d7cfcaedd90e66b3da9ea9d2
      - gcr.io/google_containers/kube-dnsmasq-amd64:1.4
      sizeBytes: 5126001
    - names:
      - gcr.io/google_containers/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
      - gcr.io/google_containers/pause-amd64:3.0
      sizeBytes: 746888
    nodeInfo:
      architecture: amd64
      bootID: f816b350-c0f5-40b1-a836-e011c02b2f78
      containerRuntimeVersion: docker://1.12.6
      kernelVersion: 4.9.16-coreos-r1
      kubeProxyVersion: v1.6.1+coreos.0
      kubeletVersion: v1.6.1+coreos.0
      machineID: 8e025a21a4254e11b028584d9d8b12c4
      operatingSystem: linux
      osImage: Container Linux by CoreOS 1298.7.0 (Ladybug)
      systemUUID: EC2905EE-B013-15FA-984A-C828014882A1
Hmm, sorry for spamming with questions and comments
To me it seems like my deployment is ignoring all my tolerations and affinity configuration. I tried to use nodeAffinity to force heapster to run on a specific worker, but a random worker is still picked by the scheduler.
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
#      - key: kube-aws.coreos.com/role
#        operator: NotIn
#        values:
#        - worker
      - key: kubernetes.io/hostname
        operator: In
        values:
        - ip-10-1-45-239.eu-west-1.compute.internal
Using the good old nodeSelector does work though, and heapster ends up on the specified node.
nodeSelector:
  kubernetes.io/hostname: ip-10-1-45-239.eu-west-1.compute.internal
But if I try to use nodeSelector to force the heapster pod onto a controller node, the scheduler complains about the taint not being tolerated, even though I have added these tolerations to my deployment file:
tolerations:
- key: CriticalAddonsOnly
  operator: Exists
- key: 'node.alpha.kubernetes.io/role'
  operator: Equal
  value: master
  effect: NoSchedule
Quick comment after looking at your example yaml above - shouldn't nodeAffinity be in the pod spec rather than the deployment spec? (Does it even pass validation when placed in the deployment spec?)
@mumoshu, my first example had this wrong and it was correctly failing validation. But I thought I got it right in the latest configuration I posted. I'll put it here again just for reference.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.3.0
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
    version: v1.3.0
spec:
  replicas: 2
#  selector:
#    matchLabels:
#      k8s-app: heapster
#      version: v1.3.0
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v1.3.0
#      annotations:
#        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "node.alpha.kubernetes.io/role"
        operator: "Equal"
        value: "master"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - heapster
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-aws.coreos.com/role
                operator: NotIn
                values:
                - worker
......
@mumoshu, any suggestions as to why the tolerations and affinity settings are not respected?
@paalkr AFAICS, kube-aws worker nodes aren't labeled with kube-aws.coreos.com/role by default.
Could you ensure that they're explicitly labeled with appropriate config in cluster.yaml like:
worker:
  nodePools:
  - name: pool1
    nodeLabels:
      kube-aws.coreos.com/role: worker
?
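Once the pools are labeled, a quick way to confirm it took effect (plain kubectl, nothing kube-aws specific) is to show the label as a column:

```shell
# Workers should read "worker" in the ROLE column;
# controllers show <none> until they are labeled too
kubectl get nodes -L kube-aws.coreos.com/role
```

A NotIn nodeAffinity against this key only constrains anything if the key actually exists on the nodes, so this check is worth doing before debugging the deployment further.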
Hi
My cluster.yaml file is modified to properly tag the workers, according to your suggestion, so the tags are in place. But the nodeAffinity still does not work as expected - the pods are attracted to the workers, not the controllers.
@paalkr Hi! Could you share the results of kubectl describe node <one of your worker nodes> and kubectl describe node <one of your controller nodes>?
@mumoshu , thanks for taking time to help out. Highly appreciated!
Please find the output below
Controller:
Name: ip-10-1-43-150.eu-west-1.compute.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1a
kubernetes.io/hostname=ip-10-1-43-150.eu-west-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node.alpha.kubernetes.io/role=master:NoSchedule
CreationTimestamp: Thu, 20 Apr 2017 00:36:57 +0200
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Tue, 25 Apr 2017 08:45:20 +0200 Thu, 20 Apr 2017 00:36:57 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 25 Apr 2017 08:45:20 +0200 Thu, 20 Apr 2017 00:36:57 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 25 Apr 2017 08:45:20 +0200 Thu, 20 Apr 2017 00:36:57 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Tue, 25 Apr 2017 08:45:20 +0200 Thu, 20 Apr 2017 00:36:57 +0200 KubeletReady kubelet is posting ready status
Addresses: 10.1.43.150,10.1.43.150,ip-10-1-43-150.eu-west-1.compute.internal,ip-10-1-43-150.eu-west-1.compute.internal
Capacity:
cpu: 2
memory: 4049536Ki
pods: 110
Allocatable:
cpu: 2
memory: 3947136Ki
pods: 110
System Info:
Machine ID: 8e025a21a4254e11b028584d9d8b12c4
System UUID: EC235DB6-B862-7550-8FF8-BB7847BEBAF8
Boot ID: d739a8fb-6f0b-4e1d-baa2-e2a7db373e22
Kernel Version: 4.9.16-coreos-r1
OS Image: Container Linux by CoreOS 1298.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+coreos.0
Kube-Proxy Version: v1.6.1+coreos.0
ExternalID: i-0c48af722c785f04b
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system kube-apiserver-ip-10-1-43-150.eu-west-1.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-ip-10-1-43-150.eu-west-1.compute.internal 200m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-ip-10-1-43-150.eu-west-1.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-ip-10-1-43-150.eu-west-1.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
300m (15%) 0 (0%) 0 (0%) 0 (0%)
Events: <none>
Worker:
Name: ip-10-1-43-42.eu-west-1.compute.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.large
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1a
kube-aws.coreos.com/autoscalinggroup=k8sprod-T2largeA-JQVZA9S334BT-Workers-VD3FAJXLV81K
kube-aws.coreos.com/launchconfiguration=k8sprod-T2largeA-JQVZA9S334BT-WorkersLC-41PSQQY31I4Y
kube-aws.coreos.com/role=worker
kubernetes.io/hostname=ip-10-1-43-42.eu-west-1.compute.internal
Annotations: kube-aws.coreos.com/securitygroups=k8sprod-prerequisites-WorkerSecurityGroup-6O3GXX71Z193,k8sprod-Controlplane-1RJHQ7DSHBPTR-SecurityGroupWorker-FAESBHBU1F3T
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Thu, 20 Apr 2017 11:18:23 +0200
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Tue, 25 Apr 2017 08:47:35 +0200 Sat, 22 Apr 2017 06:31:01 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 25 Apr 2017 08:47:35 +0200 Sat, 22 Apr 2017 06:31:01 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 25 Apr 2017 08:47:35 +0200 Sat, 22 Apr 2017 06:31:01 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Tue, 25 Apr 2017 08:47:35 +0200 Sat, 22 Apr 2017 06:31:01 +0200 KubeletReady kubelet is posting ready status
Addresses: 10.1.43.42,10.1.43.42,ip-10-1-43-42.eu-west-1.compute.internal,ip-10-1-43-42.eu-west-1.compute.internal
Capacity:
cpu: 2
memory: 8178300Ki
pods: 110
Allocatable:
cpu: 2
memory: 8075900Ki
pods: 110
System Info:
Machine ID: 8e025a21a4254e11b028584d9d8b12c4
System UUID: EC200572-41D5-2244-3240-0FE68293A76F
Boot ID: 2d7db8bd-2f04-472b-b46d-a2c0d5f68c5b
Kernel Version: 4.9.16-coreos-r1
OS Image: Container Linux by CoreOS 1298.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+coreos.0
Kube-Proxy Version: v1.6.1+coreos.0
ExternalID: i-024daef28f2c111dc
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default geomapadmin-468890941-xz00f 100m (5%) 500m (25%) 350Mi (4%) 0 (0%)
default ingress-nginx-2346665006-0463j 50m (2%) 0 (0%) 0 (0%) 0 (0%)
default nginx-default-backend-2003809344-2184w 10m (0%) 100m (5%) 20Mi (0%) 50Mi (0%)
kube-system heapster-v1.3.0-504010935-3jvsb 194m (9%) 194m (9%) 354Mi (4%) 354Mi (4%)
kube-system kube-dns-3816048056-5tpmj 260m (13%) 0 (0%) 140Mi (1%) 220Mi (2%)
kube-system kube-proxy-ip-10-1-43-42.eu-west-1.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-rescheduler-3155147949-0rm2p 10m (0%) 0 (0%) 100Mi (1%) 0 (0%)
kube-system kubernetes-dashboard-v1.5.1-lpnbb 100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
724m (36%) 894m (44%) 1014Mi (12%) 674Mi (8%)
Events: <none>
@paalkr Sorry for being late in replying and thanks for the info!
Your configuration seems good. Then, have you by any chance modified node labels after the pods had been scheduled?
Could you `kubectl delete` those problematic pods anyway and then observe whether they still get re-scheduled to the worker nodes?
If deleting pods doesn't fix your case, could you try to tag controller nodes and give pods node affinities to controller nodes, instead of node anti-affinities to worker nodes?
FYI, `experimental.nodeLabels` can be used to label only controller nodes.
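For illustration, the labeling could look something like this in `cluster.yaml` (the placement of the `experimental.nodeLabels` key is a sketch; check the cluster.yaml reference for your kube-aws version). The label key/value shown is the one that later appears on the controller nodes in this thread:

```yaml
# cluster.yaml (sketch): attach a label to controller nodes only, so that
# pods can target them via nodeSelector or nodeAffinity.
experimental:
  nodeLabels:
    kube-aws.coreos.com/role: controller
```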
Thanks @mumoshu. Yes, I noticed the `experimental.nodeLabels` feature, and applied it while updating kube-aws to the latest RC. I also deleted the out-of-the-box heapster deployment (and with it all heapster pods), and redeployed heapster. But still no luck.
Output from one of the controller nodes
Name: ip-10-1-45-70.eu-west-1.compute.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1c
kube-aws.coreos.com/role=controller
kubernetes.io/hostname=ip-10-1-45-70.eu-west-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node.alpha.kubernetes.io/role=master:NoSchedule
CreationTimestamp: Fri, 28 Apr 2017 10:02:13 +0200
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 30 Apr 2017 18:29:04 +0200 Fri, 28 Apr 2017 10:02:13 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 30 Apr 2017 18:29:04 +0200 Fri, 28 Apr 2017 10:02:13 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 30 Apr 2017 18:29:04 +0200 Fri, 28 Apr 2017 10:02:13 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Sun, 30 Apr 2017 18:29:04 +0200 Fri, 28 Apr 2017 10:02:13 +0200 KubeletReady kubelet is posting ready status
Addresses: 10.1.45.70,10.1.45.70,ip-10-1-45-70.eu-west-1.compute.internal,ip-10-1-45-70.eu-west-1.compute.internal
Capacity:
cpu: 2
memory: 4049512Ki
pods: 110
Allocatable:
cpu: 2
memory: 3947112Ki
pods: 110
System Info:
Machine ID: 8e025a21a4254e11b028584d9d8b12c4
System UUID: EC21EB49-60D8-2923-FFA2-D2D16B6A97A6
Boot ID: a91ab878-68a4-4a44-ac50-34c7d3211920
Kernel Version: 4.9.24-coreos
OS Image: Container Linux by CoreOS 1353.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.2+coreos.0
Kube-Proxy Version: v1.6.2+coreos.0
ExternalID: i-08934667737e24263
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system kube-apiserver-ip-10-1-45-70.eu-west-1.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-ip-10-1-45-70.eu-west-1.compute.internal 200m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-ip-10-1-45-70.eu-west-1.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-ip-10-1-45-70.eu-west-1.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
300m (15%) 0 (0%) 0 (0%) 0 (0%)
Events: <none>
Heapster deployment definition
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: heapster-v1.3.0
namespace: kube-system
labels:
k8s-app: heapster
kubernetes.io/cluster-service: "true"
version: v1.3.0
spec:
replicas: 3
# selector:
# matchLabels:
# k8s-app: heapster
# version: v1.3.0
template:
metadata:
labels:
k8s-app: heapster
version: v1.3.0
# annotations:
# scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
# nodeSelector:
# # kube-aws.coreos.com/role: controller
# kubernetes.io/hostname: ip-10-1-45-239.eu-west-1.compute.internal
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- key: 'node.alpha.kubernetes.io/role'
operator: Equal
value: master
effect: NoSchedule
affinity:
podAntiAffinity:
# podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- heapster
topologyKey: kubernetes.io/hostname
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kube-aws.coreos.com/role
operator: In
values:
- controller
# - key: kubernetes.io/hostname
# operator: In
# values:
# - ip-10-1-45-239.eu-west-1.compute.internal
containers:
- image: gcr.io/google_containers/heapster:v1.3.0
name: heapster
livenessProbe:
httpGet:
path: /healthz
port: 8082
scheme: HTTP
initialDelaySeconds: 180
timeoutSeconds: 5
resources:
limits:
cpu: 80m
memory: 200Mi
requests:
cpu: 80m
memory: 200Mi
command:
- /heapster
- --source=kubernetes.summary_api:''
- image: gcr.io/google_containers/addon-resizer:1.6
name: heapster-nanny
resources:
limits:
cpu: 50m
memory: 90Mi
requests:
cpu: 50m
memory: 90Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
command:
- /pod_nanny
- --cpu=80m
- --extra-cpu=4m
- --memory=200Mi
- --extra-memory=4Mi
- --threshold=5
- --deployment=heapster-v1.3.0
- --container=heapster
- --poll-period=300000
- --estimator=exponential
List of controller nodes
ip-10-1-45-70.eu-west-1.compute.internal
ip-10-1-43-190.eu-west-1.compute.internal
Heapster only running on workers
heapster-v1.3.0-504010935-7zmr9 2/2 Running 0 2d 10.200.31.7 ip-10-1-43-118.eu-west-1.compute.internal
heapster-v1.3.0-504010935-qjrfx 2/2 Running 0 2d 10.200.14.6 ip-10-1-44-100.eu-west-1.compute.internal
heapster-v1.3.0-504010935-wh2lx 2/2 Running 0 2d 10.200.15.4 ip-10-1-45-91.eu-west-1.compute.internal
@paalkr Sorry for the long silence!
Hmm, that's strange. Would you mind sharing the output of `kubectl get no --show-labels`?
The only possible causes in my mind for now are a bug in k8s that breaks nodeAffinity, or a bug in kube-aws that adds uniform labels to all nodes, workers as well as controllers.
@paalkr Any updates on this?
Also - is there any chance you used an old version of kubectl on your machine (rather than the one run by kube-aws within the cluster) to update the heapster deployment?
An old kubectl would strip fields like `affinity` and `tolerations` that were added in k8s 1.6.
If that's the case, I'd suggest upgrading kubectl first and testing whether it works, and then modifying cloud-config-controller to instruct kube-aws to do it for you.
I discovered that if I disable the nanny heapster container, it works. The nanny is responsible for scaling the pod; any reason in particular to use the nanny rather than HPA for scaling heapster?
@paalkr Thanks for the reply!
According to its last-changed date, the nanny seems to use an older version of the Kubernetes client to communicate with k8s.
Also, how it updates a k8s deployment is problematic - it uses `Update` instead of `Patch`. It certainly strips recently introduced fields like `tolerations` and `affinity` when it reads and then overwrites the heapster deployment.
So, I'd suggest not using addon_resizer, or sending a PR to fix the problem.
Also, I guess what you want is VPA rather than HPA, since scaling heapster "out" doesn't make sense? What you want is scaling heapster "up", right? Anyway, VPA is still under development in the kubernetes/autoscaler repo.
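As a stopgap while VPA matures, one option is to drop the nanny container entirely and size the heapster container statically. A sketch is below, with figures derived from the nanny flags in the manifest above (`--cpu=80m --extra-cpu=4m --memory=200Mi --extra-memory=4Mi`) for a cluster of roughly 30 nodes; the exact numbers are illustrative, not recommendations:

```yaml
# Sketch: heapster container with fixed resources, no addon-resizer nanny.
containers:
- image: gcr.io/google_containers/heapster:v1.3.0
  name: heapster
  resources:
    requests:
      cpu: 200m      # ~80m base + ~4m per node
      memory: 320Mi  # ~200Mi base + ~4Mi per node
    limits:
      cpu: 200m
      memory: 320Mi
  command:
  - /heapster
  - --source=kubernetes.summary_api:''
```

With no nanny rewriting the deployment via `Update`, the `tolerations` and `affinity` fields survive.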
@mumoshu, thanks for the confirmation. I did come to the conclusion that the nanny somehow interfered with my desired goal. And you just confirmed and pinpointed the problem: the nanny is actually using an old version of the k8s client that isn't compatible with tolerations and affinity ;)
I'm not sure if I understand what you mean by scaling heapster up instead of out. There isn't much documentation yet on the objective and mechanics of the VPA initiative, or am I looking in the wrong place?
I'm sorry to "bump" this issue, but is there any news on how to solve this problem?
I followed the discussion and can't understand why running kube-system pods on workers is something you try to avoid. It doesn't matter where pods are running; nodes will go down on updates.
You can shorten that period by enabling nodeDrainer, which was updated recently and now evicts pods correctly, so the downtime would be only the time needed to start a pod on a new node.
@paalkr
I'm not sure if I understand what you mean by scaling heapster up instead of out.
Scaling up here means giving more resources (cpu, memory) to the single heapster pod so that heapster can keep collecting metrics even as your cluster gets larger. The nanny and VPA automate this process. AFAIK heapster doesn't scale by adding replicas, so scaling "out" isn't an option.
@redbaron Thanks for chiming in! I guess @paalkr's original explanation of the problem adds the context:
When I'm testing auto scaling of the worker node pools I see that some system critical pods are running on the worker nodes. Once in a while when a worker node is terminated by AWS, the critical pods are then terminated and redeployed to any running worker node. Not a big problem, but for example heapster statistics will not be available for the short period of time it takes Kubernetes to restart the pod on a running node.
Suppose you want to enable cluster-autoscaling on your cluster (with CA or AWS-native autoscaling): you can keep running pods on worker nodes as long as a deployment has 2 or more replicas and isn't "critical" (note that cluster-autoscaler doesn't try to terminate nodes running critical pods).
For critical single pods like rescheduler, heapster, kube-dns-autoscaler, cluster-autoscaler (and tiller, if you'd like to mark it "critical"), I understand that some people would like to schedule them on controller nodes so that (1) they're not affected by cluster autoscaling and don't incur downtime when AWS-native autoscaling is enabled on worker nodes, and (2) the k8s cluster-autoscaler is able to delete nodes, as there would be no worker nodes running critical pods.
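For such critical singletons, a minimal pod-spec fragment pinning them to controllers (a sketch using the `kube-aws.coreos.com/role=controller` label and the `node.alpha.kubernetes.io/role=master:NoSchedule` taint shown in the node output earlier in this thread) could look like:

```yaml
# Sketch: schedule a critical addon only on controller nodes.
spec:
  nodeSelector:
    kube-aws.coreos.com/role: controller   # label present only on controllers
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - key: node.alpha.kubernetes.io/role     # controller taint in this cluster
    operator: Equal
    value: master
    effect: NoSchedule
```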
@paalkr Do I understand your problem correctly?
Just to be sure: setting `replicas: 2` on it doesn't make sense.
No, the question said nothing about the cluster autoscaler, or I fail to see it:
Once in a while when a worker node is terminated by AWS,
Once in a while AWS terminates VMs and really doesn't care whether your VM is in a Nodepool, Controller, or Etcd ASG, so moving a critical pod from one ASG to another buys you nothing in that regard.
I agree with your argument that there is a use-case to run certain pods on controllers; thanks for the clarification.
@redbaron Thanks,
Once in a while AWS terminates VMs and really doesn't care whether your VM is in a Nodepool, Controller, or Etcd ASG, so moving a critical pod from one ASG to another buys you nothing in that regard.
Yes, I agree with you. There's no way to avoid a short downtime in such a case! Thanks for the clarification too!
@mumoshu , @redbaron
Absolutely correct. EC2 instances will fail; it's just a matter of time. And if that instance happens to be the node running heapster, you will be without statistics while the pod is moved to a running node and a replacement instance is added back to the cluster. My goal here was actually to run at least two instances of heapster in parallel on different nodes, but I understand now that heapster won't be particularly happy about having any siblings playing alongside ;)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
@fejta-bot: Closing this issue.
Hi
I have a running Kubernetes cluster in AWS, with
When I'm testing auto scaling of the worker node pools I see that some system critical pods are running on the worker nodes. Once in a while when a worker node is terminated by AWS, the critical pods are then terminated and redeployed to any running worker node. Not a big problem, but for example heapster statistics will not be available for the short period of time it takes Kubernetes to restart the pod on a running node.
Any reason in particular that these pods are not run on the control plane nodes? And can I force them to run on the control plane nodes by modifying the userdata-controller-file before running kube-aws up?
NAME                                                                READY   STATUS    RESTARTS   AGE   IP            NODE
heapster-v1.3.0-76786035-9qq4g                                      2/2     Running   0          14m   10.200.50.3   ip-10-1-44-196.eu-west-1.compute.internal
kube-apiserver-ip-10-1-43-150.eu-west-1.compute.internal            1/1     Running   0          12h   10.1.43.150   ip-10-1-43-150.eu-west-1.compute.internal
kube-apiserver-ip-10-1-44-191.eu-west-1.compute.internal            1/1     Running   0          12h   10.1.44.191   ip-10-1-44-191.eu-west-1.compute.internal
kube-controller-manager-ip-10-1-43-150.eu-west-1.compute.internal   1/1     Running   0          12h   10.1.43.150   ip-10-1-43-150.eu-west-1.compute.internal
kube-controller-manager-ip-10-1-44-191.eu-west-1.compute.internal   1/1     Running   0          12h   10.1.44.191   ip-10-1-44-191.eu-west-1.compute.internal
kube-dns-3816048056-5tpmj                                           4/4     Running   0          1h    10.200.93.6   ip-10-1-43-42.eu-west-1.compute.internal
kube-dns-3816048056-bw11s                                           4/4     Running   0          2h    10.200.12.3   ip-10-1-45-239.eu-west-1.compute.internal
kube-dns-autoscaler-1464605019-k3s5k                                1/1     Running   0          14m   10.200.50.4   ip-10-1-44-196.eu-west-1.compute.internal
kube-proxy-ip-10-1-43-150.eu-west-1.compute.internal                1/1     Running   0          12h   10.1.43.150   ip-10-1-43-150.eu-west-1.compute.internal
kube-proxy-ip-10-1-43-42.eu-west-1.compute.internal                 1/1     Running   0          1h    10.1.43.42    ip-10-1-43-42.eu-west-1.compute.internal
kube-proxy-ip-10-1-44-191.eu-west-1.compute.internal                1/1     Running   0          12h   10.1.44.191   ip-10-1-44-191.eu-west-1.compute.internal
kube-proxy-ip-10-1-44-196.eu-west-1.compute.internal                1/1     Running   0          24m   10.1.44.196   ip-10-1-44-196.eu-west-1.compute.internal
kube-proxy-ip-10-1-44-236.eu-west-1.compute.internal                1/1     Running   0          6m    10.1.44.236   ip-10-1-44-236.eu-west-1.compute.internal
kube-proxy-ip-10-1-45-239.eu-west-1.compute.internal                1/1     Running   0          2h    10.1.45.239   ip-10-1-45-239.eu-west-1.compute.internal
kube-rescheduler-3155147949-0rm2p                                   1/1     Running   0          1h    10.1.43.42    ip-10-1-43-42.eu-west-1.compute.internal
kube-scheduler-ip-10-1-43-150.eu-west-1.compute.internal            1/1     Running   0          12h   10.1.43.150   ip-10-1-43-150.eu-west-1.compute.internal
kube-scheduler-ip-10-1-44-191.eu-west-1.compute.internal            1/1     Running   1          12h   10.1.44.191   ip-10-1-44-191.eu-west-1.compute.internal
kubernetes-dashboard-v1.5.1-lpnbb                                   1/1     Running   0          1h    10.200.93.4   ip-10-1-43-42.eu-west-1.compute.internal