All 8 HPA tests failed:
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=\[HPA\]"
[..]
Summarizing 8 Failures:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:448
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:454
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 1 pod to 2 pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:461
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
Ran 8 of 877 Specs in 5200.752 seconds
FAIL! -- 0 Passed | 8 Failed | 0 Pending | 869 Skipped --- FAIL: TestE2E (5200.77s)
FAIL
Ginkgo ran 1 suite in 1h26m40.988437641s
Test Suite Failed
On a target cluster with 2 compute nodes (previously 1 compute node), the following "Insufficient cpu" error no longer appears, but a different error still occurs and the HPA tests still do not pass.
Unschedulable 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.
Apr 24 23:11:25.268: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.267: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.270: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.270: INFO: Unexpected error occurred: timed out waiting for the condition
[..]
Logging pods the kubelet thinks is on node k8s-v110-flannel-worker02
Apr 24 23:12:01.679: INFO: kube-flannel-ds-kqtts started at 2018-04-24 20:59:44 +0000 UTC (1+1 container statuses recorded)
Apr 24 23:12:01.679: INFO: Init container install-cni ready: true, restart count 0
Apr 24 23:12:01.679: INFO: Container kube-flannel ready: true, restart count 1
Apr 24 23:12:01.679: INFO: kube-proxy-gsqtb started at 2018-04-24 20:59:44 +0000 UTC (0+1 container statuses recorded)
Apr 24 23:12:01.679: INFO: Container kube-proxy ready: true, restart count 0
Apr 24 23:12:01.746: INFO:
Latency metrics for node k8s-v110-flannel-worker02
STEP: Dumping a list of prepulled images on each node...
Apr 24 23:12:01.750: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-horizontal-pod-autoscaling-hxfzj" for this suite.
Apr 24 23:12:07.781: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Apr 24 23:12:07.957: INFO: namespace: e2e-tests-horizontal-pod-autoscaling-hxfzj, resource: bindings, ignored listing per whitelist
Apr 24 23:12:07.991: INFO: namespace e2e-tests-horizontal-pod-autoscaling-hxfzj deletion completed in 6.238444611s
• Failure [947.898 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
[sig-autoscaling] ReplicationController light
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
Should scale from 1 pod to 2 pods [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
timeout waiting 15m0s for 2 replicas
Expected error:
<*errors.errorString | 0xc4200c35a0>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSApr 24 23:12:07.991: INFO: Running AfterSuite actions on all node
Apr 24 23:12:07.992: INFO: Running AfterSuite actions on node 1
The corresponding test code is:
70 It("Should scale from 1 pod to 2 pods", func() {
71 scaleTest := &HPAScaleTest{
72 initPods: 1,
73 totalInitialCPUUsage: 150,
74 perPodCPURequest: 200,
75 targetCPUUtilizationPercent: 50,
76 minPods: 1,
77 maxPods: 2,
78 firstScale: 2,
79 }
80 scaleTest.run("rc-light", common.KindRC, rc, f)
81 })
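As a sanity check of these parameters against the HPA algorithm described in the documentation (desiredReplicas = ceil[currentReplicas x currentMetricValue / desiredMetricValue]): 1 initial pod consuming 150 millicores against a 200-millicore request gives 150/200 = 75% utilization; with a 50% target the desired count is ceil(1 x 75/50) = 2, which matches firstScale: 2 and stays within maxPods: 2.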
The common part of the HPA tests:
112 // run is a method which runs an HPA lifecycle, from a starting state, to an expected
113 // The initial state is defined by the initPods parameter.
114 // The first state change is due to the CPU being consumed initially, which HPA responds to by changing pod counts.
115 // The second state change (optional) is due to the CPU burst parameter, which HPA again responds to.
116 // TODO The use of 3 states is arbitrary, we could eventually make this test handle "n" states once this test stabilizes.
117 func (scaleTest *HPAScaleTest) run(name string, kind schema.GroupVersionKind, rc *common.ResourceConsumer, f *framework.Framework) {
118 const timeToWait = 15 * time.Minute
119 rc = common.NewDynamicResourceConsumer(name, f.Namespace.Name, kind, int(scaleTest.initPods), int(scaleTest.totalInitialCPUUsage), 0, 0, scaleTest.perPodCPURequest, 200, f.ClientSet, f.InternalClientset, f.ScalesGetter)
120 defer rc.CleanUp()
121 hpa := common.CreateCPUHorizontalPodAutoscaler(rc, scaleTest.targetCPUUtilizationPercent, scaleTest.minPods, scaleTest.maxPods)
122 defer common.DeleteHorizontalPodAutoscaler(rc, hpa.Name)
123 rc.WaitForReplicas(int(scaleTest.firstScale), timeToWait)
This waits for the replica count to reach the first expected value (firstScale) once autoscaling kicks in. The timeout failure occurs here because the observed count never matches the expected one; a sketch of this wait/poll pattern follows the listing below.
124 if scaleTest.firstScaleStasis > 0 {
125 rc.EnsureDesiredReplicas(int(scaleTest.firstScale), scaleTest.firstScaleStasis)
126 }
127 if scaleTest.cpuBurst > 0 && scaleTest.secondScale > 0 {
128 rc.ConsumeCPU(scaleTest.cpuBurst)
This generates the CPU burst (cpuBurst).
129 rc.WaitForReplicas(int(scaleTest.secondScale), timeToWait)
This waits for the second autoscaling event to occur.
130 }
131 }
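The failure always surfaces at line 123, rc.WaitForReplicas. A rough sketch of the polling pattern that helper presumably follows (a hypothetical stand-in in Go, not the actual test utility; getReplicas here represents whatever call returns the scaled object's current replica count):

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForReplicas polls the current replica count at a fixed interval until it
// matches the expected value or the timeout expires, mirroring the
// "waiting for N replicas (current: M)" lines in the e2e logs above.
func waitForReplicas(expected int, timeout time.Duration, getReplicas func() (int, error)) error {
	return wait.Poll(20*time.Second, timeout, func() (bool, error) {
		current, err := getReplicas()
		if err != nil {
			return false, err
		}
		fmt.Printf("waiting for %d replicas (current: %d)\n", expected, current)
		return current == expected, nil
	})
}

func main() {
	// Fake replica source stuck at 1, so this times out after a minute:
	// the same failure mode as the 15-minute timeout in the test run.
	err := waitForReplicas(2, time.Minute, func() (int, error) { return 1, nil })
	fmt.Println("result:", err)
}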
Required reading: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
$ kubectl autoscale rc foo --min=2 --max=5 --cpu-percent=80
This seems to correspond to the following test code:
121 hpa := common.CreateCPUHorizontalPodAutoscaler(rc, scaleTest.targetCPUUtilizationPercent, scaleTest.minPods, scaleTest.maxPods)
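For reference, both the kubectl command above and this helper end up creating an autoscaling/v1 HorizontalPodAutoscaler object. A minimal sketch using the k8s.io/api/autoscaling/v1 types, with values taken from the "1 pod to 2 pods" test parameters rather than from the helper's actual code:

package main

import (
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	minReplicas := int32(1)
	targetCPU := int32(50)
	// Equivalent in spirit to: kubectl autoscale rc rc-light --min=1 --max=2 --cpu-percent=50
	hpa := autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "rc-light"},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "v1",
				Kind:       "ReplicationController",
				Name:       "rc-light",
			},
			MinReplicas:                    &minReplicas,
			MaxReplicas:                    2,
			TargetCPUUtilizationPercentage: &targetCPU,
		},
	}
	fmt.Printf("%+v\n", hpa.Spec)
}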
We can run a single test by specifying a regex, for example:
$ go run hack/e2e.go --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods --ginkgo.dryRun=true"
Result of running against k8s v1.11.1: failed with a timeout.
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods"
2018/08/01 18:00:15 e2e.go:79: Calling kubetest --verbose-commands=true --provider=skeleton --test --test_args=--ginkgo.focus=1\spod\sto\s2\spods...
2018/08/01 18:00:15 process.go:153: Running: ./hack/e2e-internal/e2e-status.sh
Skeleton Provider: prepare-e2e not implemented
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-08-01T02:07:33Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
2018/08/01 18:00:15 process.go:155: Step './hack/e2e-internal/e2e-status.sh' finished in 230.118832ms
2018/08/01 18:00:15 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
2018/08/01 18:00:15 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 188.470631ms
2018/08/01 18:00:15 process.go:153: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=1\spod\sto\s2\spods
Conformance test: not doing test setup.
Aug 1 18:00:16.247: INFO: Overriding default scale value of zero to 1
Aug 1 18:00:16.247: INFO: Overriding default milliseconds value of zero to 5000
I0801 18:00:16.332682 31092 e2e.go:333] Starting e2e run "bec5c789-95b4-11e8-b4cc-fa163e738a69" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1533146416 - Will randomize all specs
Will run 1 of 999 specs
Aug 1 18:00:16.356: INFO: >>> kubeConfig: /home/ubuntu/admin.conf
Aug 1 18:00:16.358: INFO: Waiting up to 30m0s for all (but 0) nodes to be schedulable
Aug 1 18:00:16.397: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Aug 1 18:00:16.451: INFO: 10 / 10 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Aug 1 18:00:16.451: INFO: expected 2 pod replicas in namespace 'kube-system', 2 are Running and Ready.
Aug 1 18:00:16.466: INFO: Waiting for pods to enter Success, but no pods in "kube-system" match label map[name:e2e-image-puller]
Aug 1 18:00:16.466: INFO: Dumping network health container logs from all nodes...
Aug 1 18:00:16.479: INFO: e2e test version: v1.11.1
Aug 1 18:00:16.484: INFO: kube-apiserver version: v1.11.1
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light
Should scale from 1 pod to 2 pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
[BeforeEach] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Aug 1 18:00:16.485: INFO: >>> kubeConfig: /home/ubuntu/admin.conf
STEP: Building a namespace api object
Aug 1 18:00:16.622: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[It] Should scale from 1 pod to 2 pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
STEP: Running consuming RC rc-light via /v1, Kind=ReplicationController with 1 replicas
STEP: creating replication controller rc-light in namespace e2e-tests-horizontal-pod-autoscaling-dklx7
I0801 18:00:16.645232 31092 runners.go:177] Created replication controller with name: rc-light, namespace: e2e-tests-horizontal-pod-autoscaling-dklx7, replica count: 1
I0801 18:00:26.695869 31092 runners.go:177] rc-light Pods: 1 out of 1 created, 1 running, 0 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
STEP: Running controller
STEP: creating replication controller rc-light-ctrl in namespace e2e-tests-horizontal-pod-autoscaling-dklx7
I0801 18:00:26.767988 31092 runners.go:177] Created replication controller with name: rc-light-ctrl, namespace: e2e-tests-horizontal-pod-autoscaling-dklx7, replica count: 1
I0801 18:00:36.819542 31092 runners.go:177] rc-light-ctrl Pods: 1 out of 1 created, 1 running, 0 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
Aug 1 18:00:41.820: INFO: Waiting for amount of service:rc-light-ctrl endpoints to be 1
Aug 1 18:00:41.833: INFO: RC rc-light: consume 150 millicores in total
Aug 1 18:00:41.833: INFO: RC rc-light: setting consumption to 150 millicores in total
Aug 1 18:00:41.833: INFO: RC rc-light: consume 0 MB in total
Aug 1 18:00:41.833: INFO: RC rc-light: consume custom metric 0 in total
Aug 1 18:00:41.833: INFO: RC rc-light: setting bump of metric QPS to 0 in total
Aug 1 18:00:41.833: INFO: RC rc-light: sending request to consume 150 millicores
Aug 1 18:00:41.834: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug 1 18:00:41.834: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug 1 18:00:41.834: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 1 18:00:41.834: INFO: RC rc-light: setting consumption to 0 MB in total
Aug 1 18:00:41.835: INFO: RC rc-light: sending request to consume 0 MB
Aug 1 18:00:41.835: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 1 18:00:41.865: INFO: waiting for 2 replicas (current: 1)
Aug 1 18:01:01.875: INFO: waiting for 2 replicas (current: 1)
Aug 1 18:01:11.871: INFO: RC rc-light: sending request to consume 0 MB
Aug 1 18:01:11.871: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 1 18:01:11.871: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug 1 18:01:11.871: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 1 18:01:12.042: INFO: RC rc-light: sending request to consume 150 millicores
Aug 1 18:01:12.042: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug 1 18:01:21.870: INFO: waiting for 2 replicas (current: 1)
Aug 1 18:01:41.876: INFO: waiting for 2 replicas (current: 1)
Aug 1 18:01:41.894: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug 1 18:01:41.894: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 1 18:01:41.895: INFO: RC rc-light: sending request to consume 0 MB
Aug 1 18:01:41.895: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 1 18:01:42.241: INFO: RC rc-light: sending request to consume 150 millicores
Aug 1 18:01:42.241: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug 1 18:02:01.871: INFO: waiting for 2 replicas (current: 1)
Aug 1 18:02:11.923: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug 1 18:02:11.923: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 1 18:02:11.923: INFO: RC rc-light: sending request to consume 0 MB
Aug 1 18:02:11.923: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 1 18:02:12.442: INFO: RC rc-light: sending request to consume 150 millicores
Aug 1 18:02:12.442: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug 1 18:02:21.878: INFO: waiting for 2 replicas (current: 1)
...
First, look into how to enable HPA. → This can be checked with kubectl get hpa; the test environment appears to have it enabled.
Official docs (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/): "The HPA is implemented as a control loop, whose sync period is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag." The controller manager here means kube-controller-manager; Heapster also checks metrics periodically, but that is a separate mechanism.
"The HPA is an API resource, defined under the autoscaling/v1 API.
HPAs can be managed with kubectl: create one with kubectl create,
list them with kubectl get hpa, and see details with kubectl describe hpa.
They can also be deleted with kubectl delete hpa."
Trying this on the environment where the problem occurs returned "No resources found", but the API call itself succeeded with a 200, so the HPA API itself appears to exist:
$ kubectl get hpa -v=8
I0801 20:35:26.875828 28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.877550 28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.879167 28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.885239 28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.886342 28459 round_trippers.go:383] GET https://192.168.1.108:6443/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers?limit=500
I0801 20:35:26.886372 28459 round_trippers.go:390] Request Headers:
I0801 20:35:26.886379 28459 round_trippers.go:393] Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json
I0801 20:35:26.886385 28459 round_trippers.go:393] User-Agent: kubectl/v1.11.1 (linux/amd64) kubernetes/b1b2997
I0801 20:35:26.899223 28459 round_trippers.go:408] Response Status: 200 OK in 12 milliseconds
I0801 20:35:26.899245 28459 round_trippers.go:411] Response Headers:
I0801 20:35:26.899256 28459 round_trippers.go:414] Content-Type: application/json
I0801 20:35:26.899261 28459 round_trippers.go:414] Content-Length: 2497
I0801 20:35:26.899265 28459 round_trippers.go:414] Date: Wed, 01 Aug 2018 20:35:26 GMT
I0801 20:35:26.899464 28459 request.go:897] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1beta1","metadata":{"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers","resourceVersion":"1533485"},"columnDefinitions":[{"name":"Name","type":"string","format":"name","description":"Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/identifiers#names","priority":0},{"name":"Reference","type":"string","format":"","description":"scaleTargetRef points to the target resource to scale, and is used to the pods for which metrics should be collected, as well as to actually change the replica count.","priority":0},{"name":"Targets","type":"string","format":"","description":"metrics contains the specifications for which to use to calculate the desired replica count (the ma [truncated 1473 chars]
No resources found.
Fixed a documentation gap found during this work: https://github.com/kubernetes/website/pull/9711
Now let's actually try HPA.
$ kubectl run nginx --image=nginx
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-64f497f8fd-l2wnb 1/1 Running 0 21h
$
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
nginx-64f497f8fd 1 1 1 21h
$ kubectl autoscale rs nginx-64f497f8fd --min=1 --max=3 --cpu-percent=80
horizontalpodautoscaler.autoscaling/nginx-64f497f8fd autoscaled
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-64f497f8fd ReplicaSet/nginx-64f497f8fd <unknown>/80% 1 3 1 4m
$ kubectl expose deployment nginx --port 80
service/nginx exposed
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 10.101.55.26 <none> 80/TCP 9s
$ curl http://10.101.55.26:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
$ sudo apt-get install httperf
$ httperf --hog --server 10.101.55.26 --port 80 --rate 300 --num-conn 30000 --num-call 5 --timeout 5
In another console → the load on nginx barely increases at all...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15607 ubuntu 20 0 16980 3040 2552 R 98.0 0.1 0:18.15 httperf
10781 systemd+ 20 0 33076 2968 1712 S 16.0 0.1 0:02.96 nginx
$ kubectl delete hpa nginx-64f497f8fd
$ kubectl autoscale deployment nginx --min=1 --max=3 --cpu-percent=50
$ httperf --hog --server 10.101.55.26 --port 80 --rate 3000 --num-conn 120000 --num-call 5 --timeout 5
In another console:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18466 ubuntu 20 0 18432 4324 2524 R 76.1 0.1 0:32.64 httperf
10781 systemd+ 20 0 33076 3292 1904 R 63.1 0.1 1:12.29 nginx
CPU utilization stayed above 60% for more than 120 seconds, but no scaling happened.
Are pod CPU utilization metrics not being collected?
$ kubectl get hpa -v=16
...
I0801 22:25:54.290584 379 request.go:897] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1beta1",
"metadata":{"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers","resourceVersion":"1542106"},
"columnDefinitions":[
...
"rows":[
{"cells":[
"nginx",
"Deployment/nginx",
"\u003cunknown\u003e/50%",
"1",
3,
1,
"46m"],
"object":{"kind":"PartialObjectMetadata","apiVersion":"meta.k8s.io/v1beta1",
"metadata":{
"name":"nginx",
"namespace":"default",
"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx",
"uid":"6c2bae03-95d3-11e8-ae67-fa163e420595",
"resourceVersion":"1538551",
"creationTimestamp":"2018-08-01T21:39:52Z"}}}]}
I0801 22:25:54.291141 379 get.go:443] no kind is registered for the type v1beta1.Table
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx <unknown>/50% 1 3 1 46m
"Please note that if some of the pod's containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric." This appears to be the cause of the <unknown> value; an appropriate resource request needs to be set when the deployment is created.
The kubectl help has a description of the relevant flag:
--requests='': The resource requirement requests for this container. For example, 'cpu=100m,memory=256Mi'. Note that server side components may assign requests depending on the server configuration, such as limit ranges.
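The declarative equivalent of that flag is a resources.requests block on the container in the pod template; without a CPU request the HPA's utilization percentage has no denominator. A sketch using the k8s.io/api/core/v1 types (an illustrative example, not the deployment actually created here):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Equivalent of `--requests='cpu=100m,memory=256Mi'` on the pod template's container.
	container := corev1.Container{
		Name:  "nginx",
		Image: "nginx",
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("100m"),
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
		},
	}
	cpu := container.Resources.Requests[corev1.ResourceCPU]
	mem := container.Resources.Requests[corev1.ResourceMemory]
	fmt.Printf("container %s requests: cpu=%s memory=%s\n", container.Name, cpu.String(), mem.String())
}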
Try again with the --requests flag.
First, clean up:
$ kubectl delete hpa nginx
$ kubectl delete service nginx
$ kubectl delete deployment nginx
Apply each setting:
$ kubectl run nginx --image=nginx --requests='cpu=100m,memory=256Mi'
$ kubectl expose deployment nginx --port 80
$ kubectl autoscale deployment nginx --min=1 --max=3 --cpu-percent=50
Still <unknown>, as before:
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx <unknown>/50% 1 3 0 7s
The resource requests are set correctly as far as the deployment is concerned:
$ kubectl describe deployment nginx
...
Pod Template:
Labels: run=nginx
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Requests:
cpu: 100m
memory: 256Mi
Related log entries appeared in /var/log/containers/kube-controller-manager-k8s-master_kube-system_kube-controller-manager-ecad4d8d03053b6ce0626f9ca60fec563f8b11a34e2b7446c5b40343fedf6a70.log:
{"log":"I0801 23:49:34.858775 1 event.go:221] Event(v1.ObjectReference{
Kind:\"HorizontalPodAutoscaler\",
Namespace:\"default\",
Name:\"nginx\",
UID:\"cba19f8b-95e3-11e8-ae67-fa163e420595\",
APIVersion:\"autoscaling/v2beta1\",
ResourceVersion:\"1547695\", FieldPath:\"\"}):
type: 'Warning'
reason: 'FailedComputeMetricsReplicas' failed to get cpu utilization: unable to get metrics for
resource cpu: unable to fetch metrics from resource metrics API: the server could not find
the requested resource (get pods.metrics.k8s.io)\n",
"stream":"stderr",
"time":"2018-08-01T23:52:35.04907021Z"}
There were reports that a system reboot fixes this, so I rebooted → no good; the log above keeps appearing and <unknown> is still shown.
Next, investigate the mechanism that collects pod CPU utilization metrics.
There is kubectl top, but it fails, presumably because heapster is not running:
$ kubectl top pod -v=8
..
I0802 01:02:00.135619 23579 round_trippers.go:383] GET https://192.168.1.108:6443/api/v1/namespaces/kube-system/services/http:heapster:/proxy/apis/metrics/v1alpha1/nodes?labelSelector=
..
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
The API call above returns 404. The error location identified from the log message: controller/podautoscaler/replica_calculator.go
54 // GetResourceReplicas calculates the desired replica count based on a target resource utilization percentage
55 // of the given resource for pods matching the given selector in the given namespace, and the current replica count
56 func (c *ReplicaCalculator) GetResourceReplicas(currentReplicas int32, targetUtilization int32, resource v1.ResourceName, namespace string, selector labels.Selector) (replicaCount int32, utilization int32, rawUtilization int64, timestamp time.Time, err error) {
57 metrics, timestamp, err := c.metricsClient.GetResourceMetric(resource, namespace, selector)
58 if err != nil {
59 return 0, 0, 0, time.Time{}, fmt.Errorf("unable to get metrics for resource %s: %v", resource, err)
60 }
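If the metrics fetch succeeded, GetResourceReplicas would then apply the scaling rule described in the HPA documentation. A minimal standalone sketch of that rule (the real calculator also applies a tolerance band and other corrections, so this is not the actual implementation):

package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the documented HPA rule:
// desired = ceil(currentReplicas * currentUtilization / targetUtilization).
func desiredReplicas(currentReplicas, currentUtilization, targetUtilization int32) int32 {
	usageRatio := float64(currentUtilization) / float64(targetUtilization)
	return int32(math.Ceil(usageRatio * float64(currentReplicas)))
}

func main() {
	// Scale-up case from the "1 pod to 2 pods" test: 75% observed vs. 50% target.
	fmt.Println(desiredReplicas(1, 75, 50)) // 2
	// Scale-down case roughly matching the "2 pods to 1 pod" test: low observed utilization.
	fmt.Println(desiredReplicas(2, 12, 50)) // 1
}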
In k/k issue #57996 there is this comment:
The error (get pods.metrics.k8s.io) is due to metric server, as you are not running metric server. In Kubernetes 1.9+, HPA needs metric server to fetch the CPU usage.
Judging from https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/#metrics-server, a separate Metrics Server deployment seems to be required... If that is the case, isn't it odd that metrics-server lives in kubernetes-incubator? HPA is a stable API, yet one of its building blocks is still in the incubator...
For now, deploy metrics-server as instructed:
$ git clone https://github.com/kubernetes-incubator/metrics-server
$ cd metrics-server/
$ kubectl create -f deploy/1.8+/
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
No change. Tried rebooting → still no good. Redid everything from the nginx deployment onward → same situation.
It turned out metrics-server itself was reporting errors, so investigate the cause:
$ kubectl log metrics-server-86bd9d7667-ltlpz -n kube-system
...
I0802 01:38:30.801336 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-5lqpt
I0802 01:38:30.801429 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-5lqpt
E0802 01:39:05.015074 1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
E0802 01:39:05.015334 1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:40:05.009938 1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:40:05.017074 1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
I0802 01:40:17.567445 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:40:17.567539 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
I0802 01:40:47.621436 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:40:47.621522 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
E0802 01:41:05.004503 1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:41:05.021225 1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
I0802 01:41:17.655429 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:41:17.655552 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
The kubelet configuration apparently needs to be changed (kubespray/issues/2550):
This is due to the fact that the kublet read only port was disabled by default.
You can enable the read-only port in all.yml with: kube_read_only_port: 10255
The setting is defined in cmd/kubelet/app/options/options.go:
452 fs.Int32Var(&c.ReadOnlyPort, "read-only-port", c.ReadOnlyPort, "The read-only port for the Kubelet to serve on with no authentication/authorization (set to 0 to disable)")
Investigate how to pass the read-only-port option to the kubelet.
Startup options of the existing kubelet process:
root 1002 1 8 01:34 ? 01:19:57 /usr/bin/kubelet
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
--kubeconfig=/etc/kubernetes/kubelet.conf
--config=/var/lib/kubelet/config.yaml
--cgroup-driver=cgroupfs
--cni-bin-dir=/opt/cni/bin
--cni-conf-dir=/etc/cni/net.d
--network-plugin=cni
Change config.yaml as follows and reboot the system:
--- /var/lib/kubelet/config.yaml.orig 2018-08-02 16:57:23.865340698 +0000
+++ /var/lib/kubelet/config.yaml 2018-08-02 17:14:29.502421376 +0000
@@ -57,6 +57,7 @@
oomScoreAdj: -999
podPidsLimit: -1
port: 10250
+readOnlyPort: 10255
registryBurst: 10
registryPullQPS: 5
resolvConf: /etc/resolv.conf
After the reboot, check the listening ports → 10255 is now listening:
$ sudo netstat -anp | grep 102 | grep kubelet
tcp 0 0 127.0.0.1:35881 0.0.0.0:* LISTEN 1020/kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 1020/kubelet
tcp6 0 0 :::10250 :::* LISTEN 1020/kubelet
tcp6 0 0 :::10255 :::* LISTEN 1020/kubelet
The connection failures no longer appear in the metrics-server log, but it still reports that the metrics do not exist:
I0802 17:19:17.085093 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 17:19:17.085209 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
I0802 17:19:47.202876 1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 17:19:47.202939 1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
Is this because node01 has not been configured yet?
$ kubectl get pod -o=wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-5b669f8784-9m2tr 1/1 Running 0 15h 10.244.1.6 k8s-node01
After applying the same setting on node01 as well, metrics-server stopped reporting errors and the <unknown> value was resolved:
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 0%/50% 1 3 1 15h
Re-run the e2e test that was failing → it now passes:
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods"
..
• [SLOW TEST:133.923 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
[sig-autoscaling] ReplicationController light
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
Should scale from 1 pod to 2 pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug 2 17:33:13.282: INFO: Running AfterSuite actions on all node
Aug 2 17:33:13.282: INFO: Running AfterSuite actions on node 1
Ran 1 of 999 Specs in 134.000 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 998 Skipped PASS
Ginkgo ran 1 suite in 2m14.312258357s
Test Suite Passed
2018/08/02 17:33:13 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=1\spod\sto\s2\spods' finished in 2m14.533305342s
2018/08/02 17:33:13 e2e.go:83: Done
kubectl top works now as well:
$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
nginx-5b669f8784-9m2tr 0m 3Mi
Improvement points in Kubernetes itself related to this issue.
No good: still only 1 of the 8 tests passes.
• Failure [960.002 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
[sig-autoscaling] ReplicationController light
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
Should scale from 2 pods to 1 pod [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:82
timeout waiting 15m0s for 1 replicas
Expected error:
<*errors.errorString | 0xc420085540>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug 2 20:34:37.432: INFO: Running AfterSuite actions on all node
Aug 2 20:34:37.432: INFO: Running AfterSuite actions on node 1
Summarizing 7 Failures:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
Ran 8 of 999 Specs in 6821.750 seconds
FAIL! -- 1 Passed | 7 Failed | 0 Pending | 991 Skipped --- FAIL: TestE2E (6821.77s)
FAIL
Ginkgo ran 1 suite in 1h53m42.043036632s
Test Suite Failed
!!! Error in ./hack/ginkgo-e2e.sh:143
Error in ./hack/ginkgo-e2e.sh:143. '"${ginkgo}" "${ginkgo_args[@]:+${ginkgo_args[@]}}" "${e2e_test}" -- "${auth_config[@]:+${auth_config[@]}}" --ginkgo.flakeAttempts="${FLAKE_ATTEMPTS}" --host="${KUBE_MASTER_URL}" --provider="${KUBERNETES_PROVIDER}" --gce-project="${PROJECT:-}" --gce-zone="${ZONE:-}" --gce-region="${REGION:-}" --gce-multizone="${MULTIZONE:-false}" --gke-cluster="${CLUSTER_NAME:-}" --kube-master="${KUBE_MASTER:-}" --cluster-tag="${CLUSTER_ID:-}" --cloud-config-file="${CLOUD_CONFIG:-}" --repo-root="${KUBE_ROOT}" --node-instance-group="${NODE_INSTANCE_GROUP:-}" --prefix="${KUBE_GCE_INSTANCE_PREFIX:-e2e}" --network="${KUBE_GCE_NETWORK:-${KUBE_GKE_NETWORK:-e2e}}" --node-tag="${NODE_TAG:-}" --master-tag="${MASTER_TAG:-}" --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} ${NUM_NODES:+"--num-nodes=${NUM_NODES}"} ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} "${@:-}"' exited with status 1
Call stack:
1: ./hack/ginkgo-e2e.sh:143 main(...)
Exiting with status 1
2018/08/02 20:34:37 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]' finished in 1h53m42.253120653s
2018/08/02 20:34:37 main.go:309: Something went wrong: encountered 1 errors: [error during ./hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]: exit status 1]
2018/08/02 20:34:37 e2e.go:81: err: exit status 1
exit status 1
Looking at the failed tests, are they the ones that involve scale-down? Try running the simplest one, "Should scale from 2 pods to 1 pod", on its own:
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=2\spods\sto\s1\spod"
...
Aug 2 20:54:43.663: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 ht-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug 2 20:54:43.665: INFO: RC rc-light: sending request to consume 0 MB
Aug 2 20:54:43.665: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 2 20:54:50.470: INFO: RC rc-light: sending request to consume 50 millicores
Aug 2 20:54:50.470: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug 2 20:54:53.038: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:55:13.041: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:55:13.692: INFO: RC rc-light: sending request to consume 0 of custom m
Aug 2 20:55:13.692: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 ht-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug 2 20:55:13.696: INFO: RC rc-light: sending request to consume 0 MB
Aug 2 20:55:13.696: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 2 20:55:20.546: INFO: RC rc-light: sending request to consume 50 millicores
Aug 2 20:55:20.546: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug 2 20:55:33.043: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:55:43.700: INFO: RC rc-light: sending request to consume 0 of custom m
Aug 2 20:55:43.700: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 ht-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug 2 20:55:43.703: INFO: RC rc-light: sending request to consume 0 MB
Aug 2 20:55:43.703: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 2 20:55:50.630: INFO: RC rc-light: sending request to consume 50 millic
Aug 2 20:55:50.630: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug 2 20:55:53.044: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:56:13.042: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:56:13.053: INFO: waiting for 1 replicas (current: 2)
Aug 2 20:56:13.053: INFO: Unexpected error occurred: timed out waiting for the condition
STEP: Removing consuming RC rc-light
Aug 2 20:56:13.076: INFO: RC rc-light: stopping CPU consumer
Aug 2 20:56:13.076: INFO: RC rc-light: stopping metric consumer
Aug 2 20:56:13.076: INFO: RC rc-light: stopping mem consumer
STEP: deleting { ReplicationController} rc-light in namespace e2e-tests-horizontal-pod-autoscaling-gk25p, will wait for the garbage collector to delete the pods
Aug 2 20:56:23.165: INFO: Deleting { ReplicationController} rc-light took: 27.919684ms
Aug 2 20:56:23.266: INFO: Terminating { ReplicationController} rc-light pods took: 100.554288ms
STEP: deleting { ReplicationController} rc-light-ctrl in namespace e2e-tests-horizontal-pod-autoscaling-gk25p, will wait for the garbage collector to delete the pods
Aug 2 20:56:31.089: INFO: Deleting { ReplicationController} rc-light-ctrl took: 7.49459ms
Aug 2 20:56:31.190: INFO: Terminating { ReplicationController} rc-light-ctrl pods took: 100.487127ms
[AfterEach] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:142
STEP: Collecting events from namespace "e2e-tests-horizontal-pod-autoscaling-gk25p".
STEP: Found 20 events.
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light: {replication-controller } SuccessfulCreate: Created pod: rc-light-s8q7k
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light: {replication-controller } SuccessfulCreate: Created pod: rc-light-8r7zs
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light-8r7zs: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-8r7zs to k8s-node01
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light-s8q7k: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-s8q7k to k8s-master
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:49 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3" already present on machine
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:49 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Created: Created container
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Started: Started container
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Started: Started container
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Created: Created container
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3" already present on machine
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:57 +0000 UTC - event for rc-light-ctrl: {replication-controller } SuccessfulCreate: Created pod: rc-light-ctrl-d76tm
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:40:57 +0000 UTC - event for rc-light-ctrl-d76tm: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-ctrl-d76tm to k8s-node01
Aug 2 20:56:33.178: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Started: Started container
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Created: Created container
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0" already present on machine
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:41:43 +0000 UTC - event for rc-light: {horizontal-pod-autoscaler } FailedGetResourceMetric: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:41:43 +0000 UTC - event for rc-light: {horizontal-pod-autoscaler } FailedComputeMetricsReplicas: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:56:23 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Killing: Killing container with id docker://rc-light:Need to kill Pod
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:56:23 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Killing: Killing container with id docker://rc-light:Need to kill Pod
Aug 2 20:56:33.179: INFO: At 2018-08-02 20:56:31 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Killing: Killing container with id docker://rc-light-ctrl:Need to kill Pod
Aug 2 20:56:33.191: INFO: POD NODE PHASE GRACE CONDITIONS
Aug 2 20:56:33.191: INFO: nginx-5b669f8784-9m2tr k8s-node01 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:39:40 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:48 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:39:40 +0000 UTC }]
Aug 2 20:56:33.191: INFO: coredns-78fcdf6894-gt7zj k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:22 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC }]
Aug 2 20:56:33.191: INFO: coredns-78fcdf6894-tk6ch k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:22 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC }]
Aug 2 20:56:33.191: INFO: etcd-k8s-master k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-apiserver-k8s-master k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-controller-manager-k8s-master k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-flannel-ds-7df6r k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:12:09 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:18 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:31 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-flannel-ds-kl4q4 k8s-node01 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:01:09 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:43 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:01:01 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-proxy-hxp7z k8s-node01 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:51 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:42 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:51 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-proxy-zwrl4 k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:37 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:17 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:37 +0000 UTC }]
Aug 2 20:56:33.191: INFO: kube-scheduler-k8s-master k8s-master Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC }]
Aug 2 20:56:33.191: INFO: metrics-server-86bd9d7667-ltlpz k8s-node01 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:28:14 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:49 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:28:14 +0000 UTC }]
Aug 2 20:56:33.191: INFO:
Aug 2 20:56:33.195: INFO:
Logging node info for node k8s-master
Aug 2 20:56:33.199: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:k8s-master,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/k8s-master,UID:94f19db7-89e3-11e8-b234-fa163e420595,ResourceVersion:1648390,Generation:0,CreationTimestamp:2018-07-17 17:05:18 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: k8s-master,node-role.kubernetes.io/master: ,},Annotations:map[string]string{flannel.alpha.coreos.com/backend-data: {"VtepMAC":"1a:9d:81:1e:9d:0f"},flannel.alpha.coreos.com/backend-type: vxlan,flannel.alpha.coreos.com/kube-subnet-manager: true,flannel.alpha.coreos.com/public-ip: 192.168.1.108,kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock,node.alpha.kubernetes.io/ttl: 0,volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NodeSpec{PodCIDR:10.244.0.0/24,DoNotUse_ExternalID:,ProviderID:,Unschedulable:false,Taints:[],ConfigSource:nil,},Status:NodeStatus{Capacity:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{41567956992 0} {<nil>} 40593708Ki BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4143394816 0} {<nil>} 4046284Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Allocatable:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{37411161231 0} {<nil>} 37411161231 DecimalSI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4038537216 0} {<nil>} 3943884Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2018-08-02 20:56:31 +0000 UTC 2018-07-31 23:04:27 +0000 UTC KubeletReady kubelet is posting ready status. 
AppArmor enabled}],Addresses:[{InternalIP 192.168.1.108} {Hostname k8s-master}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:1db2c06c39a54cd3a93a4e0a44823fd6,SystemUUID:1DB2C06C-39A5-4CD3-A93A-4E0A44823FD6,BootID:b14bfb61-a0cc-45f4-8b29-42ee08c00ac6,KernelVersion:4.4.0-130-generic,OSImage:Ubuntu 16.04.5 LTS,ContainerRuntimeVersion:docker://1.11.2,KubeletVersion:v1.11.1,KubeProxyVersion:v1.11.1,OperatingSystem:linux,Architecture:amd64,},Images:[{[golang:1.10] 793901893} {[k8s.gcr.io/etcd-amd64:3.2.18] 218904307} {[k8s.gcr.io/kube-apiserver-amd64:v1.11.1] 186675825} {[k8s.gcr.io/kube-apiserver-amd64:v1.11.0] 186617744} {[k8s.gcr.io/kube-controller-manager-amd64:v1.11.1] 155252555} {[k8s.gcr.io/kube-controller-manager-amd64:v1.11.0] 155203118} {[nginx:latest] 108975101} {[k8s.gcr.io/kube-proxy-amd64:v1.11.1] 97776424} {[k8s.gcr.io/kube-proxy-amd64:v1.11.0] 97772373} {[k8s.gcr.io/kube-scheduler-amd64:v1.11.1] 56781436} {[k8s.gcr.io/kube-scheduler-amd64:v1.11.0] 56757023} {[quay.io/coreos/flannel:v0.9.1-amd64] 51338831} {[gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3] 49707607} {[k8s.gcr.io/coredns:1.1.3] 45587362} {[gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0] 5902947} {[k8s.gcr.io/pause-amd64:3.1 k8s.gcr.io/pause:3.1] 742472}],VolumesInUse:[],VolumesAttached:[],Config:nil,},}
Aug 2 20:56:33.199: INFO:
Logging kubelet events for node k8s-master
Aug 2 20:56:33.202: INFO:
Logging pods the kubelet thinks is on node k8s-master
Aug 2 20:56:33.211: INFO: kube-apiserver-k8s-master started at <nil> (0+0 container statuses recorded)
Aug 2 20:56:33.211: INFO: coredns-78fcdf6894-gt7zj started at 2018-07-17 17:12:52 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.212: INFO: Container coredns ready: true, restart count 5
Aug 2 20:56:33.212: INFO: coredns-78fcdf6894-tk6ch started at 2018-07-17 17:12:52 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.212: INFO: Container coredns ready: true, restart count 5
Aug 2 20:56:33.213: INFO: kube-flannel-ds-7df6r started at 2018-07-17 17:12:31 +0000 UTC (1+1 container statuses recorded)
Aug 2 20:56:33.213: INFO: Init container install-cni ready: true, restart count 5
Aug 2 20:56:33.213: INFO: Container kube-flannel ready: true, restart count 5
Aug 2 20:56:33.213: INFO: kube-scheduler-k8s-master started at <nil> (0+0 container statuses recorded)
Aug 2 20:56:33.214: INFO: etcd-k8s-master started at <nil> (0+0 container statuses recorded)
Aug 2 20:56:33.214: INFO: kube-proxy-zwrl4 started at 2018-07-31 23:08:37 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.214: INFO: Container kube-proxy ready: true, restart count 5
Aug 2 20:56:33.214: INFO: kube-controller-manager-k8s-master started at <nil> (0+0 container statuses recorded)
Aug 2 20:56:33.243: INFO:
Latency metrics for node k8s-master
Aug 2 20:56:33.244: INFO:
Logging node info for node k8s-node01
Aug 2 20:56:33.247: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:k8s-node01,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/k8s-node01,UID:980d8d67-9515-11e8-a804-fa163e420595,ResourceVersion:1648376,Generation:0,CreationTimestamp:2018-07-31 23:01:01 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: k8s-node01,},Annotations:map[string]string{flannel.alpha.coreos.com/backend-data: {"VtepMAC":"a6:73:46:9f:a3:63"},flannel.alpha.coreos.com/backend-type: vxlan,flannel.alpha.coreos.com/kube-subnet-manager: true,flannel.alpha.coreos.com/public-ip: 192.168.1.109,kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock,node.alpha.kubernetes.io/ttl: 0,volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NodeSpec{PodCIDR:10.244.1.0/24,DoNotUse_ExternalID:,ProviderID:,Unschedulable:false,Taints:[],ConfigSource:nil,},Status:NodeStatus{Capacity:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{41567956992 0} {<nil>} 40593708Ki BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4143394816 0} {<nil>} 4046284Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Allocatable:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{37411161231 0} {<nil>} 37411161231 DecimalSI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4038537216 0} {<nil>} 3943884Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:11 +0000 UTC KubeletReady kubelet is posting ready status. AppArmor enabled}],Addresses:[{InternalIP 192.168.1.109} {Hostname k8s-node01}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:817a385b9de241668e47cd87cda24f47,SystemUUID:817A385B-9DE2-4166-8E47-CD87CDA24F47,BootID:f47e3b62-dd98-42c4-91e2-9b657f85dfd3,KernelVersion:4.4.0-130-generic,OSImage:Ubuntu 16.04.4 LTS,ContainerRuntimeVersion:docker://1.11.2,KubeletVersion:v1.11.1,KubeProxyVersion:v1.11.1,OperatingSystem:linux,Architecture:amd64,},Images:[{[nginx:latest] 108975101} {[k8s.gcr.io/kube-proxy-amd64:v1.11.1] 97776424} {[k8s.gcr.io/kube-proxy-amd64:v1.11.0] 97772373} {[quay.io/coreos/flannel:v0.9.1-amd64] 51338831} {[gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3] 49707607} {[gcr.io/google_containers/metrics-server-amd64:v0.2.1] 42541759} {[gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0] 5902947} {[k8s.gcr.io/pause:3.1] 742472}],VolumesInUse:[],VolumesAttached:[],Config:nil,},}
Aug 2 20:56:33.247: INFO:
Logging kubelet events for node k8s-node01
Aug 2 20:56:33.250: INFO:
Logging pods the kubelet thinks is on node k8s-node01
Aug 2 20:56:33.266: INFO: nginx-5b669f8784-9m2tr started at 2018-08-02 01:39:40 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.266: INFO: Container nginx ready: true, restart count 1
Aug 2 20:56:33.267: INFO: kube-flannel-ds-kl4q4 started at 2018-07-31 23:01:01 +0000 UTC (1+1 container statuses recorded)
Aug 2 20:56:33.267: INFO: Init container install-cni ready: true, restart count 1
Aug 2 20:56:33.267: INFO: Container kube-flannel ready: true, restart count 2
Aug 2 20:56:33.267: INFO: kube-proxy-hxp7z started at 2018-07-31 23:08:51 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.268: INFO: Container kube-proxy ready: true, restart count 1
Aug 2 20:56:33.268: INFO: metrics-server-86bd9d7667-ltlpz started at 2018-08-02 01:28:14 +0000 UTC (0+1 container statuses recorded)
Aug 2 20:56:33.268: INFO: Container metrics-server ready: true, restart count 1
Aug 2 20:56:33.345: INFO:
Latency metrics for node k8s-node01
STEP: Dumping a list of prepulled images on each node...
Aug 2 20:56:33.363: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-horizontal-pod-autoscaling-gk25p" for this suite.
Aug 2 20:56:39.438: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug 2 20:56:39.533: INFO: namespace: e2e-tests-horizontal-pod-autoscaling-gk25p, resource: bindings, ignored listing per whitelist
Aug 2 20:56:39.577: INFO: namespace e2e-tests-horizontal-pod-autoscaling-gk25p deletion completed in 6.201374321s
• Failure [951.844 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
[sig-autoscaling] ReplicationController light
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
Should scale from 2 pods to 1 pod [It]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:82
timeout waiting 15m0s for 1 replicas
Expected error:
<*errors.errorString | 0xc4201050d0>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug 2 20:56:39.588: INFO: Running AfterSuite actions on all node
Aug 2 20:56:39.588: INFO: Running AfterSuite actions on node 1
Summarizing 1 Failure:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
Ran 1 of 999 Specs in 951.939 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 998 Skipped --- FAIL: TestE2E (951.96s)
FAIL
Ginkgo ran 1 suite in 15m52.197860762s
Test Suite Failed
!!! Error in ./hack/ginkgo-e2e.sh:143
Error in ./hack/ginkgo-e2e.sh:143. '"${ginkgo}" "${ginkgo_args[@]:+${ginkgo_args[@]}}" "${e2e_test}" -- "${auth_config[@]:+${auth_config[@]}}" --ginkgo.flakeAttempts="${FLAKE_ATTEMPTS}" --host="${KUBE_MASTER_URL}" --provider="${KUBERNETES_PROVIDER}" --gce-project="${PROJECT:-}" --gce-zone="${ZONE:-}" --gce-region="${REGION:-}" --gce-multizone="${MULTIZONE:-false}" --gke-cluster="${CLUSTER_NAME:-}" --kube-master="${KUBE_MASTER:-}" --cluster-tag="${CLUSTER_ID:-}" --cloud-config-file="${CLOUD_CONFIG:-}" --repo-root="${KUBE_ROOT}" --node-instance-group="${NODE_INSTANCE_GROUP:-}" --prefix="${KUBE_GCE_INSTANCE_PREFIX:-e2e}" --network="${KUBE_GCE_NETWORK:-${KUBE_GKE_NETWORK:-e2e}}" --node-tag="${NODE_TAG:-}" --master-tag="${MASTER_TAG:-}" --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} ${NUM_NODES:+"--num-nodes=${NUM_NODES}"} ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} "${@:-}"' exited with status 1
Call stack:
1: ./hack/ginkgo-e2e.sh:143 main(...)
Exiting with status 1
2018/08/02 20:56:39 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=2\spods\sto\s1\spod' finished in 15m52.41101064s
2018/08/02 20:56:39 main.go:309: Something went wrong: encountered 1 errors: [error during ./hack/ginkgo-e2e.sh --ginkgo.focus=2\spods\sto\s1\spod: exit status 1]
2018/08/02 20:56:39 e2e.go:81: err: exit status 1
exit status 1
Checking the HPA status during the test run, the target shows as unknown:
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8 rc-light ReplicationController/rc-light <unknown>/50% 1 2 0 18s
Partway through the test, the metric became readable:
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8 rc-light ReplicationController/rc-light 52%/50% 1 2 2 2m
The CPU utilization is borderline, so it is unclear whether it will scale down:
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8 rc-light ReplicationController/rc-light 49%/50% 1 2 2 12m
HPA details:
$ kubectl describe hpa rc-light -n e2e-tests-horizontal-pod-autoscaling-6j9j8
Name: rc-light
Namespace: e2e-tests-horizontal-pod-autoscaling-6j9j8
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 02 Aug 2018 20:59:22 +0000
Reference: ReplicationController/rc-light
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 52% (105m) / 50%
Min replicas: 1
Max replicas: 2
ReplicationController pods: 2 current / 2 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale the last scale time was sufficiently old as to warrant a new scale
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 3m horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 3m horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Investigating how the e2e test applies CPU load. Summary:
 82         It("Should scale from 2 pods to 1 pod", func() { // <- this is the failing test
83 scaleTest := &HPAScaleTest{
84 initPods: 2,
85 totalInitialCPUUsage: 50,
86 perPodCPURequest: 200,
87 targetCPUUtilizationPercent: 50,
88 minPods: 1,
89 maxPods: 2,
90 firstScale: 1,
91 }
92 scaleTest.run("rc-light", common.KindRC, rc, f)
93 })
94 })
...
117 func (scaleTest *HPAScaleTest) run(name string, kind schema.GroupVersionKind, rc *common.ResourceConsumer, f *framework.Framework) {
118 const timeToWait = 15 * time.Minute
119 rc = common.NewDynamicResourceConsumer(name, f.Namespace.Name, kind, int(scaleTest.initPods), int(scaleTest.totalInitialCPUUsage), 0, 0, scaleTest.perPodCPURequest, 200, f.ClientSet, f.InternalClientset, f.ScalesGetter)
120 defer rc.CleanUp()
121 hpa := common.CreateCPUHorizontalPodAutoscaler(rc, scaleTest.targetCPUUtilizationPercent, scaleTest.minPods, scaleTest.maxPods)
122 defer common.DeleteHorizontalPodAutoscaler(rc, hpa.Name)
123 rc.WaitForReplicas(int(scaleTest.firstScale), timeToWait)
124 if scaleTest.firstScaleStasis > 0 {
125 rc.EnsureDesiredReplicas(int(scaleTest.firstScale), scaleTest.firstScaleStasis)
126 }
127 if scaleTest.cpuBurst > 0 && scaleTest.secondScale > 0 {
128 rc.ConsumeCPU(scaleTest.cpuBurst)
129 rc.WaitForReplicas(int(scaleTest.secondScale), timeToWait)
130 }
131 }
Changing CPU consumption: ConsumeCPU()
172 // ConsumeCPU consumes given number of CPU
173 func (rc *ResourceConsumer) ConsumeCPU(millicores int) {
174 framework.Logf("RC %s: consume %v millicores in total", rc.name, millicores)
175 rc.cpu <- millicores
176 }
The load RC: probably the 5th and 8th arguments are the ones related to CPU load
func NewDynamicResourceConsumer(
1 name,
2 nsName string,
3 kind schema.GroupVersionKind,
4 replicas,
5 initCPUTotal, ← scaleTest.totalInitialCPUUsage (50)
6 initMemoryTotal,
7 initCustomMetric int,
8 cpuLimit, ← scaleTest.perPodCPURequest (200)
9 memLimit int64,
10 clientset clientset.Interface, internalClientset *internalclientset.Clientset, scaleClient scaleclient.ScalesGetter) *ResourceConsumer {
return newResourceConsumer(name, nsName, kind, replicas, initCPUTotal, initMemoryTotal, initCustomMetric, dynamicConsumptionTimeInSeconds,
dynamicRequestSizeInMillicores, dynamicRequestSizeInMegabytes, dynamicRequestSizeCustomMetric, cpuLimit, memLimit, clientset, internalClientset, scaleClient, nil, nil)
}
The main body: initCPUTotal is passed to ConsumeCPU(). In other words, the initial CPU consumption is initCPUTotal, which is the test case's totalInitialCPUUsage.
132 func newResourceConsumer(name, nsName string, kind schema.GroupVersionKind, replicas, initCPUTotal, initMemoryTotal, initCustomMetric, consumptionTimeInS econds, requestSizeInMillicores,
133 requestSizeInMegabytes int, requestSizeCustomMetric int, cpuLimit, memLimit int64, clientset clientset.Interface, internalClientset *internalclie ntset.Clientset, scaleClient scaleclient.ScalesGetter, podAnnotations, serviceAnnotations map[string]string) *ResourceConsumer {
134 if podAnnotations == nil {
135 podAnnotations = make(map[string]string)
136 }
137 if serviceAnnotations == nil {
138 serviceAnnotations = make(map[string]string)
139 }
140 runServiceAndWorkloadForResourceConsumer(clientset, internalClientset, nsName, name, kind, replicas, cpuLimit, memLimit, podAnnotations, serviceA nnotations)
141 rc := &ResourceConsumer{
142 name: name,
143 controllerName: name + "-ctrl",
144 kind: kind,
145 nsName: nsName,
146 clientSet: clientset,
147 internalClientset: internalClientset,
148 scaleClient: scaleClient,
149 cpu: make(chan int),
150 mem: make(chan int),
151 customMetric: make(chan int),
152 stopCPU: make(chan int),
153 stopMem: make(chan int),
154 stopCustomMetric: make(chan int),
155 consumptionTimeInSeconds: consumptionTimeInSeconds,
156 sleepTime: time.Duration(consumptionTimeInSeconds) * time.Second,
157 requestSizeInMillicores: requestSizeInMillicores,
158 requestSizeInMegabytes: requestSizeInMegabytes,
159 requestSizeCustomMetric: requestSizeCustomMetric,
160 }
161
162 go rc.makeConsumeCPURequests()
163 rc.ConsumeCPU(initCPUTotal)
164
165 go rc.makeConsumeMemRequests()
166 rc.ConsumeMem(initMemoryTotal)
167 go rc.makeConsumeCustomMetric()
168 rc.ConsumeCustomMetric(initCustomMetric)
169 return rc
170 }
Check whether the HPA should scale down when the CPU load equals the target CPU utilization threshold.
The target number of pods is calculated from the following formula:
TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)
With the formula above, when the CPU load sits right at the target threshold and the load tool's output fluctuates (which is normal for a load tool), the HPA never scales down! Submitted as k/k pull/66950.
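To make the point concrete, here is a minimal sketch of the formula above (plain Go for illustration, not the controller's actual code), fed with per-pod utilization values like the ones kubectl showed earlier for the rc-light test (target 50%, 2 running pods):
package main

import (
	"fmt"
	"math"
)

// targetNumOfPods implements TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target).
func targetNumOfPods(perPodUtilization []float64, target float64) int {
	sum := 0.0
	for _, u := range perPodUtilization {
		sum += u
	}
	return int(math.Ceil(sum / target))
}

func main() {
	fmt.Println(targetNumOfPods([]float64{52, 52}, 50)) // 3 -> capped at maxPods=2, no scale-down
	fmt.Println(targetNumOfPods([]float64{49, 49}, 50)) // 2 -> still no scale-down
	fmt.Println(targetNumOfPods([]float64{25, 25}, 50)) // 1 -> only this far below target does it drop to 1 pod
}
So with 2 pods hovering anywhere near the 50% target, the computed count never falls to 1; the per-pod average has to drop to 25% or below.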
In the full run results at https://github.com/oomichi/try-kubernetes/issues/36, this scale-down test passed. Since it depends on the load tool's marginal behavior, it can succeed sometimes.
The latest HPA test failures are as follows:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
Try the simplest two-stage scale-out test:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
Aug 3 21:30:58.853: INFO: waiting for 3 replicas (current: 1)
Aug 3 21:31:18.859: INFO: waiting for 3 replicas (current: 1)
Aug 3 21:31:18.876: INFO: RC test-deployment: sending request to consume 0 MB
Aug 3 21:31:18.876: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 3 21:31:19.095: INFO: RC test-deployment: sending request to consume 250 millicores
Aug 3 21:31:19.095: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 3 21:31:19.095: INFO: RC test-deployment: sending request to consume 0 of custom metric QPS
Aug 3 21:31:19.096: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 3 21:31:38.855: INFO: waiting for 3 replicas (current: 4)
...
Aug 3 21:35:49.043: INFO: RC test-deployment: sending request to consume 0 MB
Aug 3 21:35:49.043: INFO: ConsumeMem URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeMem false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug 3 21:35:49.380: INFO: RC test-deployment: sending request to consume 0 of custom metric QPS
Aug 3 21:35:49.380: INFO: ConsumeCustomMetric URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/BumpMetric false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug 3 21:35:50.577: INFO: RC test-deployment: sending request to consume 250 millicores
Aug 3 21:35:50.577: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 3 21:35:58.861: INFO: waiting for 3 replicas (current: 5)
Aug 3 21:36:18.848: INFO: waiting for 3 replicas (current: 5)
From another console:
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-t5rkx test-deployment Deployment/test-deployment 95%/20% 1 5 4 2m
With utilization at 95% against a 20% target, it is expected that the pod count does not settle at 3. Check the test code that applies the CPU load.
e2e/autoscaling/horizontal_pod_autoscaling.go
35 titleUp := "Should scale from 1 pod to 3 pods and from 3 to 5"
36 titleDown := "Should scale from 5 pods to 3 pods and from 3 to 1"
37
38 SIGDescribe("[Serial] [Slow] Deployment", func() {
39 // CPU tests via deployments
40 It(titleUp, func() {
41 scaleUp("test-deployment", common.KindDeployment, false, rc, f)
42 })
...
133 func scaleUp(name string, kind schema.GroupVersionKind, checkStability bool, rc *common.ResourceConsumer, f *framework.Framework) {
134 stasis := 0 * time.Minute
135 if checkStability {
136 stasis = 10 * time.Minute
137 }
138 scaleTest := &HPAScaleTest{
139 initPods: 1,
140 totalInitialCPUUsage: 250,
141 perPodCPURequest: 500,
142 targetCPUUtilizationPercent: 20,
143 minPods: 1,
144 maxPods: 5,
145 firstScale: 3,
146 firstScaleStasis: stasis,
147 cpuBurst: 700,
148 secondScale: 5,
149 }
150 scaleTest.run(name, kind, rc, f)
151 }
Check how the CPU utilization appears:
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:48:54 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment <unknown>/20% 1 5 1 1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:49:20 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment <unknown>/20% 1 5 1 1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:49:29 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 98%/20% 1 5 1 1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:49:34 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 98%/20% 1 5 1 1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:49:48 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 98%/20% 1 5 1 1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:49:57 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 98%/20% 1 5 4 2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:50:13 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 98%/20% 1 5 4 2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:50:32 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 94%/20% 1 5 4 2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:50:45 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 94%/20% 1 5 4 2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug 3 22:50:55 UTC 2018
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-qbghv test-deployment Deployment/test-deployment 94%/20% 1 5 4 3m
The test sets totalInitialCPUUsage: 250; investigate how that relates to the 98% kubectl shows (with a single pod).
totalInitialCPUUsage: 250 is passed to ConsumeCPU() as millicores. As the name says it is millicores, so 250 means 0.25 cores (25%). With a 20% target, wouldn't that be 2 pods? But the test expects 3. Confusing... Look inside the load container. → The container image for this is resource-consumer. test/utils/image/manifest.go:
25 e2eRegistry = "gcr.io/kubernetes-e2e-test-images"
..
91 ResourceConsumer = ImageConfig{e2eRegistry, "resource-consumer", "1.3", true}
The source is https://github.com/kubernetes/kubernetes/tree/master/test/images/resource-consumer. It seems the only way forward is to run this container image and see how the CPU load actually comes out. The target Docker image can be run directly with the following command:
$ sudo docker run --name resource-c gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3
Unable to find image 'gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3' locally
1.3: Pulling from kubernetes-e2e-test-images/resource-consumer
d2519f41f710: Pull complete
7bb286ada61f: Pull complete
0878ec3e77c3: Pull complete
137c312171bf: Pull complete
Digest: sha256:38f1007b38bd5ceab098ac24278066068e1cdd9e3eb2a540233b7d1053be2696
Status: Downloaded newer image for gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3
https://github.com/kubernetes/kubernetes/tree/master/test/images/resource-consumer also documents how to run it with kubectl. The instructions are a bit old, but run it the same way the e2e test does. ★ If the --requests parameter is not specified, the usage shows as <unknown> and the HPA does not work.
$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
resource-consumer-767f57866f-vh4v7 1/1 Running 0 10s 10.244.1.192 k8s-node01
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
resource-consumer ClusterIP 10.109.92.74 <none> 8080/TCP 38s
$
$ curl --data "millicores=250&durationSec=600" http://10.109.92.74:8080/ConsumeCPU
ConsumeCPU
250 millicores
600 durationSec
On iaas-cpu01, consume-cpu runs at around 25%, as specified:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29852 root 20 0 9368 6356 1344 S 25.9 0.2 0:20.92 consume-cpu
Why does kubectl show around 50%?
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 49%/20% 1 5 1 4m
It scaled out to 3 pods and the usage became one third → in other words, the usage shown here is the average usage per pod.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 16%/20% 1 5 3 11m
Cleanup
$ kubectl delete hpa resource-consumer; kubectl delete service resource-consumer; kubectl delete deployment resource-consumer
Run workload
$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "millicores=250&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU
Continue checking hpa condition
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 49%/20% 1 5 2 4m
$
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
resource-consumer-767f57866f-8mb28 1/1 Running 0 1m 10.244.0.183 k8s-master
resource-consumer-767f57866f-w6drh 1/1 Running 0 5m 10.244.1.197 k8s-node01
$ top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5594 root 20 0 9368 6560 1280 R 25.2 0.2 0:47.85 consume-cpu
The actual process is at 25%, so why does kubectl get hpa show around 50%? 25% looks correct, since 250 millicores (0.25 cores) was specified. → Because the Deployment was created with --requests='cpu=500m', a 250-millicore load shows up as 50%.
When 500 millicores is specified, the process runs at 50%, so the load tool itself appears to be working correctly:
$ curl --data "millicores=500&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU
...
$ top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8085 root 20 0 9368 6612 1344 S 50.3 0.2 0:10.11 consume-cpu
At that point the HPA sees double that, i.e. 100%:
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 100%/20% 1 5 1 2m
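Putting the two manual runs together, the percentage the HPA reports appears to be the consumed millicores relative to the pod's CPU request rather than to a whole core. A minimal sketch of that relationship (plain Go, illustration only; 500m is the request used in the experiments above):
package main

import "fmt"

// hpaUtilizationPercent computes consumed millicores as a percentage of the pod's CPU request.
func hpaUtilizationPercent(consumedMillicores, requestedMillicores int) int {
	return consumedMillicores * 100 / requestedMillicores
}

func main() {
	fmt.Println(hpaUtilizationPercent(250, 500)) // 50  -> matches the ~49%/20% seen earlier
	fmt.Println(hpaUtilizationPercent(500, 500)) // 100 -> matches the 100%/20% seen above
}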
Check the logic behind the CPU utilization figure that the HPA displays.
In the /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers API response, the third element of the rows.cells array is that figure:
$ kubectl get hpa -v=16
...
I0801 22:25:54.290584 379 request.go:897] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1beta1",
"metadata":{"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers","resourceVersion":"1542106"},
"columnDefinitions":[
...
"rows":[
{"cells":[
"nginx",
"Deployment/nginx",
"\u003cunknown\u003e/50%",
"1",
3,
1,
"46m"],
pkg/registry/autoscaling/rest/storage_autoscaling.go
28 horizontalpodautoscalerstore "k8s.io/kubernetes/pkg/registry/autoscaling/horizontalpodautoscaler/storage"
...
33 func (p RESTStorageProvider) NewRESTStorage(apiResourceConfigSource serverstorage.APIResourceConfigSource, restOptionsGetter generic.RESTOptionsGetter) ( genericapiserver.APIGroupInfo, bool) {
34 apiGroupInfo := genericapiserver.NewDefaultAPIGroupInfo(autoscaling.GroupName, legacyscheme.Scheme, legacyscheme.ParameterCodec, legacyscheme.Cod ecs)
35 // If you add a version here, be sure to add an entry in `k8s.io/kubernetes/cmd/kube-apiserver/app/aggregator.go with specific priorities.
36 // TODO refactor the plumbing to provide the information in the APIGroupInfo
37
38 if apiResourceConfigSource.VersionEnabled(autoscalingapiv2beta1.SchemeGroupVersion) {
39 apiGroupInfo.VersionedResourcesStorageMap[autoscalingapiv2beta1.SchemeGroupVersion.Version] = p.v2beta1Storage(apiResourceConfigSource, r estOptionsGetter)
40 }
41 if apiResourceConfigSource.VersionEnabled(autoscalingapiv1.SchemeGroupVersion) {
42 apiGroupInfo.VersionedResourcesStorageMap[autoscalingapiv1.SchemeGroupVersion.Version] = p.v1Storage(apiResourceConfigSource, restOptions Getter)
43 }
44
45 return apiGroupInfo, true
46 }
47
48 func (p RESTStorageProvider) v1Storage(apiResourceConfigSource serverstorage.APIResourceConfigSource, restOptionsGetter generic.RESTOptionsGetter) map[st ring]rest.Storage {
49 storage := map[string]rest.Storage{}
50 // horizontalpodautoscalers
51 hpaStorage, hpaStatusStorage := horizontalpodautoscalerstore.NewREST(restOptionsGetter)
52 storage["horizontalpodautoscalers"] = hpaStorage
53 storage["horizontalpodautoscalers/status"] = hpaStatusStorage
54
55 return storage
56 }
NOTE: the REST API implementation for each resource lives in a rest directory under pkg/registry:
~/go/src/k8s.io/kubernetes/pkg/registry$ find . -name rest
./apps/rest
./settings/rest
./scheduling/rest
./rbac/rest
./batch/rest
./core/rest
./core/pod/rest
./core/node/rest
./extensions/rest
./policy/rest
./authentication/rest
./certificates/rest
./admissionregistration/rest
./networking/rest
./authorization/rest
./events/rest
./coordination/rest
./storage/rest
./autoscaling/rest
Following from there, the detailed HPA implementation is in pkg/registry/autoscaling/horizontalpodautoscaler/storage/storage.go:
38 // NewREST returns a RESTStorage object that will work against horizontal pod autoscalers.
39 func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST) {
40 store := &genericregistry.Store{
41 NewFunc: func() runtime.Object { return &autoscaling.HorizontalPodAutoscaler{} },
42 NewListFunc: func() runtime.Object { return &autoscaling.HorizontalPodAutoscalerList{} },
43 DefaultQualifiedResource: autoscaling.Resource("horizontalpodautoscalers"),
44
45 CreateStrategy: horizontalpodautoscaler.Strategy,
46 UpdateStrategy: horizontalpodautoscaler.Strategy,
47 DeleteStrategy: horizontalpodautoscaler.Strategy,
48
49 TableConvertor: printerstorage.TableConvertor{TablePrinter: printers.NewTablePrinter().With(printersinternal.AddHandlers)},
50 }
51 options := &generic.StoreOptions{RESTOptions: optsGetter}
52 if err := store.CompleteWithOptions(options); err != nil {
53 panic(err) // TODO: Propagate error up
54 }
55
56 statusStore := *store
57 statusStore.UpdateStrategy = horizontalpodautoscaler.StatusStrategy
58 return &REST{store}, &StatusREST{store: &statusStore}
59 }
What we want here is GET list of HPAs, so probably autoscaling.HorizontalPodAutoscalerList in k8s.io/kubernetes/pkg/apis/autoscaling.
pkg/apis/autoscaling/types.go
413 // HorizontalPodAutoscalerList is a list of horizontal pod autoscaler objects.
414 type HorizontalPodAutoscalerList struct {
415 metav1.TypeMeta
416 // Metadata is the standard list metadata.
417 // +optional
418 metav1.ListMeta
419
420 // Items is the list of horizontal pod autoscaler objects.
421 Items []HorizontalPodAutoscaler
422 }
...
394 type HorizontalPodAutoscaler struct {
395 metav1.TypeMeta
396 // Metadata is the standard object metadata.
397 // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
398 // +optional
399 metav1.ObjectMeta
400
401 // Spec is the specification for the behaviour of the autoscaler.
402 // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
403 // +optional
404 Spec HorizontalPodAutoscalerSpec
405
406 // Status is the current information about the autoscaler.
407 // +optional
408 Status HorizontalPodAutoscalerStatus // this looks like where the CPU utilization lives
409 }
...
221 type HorizontalPodAutoscalerStatus struct {
222 // ObservedGeneration is the most recent generation observed by this autoscaler.
223 // +optional
224 ObservedGeneration *int64
225
226 // LastScaleTime is the last time the HorizontalPodAutoscaler scaled the number of pods,
227 // used by the autoscaler to control how often the number of pods is changed.
228 // +optional
229 LastScaleTime *metav1.Time
230
231 // CurrentReplicas is current number of replicas of pods managed by this autoscaler,
232 // as last seen by the autoscaler.
233 CurrentReplicas int32
234
235 // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler,
236 // as last calculated by the autoscaler.
237 DesiredReplicas int32
238
239 // CurrentMetrics is the last read state of the metrics used by this autoscaler.
240 CurrentMetrics []MetricStatus // looks like this one
241
242 // Conditions is the set of conditions required for this autoscaler to scale its target,
243 // and indicates whether or not those conditions are met.
244 Conditions []HorizontalPodAutoscalerCondition
245 }
...
296 // MetricStatus describes the last-read state of a single metric.
297 type MetricStatus struct {
298 // Type is the type of metric source. It will be one of "Object",
299 // "Pods" or "Resource", each corresponds to a matching field in the object.
300 Type MetricSourceType
301
302 // Object refers to a metric describing a single kubernetes object
303 // (for example, hits-per-second on an Ingress object).
304 // +optional
305 Object *ObjectMetricStatus
306 // Pods refers to a metric describing each pod in the current scale target
307 // (for example, transactions-processed-per-second). The values will be
308 // averaged together before being compared to the target value.
309 // +optional
310 Pods *PodsMetricStatus
311 // Resource refers to a resource metric (such as those specified in
312 // requests and limits) known to Kubernetes describing each pod in the
313 // current scale target (e.g. CPU or memory). Such metrics are built in to
314 // Kubernetes, and have special scaling options on top of those available
315 // to normal per-pod metrics using the "pods" source.
316 // +optional
317 Resource *ResourceMetricStatus // looks like this one
318 // External refers to a global metric that is not associated
319 // with any Kubernetes object. It allows autoscaling based on information
320 // coming from components running outside of cluster
321 // (for example length of queue in cloud messaging service, or
322 // QPS from loadbalancer running outside of cluster).
323 // +optional
324 External *ExternalMetricStatus
325 }
This looks like the answer:
349 // ResourceMetricStatus indicates the current value of a resource metric known to
350 // Kubernetes, as specified in requests and limits, describing each pod in the
351 // current scale target (e.g. CPU or memory). Such metrics are built in to
352 // Kubernetes, and have special scaling options on top of those available to
353 // normal per-pod metrics using the "pods" source.
354 type ResourceMetricStatus struct {
355 // Name is the name of the resource in question.
356 Name api.ResourceName
357 // CurrentAverageUtilization is the current value of the average of the
358 // resource metric across all relevant pods, represented as a percentage of
359 // the requested value of the resource for the pods. It will only be
360 // present if `targetAverageValue` was set in the corresponding metric
361 // specification.
362 // +optional
363 CurrentAverageUtilization *int32
364 // CurrentAverageValue is the current value of the average of the
365 // resource metric across all relevant pods, as a raw value (instead of as
366 // a percentage of the request), similar to the "pods" metric source type.
367 // It will always be set, regardless of the corresponding metric specification.
368 CurrentAverageValue resource.Quantity
369 }
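As a side note, the same number can be read without going through the table printer; in the autoscaling/v1 serialization it is exposed as status.currentCPUUtilizationPercentage (using the manual resource-consumer HPA from the earlier experiment as an example):
$ kubectl get hpa resource-consumer -o jsonpath='{.status.currentCPUUtilizationPercentage}'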
Is the 250-millicore load showing as 50% simply because the deployment was created with --requests='cpu=500m'? Check the HPA state when applying a 250-millicore load with --requests='cpu=1000m'. → Expectation: the HPA shows 25% CPU utilization (it was 50% with 500m).
$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=1000m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "millicores=250&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU
Check the HPA state → as expected. Next, expect it to scale to 2 pods and the utilization to drop to 12%.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 24%/20% 1 5 1 2m
Check the HPA state after a while → as expected.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 12%/20% 1 5 2 4m
So k/kubernetes/pull/66982 really is necessary after all!
With the knowledge above, re-check the test conditions of the failing e2e test → for scaleUp they look reasonable. test/e2e/autoscaling/horizontal_pod_autoscaling.go:
133 func scaleUp(name string, kind schema.GroupVersionKind, checkStability bool, rc *common.ResourceConsumer, f *framework.Framework) {
134 stasis := 0 * time.Minute
135 if checkStability {
136 stasis = 10 * time.Minute
137 }
138 scaleTest := &HPAScaleTest{
139 initPods: 1, // initial number of pods
140 totalInitialCPUUsage: 250, // CPU load the load pods generate for the first scale-out (250 millicores = 25% of a core)
141 perPodCPURequest: 500, // CPU request of each load pod (500m), so on the HPA the load shows up as 50% (250m / 500m)
142 targetCPUUtilizationPercent: 20, // target CPU utilization per pod (20%); the pod count is adjusted so the average lands on this value
143 minPods: 1,
144 maxPods: 5,
145 firstScale: 3, // expected pod count after the first scale-out; against the 20% target the HPA sees 50%, so 3 pods bring the average down to ~17%. Reasonable
146 firstScaleStasis: stasis,
147 cpuBurst: 700, // CPU load applied for the second scale-out; with perPodCPURequest of 500 the HPA sees 140%
148 secondScale: 5, // expected pod count after the second scale-out; with a 20% target this would need 7 pods (=140/20), but maxPods of 5 caps it
149 }
150 scaleTest.run(name, kind, rc, f)
151 }
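A quick worked check of the arithmetic in the comments above (plain Go, illustration only; total consumed millicores divided by the per-pod request gives the utilization sum the HPA divides by the target):
package main

import (
	"fmt"
	"math"
)

// expectedReplicas estimates the replica count the HPA should converge to for a given total load.
func expectedReplicas(consumedMillicores, perPodRequestMillicores, targetPercent, maxPods int) int {
	utilization := float64(consumedMillicores) / float64(perPodRequestMillicores) * 100 // as the HPA sees it
	pods := int(math.Ceil(utilization / float64(targetPercent)))
	if pods > maxPods {
		pods = maxPods
	}
	return pods
}

func main() {
	fmt.Println(expectedReplicas(250, 500, 20, 5)) // 3: firstScale  (50% / 20% -> ceil(2.5))
	fmt.Println(expectedReplicas(700, 500, 20, 5)) // 5: secondScale (140% / 20% = 7, capped at maxPods)
}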
Check what actually happens during a run of the failing test. Try the simplest two-stage scale-out test:
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
The problem is that there are 4 pods where 3 are expected:
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-t5rkx test-deployment Deployment/test-deployment 95%/20% 1 5 4 2m
The load pod's CPU request is 500m (perPodCPURequest: 500), so the HPA should see 50% utilization (250m / 500m); instead it shows 95%, which is the problem.
Check at the process level whether the container load really is 50% (it might be running twice? also check serial vs. parallel test execution).
$ kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
e2e-tests-horizontal-pod-autoscaling-fhrzw test-deployment Deployment/test-deployment 99%/20% 1 5 1 1m
There are many load processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1065 root 20 0 948596 139176 54948 S 14.2 3.4 676:29.88 kubelet
25666 root 20 0 9368 6044 1280 S 4.0 0.1 0:00.91 consume-cpu
25679 root 20 0 9368 5788 1284 S 4.0 0.1 0:00.89 consume-cpu
25775 root 20 0 9368 6248 1220 R 4.0 0.2 0:00.93 consume-cpu
25665 root 20 0 9432 6296 1280 S 3.6 0.2 0:00.91 consume-cpu
25667 root 20 0 9368 6352 1344 S 3.6 0.2 0:00.89 consume-cpu
25685 root 20 0 9368 5780 1280 R 3.6 0.1 0:00.90 consume-cpu
25694 root 20 0 9368 6048 1284 S 3.6 0.1 0:00.90 consume-cpu
25698 root 20 0 9368 6048 1280 S 3.6 0.1 0:00.89 consume-cpu
25702 root 20 0 9368 6044 1268 S 3.6 0.1 0:00.90 consume-cpu
25703 root 20 0 9368 6348 1344 S 3.6 0.2 0:00.89 consume-cpu
25741 root 20 0 9368 6092 1344 S 3.6 0.2 0:00.90 consume-cpu
25763 root 20 0 9368 6036 1280 S 3.6 0.1 0:00.91 consume-cpu
25675 root 20 0 9368 6368 1344 S 3.3 0.2 0:00.87 consume-cpu
There are 13 load-tool processes at around 3.6% each, i.e. roughly 50% total CPU load (25% was expected). The load pattern differs from the manual test (there the load tool was a single process, and the total was different too).
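A rough check of those numbers: 250m spread over 13 worker processes should be about 19m (~2% of a core) each, ~25% of a core in total, i.e. ~250m against the 500m request = 50% on the HPA. Observed instead: 13 processes at roughly 3.6–4.0%, about 48% of a core (~480m), which against the 500m request is ~96% and matches the 95–99% the HPA reports above.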
Extract the CPU-load-related entries from the test log:
Aug 6 23:34:50.709: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:35:20.901: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:35:50.987: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:36:21.086: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:36:51.189: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:37:21.367: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:37:51.484: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug 6 23:38:21.617: INFO: ConsumeCPU URL: {https 192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU false durationSec=30&millicores=250&requestSizeMillicores=20 }
Try applying the load from the e2e test log above manually:
$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "durationSec=30&millicores=250&requestSizeMillicores=20" http://${ADDRESS}:8080/ConsumeCPU
It does not spawn multiple processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29246 root 20 0 9304 5212 1220 S 24.8 0.1 0:01.90 consume-cpu
It becomes 3, as the test expects:
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
resource-consumer Deployment/resource-consumer 16%/20% 1 5 3 9m
Looking at the load tool's code, it is odd that the manual run does not produce multiple processes. test/images/resource-consumer/controller/controller.go: 250 (millicores) / 20 (requestSizeMillicores) = 12.5, so it becomes 13 requests (12 full-size plus one remainder).
99 count := millicores / requestSizeInMillicores
100 rest := millicores - count*requestSizeInMillicores
101 fmt.Fprintf(w, "RC manager: sending %v requests to consume %v millicores each and 1 request to consume %v millicores\n", count, requestSizeInMill icores, rest)
102 if count > 0 {
103 handler.waitGroup.Add(count)
104 handler.sendConsumeCPURequests(w, count, requestSizeInMillicores, durationSec)
105 }
106 if rest > 0 {
107 handler.waitGroup.Add(1)
108 go handler.sendOneConsumeCPURequest(w, rest, durationSec)
109 }
110 handler.waitGroup.Wait()
111 }
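For the e2e values (millicores=250, requestSizeMillicores=20) the split in the handler above works out as follows; a minimal sketch of the integer arithmetic, not the handler itself:
package main

import "fmt"

func main() {
	millicores, requestSizeInMillicores := 250, 20
	count := millicores / requestSizeInMillicores      // 12 full-size requests
	rest := millicores - count*requestSizeInMillicores // plus one 10m remainder request
	fmt.Printf("%d requests of %dm and 1 request of %dm => %d consume-cpu processes\n",
		count, requestSizeInMillicores, rest, count+1)
}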
Maybe the image is not gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3? With the latest code, omitting requestSizeMillicores should cause an error, yet the manual run does not error out.
The e2e test log does not show which image is used. Check it with:
$ kubectl describe deployment <deployment-name>
For the rs test it is
Image: gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0
and for the deployment it is
Image: gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3
Details:
$ kubectl describe deployments test-deployment -n=e2e-tests-horizontal-pod-autoscaling-cndn6
Name: test-deployment
Namespace: e2e-tests-horizontal-pod-autoscaling-cndn6
CreationTimestamp: Tue, 07 Aug 2018 00:35:11 +0000
Labels: name=test-deployment
Annotations: deployment.kubernetes.io/revision=1
Selector: name=test-deployment
Replicas: 4 desired | 4 updated | 4 total | 4 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: name=test-deployment
Containers:
test-deployment:
Image: gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3
Port: 80/TCP
Host Port: 0/TCP
Limits:
cpu: 500m
memory: 209715200
Requests:
cpu: 500m
memory: 209715200
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: test-deployment-5fd8dc46f8 (4/4 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m deployment-controller Scaled up replica set test-deployment-5fd8dc46f8 to 1
Normal ScalingReplicaSet 34s deployment-controller Scaled up replica set test-deployment-5fd8dc46f8 to 4
$ kubectl describe hpa test-deployment -n=e2e-tests-horizontal-pod-autoscaling-cndn6
Name: test-deployment
Namespace: e2e-tests-horizontal-pod-autoscaling-cndn6
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 07 Aug 2018 00:35:36 +0000
Reference: Deployment/test-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 66% (334m) / 20%
Min replicas: 1
Max replicas: 5
Deployment pods: 4 current / 4 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale False BackoffBoth the time since the previous scale is still within both the downscale and upscale forbidden windows
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 2m (x2 over 2m) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 2m (x2 over 2m) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Normal SuccessfulRescale 1m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Ah, now it's becoming clear. There are two places that handle the HTTP query for the resource-consumer's CPU consumption:
controller/controller.go: handler.handleConsumeCPU(w, req.Form)
controller/controller.go:func (handler *Controller) handleConsumeCPU(w http.ResponseWriter, query url.Values) {
resource_consumer_handler.go: handler.handleConsumeCPU(w, req.Form)
resource_consumer_handler.go:func (handler *ResourceConsumerHandler) handleConsumeCPU(w http.ResponseWriter, query url.Values) {
Most likely the e2e test sends its requests via controller/controller.go, which supports RequestSizeInMillicoresQuery, and that in turn calls resource_consumer_handler.go multiple times. The manual test called resource_consumer_handler.go directly (the e2e log above goes through the test-deployment-ctrl service proxy, while the manual curl hit the resource-consumer service itself), so it never went through the RequestSizeInMillicoresQuery fan-out.
Judging from the following, each process should only generate about 2% load, yet it generates about 4%?
durationSec=30&millicores=250&requestSizeMillicores=20
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1065 root 20 0 948596 139176 54948 S 14.2 3.4 676:29.88 kubelet
25666 root 20 0 9368 6044 1280 S 4.0 0.1 0:00.91 consume-cpu
25679 root 20 0 9368 5788 1284 S 4.0 0.1 0:00.89 consume-cpu
25775 root 20 0 9368 6248 1220 R 4.0 0.2 0:00.93 consume-cpu
25665 root 20 0 9432 6296 1280 S 3.6 0.2 0:00.91 consume-cpu
25667 root 20 0 9368 6352 1344 S 3.6 0.2 0:00.89 consume-cpu
25685 root 20 0 9368 5780 1280 R 3.6 0.1 0:00.90 consume-cpu
25694 root 20 0 9368 6048 1284 S 3.6 0.1 0:00.90 consume-cpu
25698 root 20 0 9368 6048 1280 S 3.6 0.1 0:00.89 consume-cpu
25702 root 20 0 9368 6044 1268 S 3.6 0.1 0:00.90 consume-cpu
25703 root 20 0 9368 6348 1344 S 3.6 0.2 0:00.89 consume-cpu
25741 root 20 0 9368 6092 1344 S 3.6 0.2 0:00.90 consume-cpu
25763 root 20 0 9368 6036 1280 S 3.6 0.1 0:00.91 consume-cpu
25675 root 20 0 9368 6368 1344 S 3.3 0.2 0:00.87 consume-cpu
Looks like bingo. After applying three separate 2% (20-millicore) loads, each process runs at over 6%:
$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15096 root 20 0 9368 6252 1220 S 6.9 0.2 0:06.50 consume-cpu
15156 root 20 0 9368 6252 1220 S 6.6 0.2 0:05.46 consume-cpu
15067 root 20 0 9368 6288 1280 S 6.3 0.2 0:07.69 consume-cpu
When applying 25%, the load comes out as expected:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16724 root 20 0 8180 4492 1288 R 25.0 0.1 0:01.35 consume-cpu
Find the threshold below which the load no longer matches the request. 10% is as expected:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15846 root 20 0 9368 6036 1280 R 10.0 0.1 0:01.92 consume-cpu
5% is not:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16200 root 20 0 9304 6368 1344 S 7.0 0.2 0:00.79 consume-cpu
The problematic code is in test/e2e/common/autoscaling_utils.go:
46 dynamicRequestSizeInMillicores = 20
Filed as issue k/kubernetes/issues/67051 and submitted a PR as k/kubernetes/pull/67053.
Running with the --check-version-skew=false option, the 3-to-5 pod autoscale failed. The cause was insufficient CPU resources → leftover deployments, RCs, and RSes from an interrupted test run were responsible; cleaned them up.
$ kubectl get pods -n=e2e-tests-horizontal-pod-autoscaling-fcq2t
NAME READY STATUS RESTARTS AGE
test-deployment-5fd8dc46f8-9j59v 1/1 Running 0 9m
test-deployment-5fd8dc46f8-d5c42 0/1 Pending 0 5m
test-deployment-5fd8dc46f8-lxfbp 0/1 Pending 0 5m
test-deployment-5fd8dc46f8-nwltt 1/1 Running 0 10m
test-deployment-5fd8dc46f8-wz68c 1/1 Running 0 9m
$
$ kubectl describe pod test-deployment-5fd8dc46f8-d5c42 -n=e2e-tests-horizontal-pod-autoscaling-fcq2t
..
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 24s (x63 over 5m) default-scheduler 0/2 nodes are available: 2 Insufficient cpu.
After applying k/kubernetes/pull/67053, the two problematic HPA tests now pass.
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=from\s3\sto\s5$" --check-version-skew=false
..
Ran 2 of 999 Specs in 752.086 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 997 Skipped PASS
Ginkgo ran 1 suite in 12m32.321622799s
Test Suite Passed
2018/08/07 17:54:46 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=from\s3\sto\s5$' finished in 12m32.529859891s
2018/08/07 17:54:46 e2e.go:83: Done
Run all the HPA tests → they pass now!
$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=\[HPA\]" --check-version-skew=false
...
S [SKIPPING] in Spec Setup (BeforeEach) [0.001 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: Custom Metrics from Stackdriver)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
should scale down with External Metric with target value from Stackdriver [Feature:CustomMetricsAutoscaling] [BeforeEach]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/custom_metrics_stackdriver_autoscaling.go:85
Aug 7 19:12:17.376: Only supported for providers [gce gke] (not skeleton)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:305
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug 7 19:12:17.378: INFO: Running AfterSuite actions on all node
Aug 7 19:12:17.378: INFO: Running AfterSuite actions on node 1
Ran 8 of 999 Specs in 4484.970 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 991 Skipped PASS
Ginkgo ran 1 suite in 1h14m45.203539796s
Test Suite Passed
2018/08/07 19:12:17 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]' finished in 1h14m45.413142012s
2018/08/07 19:12:17 e2e.go:83: Done
Will close this once k/kubernetes/pull/67053 is merged.
Close!!
Conclusion
This is one of the issues tracked in https://github.com/oomichi/try-kubernetes/issues/17
The test failed because of
and that happens when there are too few compute nodes, like:
This issue still occurs on Kubernetes v1.11.