oomichi / try-kubernetes


Test Failure: [HPA] Horizontal pod autoscaling (scale resource: CPU) #18

Closed: oomichi closed this issue 6 years ago

oomichi commented 6 years ago

Conclusion

This is one of the issues tracked in https://github.com/oomichi/try-kubernetes/issues/17.

The test fails because of:

I0404 22:01:33.211083   14594 runners.go:175] Pod rs-cz9bx      k8s-v110-flannel-worker Running <nil>
I0404 22:01:33.211152   14594 runners.go:175] Pod rs-fm7c8      k8s-v110-flannel-worker Running <nil>
I0404 22:01:33.211163   14594 runners.go:175] Pod rs-g96bg      k8s-v110-flannel-worker Running <nil>
I0404 22:01:33.211180   14594 runners.go:175] Pod rs-j2dq9              Pending <nil>
I0404 22:01:33.211189   14594 runners.go:175] Pod rs-wcpmh      k8s-v110-flannel-worker Running <nil>
Apr  4 22:01:33.211: INFO: Unexpected error occurred: Only 4 pods started out of 5

and that happens because there are not enough schedulable compute nodes, as shown by:

Apr  4 22:01:33.225: INFO: At 2018-04-04 21:59:23 +0000 UTC - event for rs-j2dq9: {default-scheduler } FailedScheduling: 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.

This issue still occurs with Kubernetes v1.11.
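
To confirm the scheduling bottleneck, the nodes' taints and allocatable CPU can be checked directly; a sketch with standard kubectl queries:

$ kubectl describe nodes | grep -E '^(Name|Taints):'
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\n"}{end}'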

oomichi commented 6 years ago

All 8 HPA tests failed:

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=\[HPA\]"
[..]
Summarizing 8 Failures:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:448

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:454

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 1 pod to 2 pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/autoscaling_utils.go:461

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

Ran 8 of 877 Specs in 5200.752 seconds
FAIL! -- 0 Passed | 8 Failed | 0 Pending | 869 Skipped --- FAIL: TestE2E (5200.77s)
FAIL

Ginkgo ran 1 suite in 1h26m40.988437641s
Test Suite Failed 
oomichi commented 6 years ago

After increasing the target cluster to 2 compute nodes (previously 1), the following insufficient-CPU error no longer occurs. However, a different error still happens and the HPA tests still do not pass:

Unschedulable 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.
oomichi commented 6 years ago
Apr 24 23:11:25.268: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.267: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.270: INFO: waiting for 2 replicas (current: 1)
Apr 24 23:11:45.270: INFO: Unexpected error occurred: timed out waiting for the condition
[..]
Logging pods the kubelet thinks is on node k8s-v110-flannel-worker02
Apr 24 23:12:01.679: INFO: kube-flannel-ds-kqtts started at 2018-04-24 20:59:44 +0000 UTC (1+1 container statuses recorded)
Apr 24 23:12:01.679: INFO:      Init container install-cni ready: true, restart count 0
Apr 24 23:12:01.679: INFO:      Container kube-flannel ready: true, restart count 1
Apr 24 23:12:01.679: INFO: kube-proxy-gsqtb started at 2018-04-24 20:59:44 +0000 UTC (0+1 container statuses recorded)
Apr 24 23:12:01.679: INFO:      Container kube-proxy ready: true, restart count 0
Apr 24 23:12:01.746: INFO:
Latency metrics for node k8s-v110-flannel-worker02
STEP: Dumping a list of prepulled images on each node...
Apr 24 23:12:01.750: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-horizontal-pod-autoscaling-hxfzj" for this suite.
Apr 24 23:12:07.781: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Apr 24 23:12:07.957: INFO: namespace: e2e-tests-horizontal-pod-autoscaling-hxfzj, resource: bindings, ignored listing per whitelist
Apr 24 23:12:07.991: INFO: namespace e2e-tests-horizontal-pod-autoscaling-hxfzj deletion completed in 6.238444611s

• Failure [947.898 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
  [sig-autoscaling] ReplicationController light
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
    Should scale from 1 pod to 2 pods [It]
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70

    timeout waiting 15m0s for 2 replicas
    Expected error:
        <*errors.errorString | 0xc4200c35a0>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    not to have occurred

    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSApr 24 23:12:07.991: INFO: Running AfterSuite actions on all node
Apr 24 23:12:07.992: INFO: Running AfterSuite actions on node 1

The corresponding test code is:

 70                 It("Should scale from 1 pod to 2 pods", func() {
 71                         scaleTest := &HPAScaleTest{
 72                                 initPods:                    1,
 73                                 totalInitialCPUUsage:        150,
 74                                 perPodCPURequest:            200,
 75                                 targetCPUUtilizationPercent: 50,
 76                                 minPods:                     1,
 77                                 maxPods:                     2,
 78                                 firstScale:                  2,
 79                         }
 80                         scaleTest.run("rc-light", common.KindRC, rc, f)
 81                 })
oomichi commented 6 years ago

The common part of the HPA tests:

112 // run is a method which runs an HPA lifecycle, from a starting state, to an expected
113 // The initial state is defined by the initPods parameter.
114 // The first state change is due to the CPU being consumed initially, which HPA responds to by changing pod counts.
115 // The second state change (optional) is due to the CPU burst parameter, which HPA again responds to.
116 // TODO The use of 3 states is arbitrary, we could eventually make this test handle "n" states once this test stabilizes.
117 func (scaleTest *HPAScaleTest) run(name string, kind schema.GroupVersionKind, rc *common.ResourceConsumer, f *framework.Framework) {
118         const timeToWait = 15 * time.Minute
119         rc = common.NewDynamicResourceConsumer(name, f.Namespace.Name, kind, int(scaleTest.initPods), int(scaleTest.totalInitialCPUUsage), 0, 0, scaleTest.perPodCPURequest, 200, f.ClientSet, f.InternalClientset, f.ScalesGetter)
120         defer rc.CleanUp()
121         hpa := common.CreateCPUHorizontalPodAutoscaler(rc, scaleTest.targetCPUUtilizationPercent, scaleTest.minPods, scaleTest.maxPods)
122         defer common.DeleteHorizontalPodAutoscaler(rc, hpa.Name)
123         rc.WaitForReplicas(int(scaleTest.firstScale), timeToWait)

This waits for the initial condition (firstScale) to be reached before autoscaling kicks in. The timeout failure happens here because the observed replica count never matches the expected one (a rough shell analogue of this wait is sketched after the listing).

124         if scaleTest.firstScaleStasis > 0 {
125                 rc.EnsureDesiredReplicas(int(scaleTest.firstScale), scaleTest.firstScaleStasis)
126         }
127         if scaleTest.cpuBurst > 0 && scaleTest.secondScale > 0 {
128                 rc.ConsumeCPU(scaleTest.cpuBurst)

Generate the CPU peak (cpuBurst).

129                 rc.WaitForReplicas(int(scaleTest.secondScale), timeToWait)

Wait for the autoscaling to happen.

130         }
131 }
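
rc.WaitForReplicas essentially polls the target's replica count until it reaches the expected value or the 15-minute timeout expires (the "waiting for 2 replicas" log lines earlier in this issue show a 20-second polling interval). A rough shell analogue for the "1 pod to 2 pods" case, assuming the rc-light ReplicationController as the target and an illustrative jsonpath query:

# Poll rc-light until it reports 2 ready replicas or 15 minutes pass.
end=$((SECONDS + 900))
while [ "$SECONDS" -lt "$end" ]; do
  current=$(kubectl get rc rc-light -o jsonpath='{.status.readyReplicas}')
  echo "waiting for 2 replicas (current: ${current:-0})"
  [ "${current:-0}" -ge 2 ] && break
  sleep 20
done
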
oomichi commented 6 years ago

Must read: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

$ kubectl autoscale rc foo --min=2 --max=5 --cpu-percent=80

This seems to match the following test code:

121         hpa := common.CreateCPUHorizontalPodAutoscaler(rc, scaleTest.targetCPUUtilizationPercent, scaleTest.minPods, scaleTest.maxPods)
oomichi commented 6 years ago

We can specify a single test with a regex like:

$ go run hack/e2e.go --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods --ginkgo.dryRun=true"
oomichi commented 6 years ago

Result on k8s v1.11.1: fails with a timeout

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods"
2018/08/01 18:00:15 e2e.go:79: Calling kubetest --verbose-commands=true --provider=skeleton --test --test_args=--ginkgo.focus=1\spod\sto\s2\spods...
2018/08/01 18:00:15 process.go:153: Running: ./hack/e2e-internal/e2e-status.sh
Skeleton Provider: prepare-e2e not implemented
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-08-01T02:07:33Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
2018/08/01 18:00:15 process.go:155: Step './hack/e2e-internal/e2e-status.sh' finished in 230.118832ms
2018/08/01 18:00:15 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
2018/08/01 18:00:15 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 188.470631ms
2018/08/01 18:00:15 process.go:153: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=1\spod\sto\s2\spods
Conformance test: not doing test setup.
Aug  1 18:00:16.247: INFO: Overriding default scale value of zero to 1
Aug  1 18:00:16.247: INFO: Overriding default milliseconds value of zero to 5000
I0801 18:00:16.332682   31092 e2e.go:333] Starting e2e run "bec5c789-95b4-11e8-b4cc-fa163e738a69" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1533146416 - Will randomize all specs
Will run 1 of 999 specs

Aug  1 18:00:16.356: INFO: >>> kubeConfig: /home/ubuntu/admin.conf
Aug  1 18:00:16.358: INFO: Waiting up to 30m0s for all (but 0) nodes to be schedulable
Aug  1 18:00:16.397: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Aug  1 18:00:16.451: INFO: 10 / 10 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Aug  1 18:00:16.451: INFO: expected 2 pod replicas in namespace 'kube-system', 2 are Running and Ready.
Aug  1 18:00:16.466: INFO: Waiting for pods to enter Success, but no pods in "kube-system" match label map[name:e2e-image-puller]
Aug  1 18:00:16.466: INFO: Dumping network health container logs from all nodes...
Aug  1 18:00:16.479: INFO: e2e test version: v1.11.1
Aug  1 18:00:16.484: INFO: kube-apiserver version: v1.11.1
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light
  Should scale from 1 pod to 2 pods
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
[BeforeEach] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Aug  1 18:00:16.485: INFO: >>> kubeConfig: /home/ubuntu/admin.conf
STEP: Building a namespace api object
Aug  1 18:00:16.622: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[It] Should scale from 1 pod to 2 pods
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
STEP: Running consuming RC rc-light via /v1, Kind=ReplicationController with 1 replicas
STEP: creating replication controller rc-light in namespace e2e-tests-horizontal-pod-autoscaling-dklx7
I0801 18:00:16.645232   31092 runners.go:177] Created replication controller with name: rc-light, namespace: e2e-tests-horizontal-pod-autoscaling-dklx7, replica count: 1
I0801 18:00:26.695869   31092 runners.go:177] rc-light Pods: 1 out of 1 created, 1 running, 0 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
STEP: Running controller
STEP: creating replication controller rc-light-ctrl in namespace e2e-tests-horizontal-pod-autoscaling-dklx7
I0801 18:00:26.767988   31092 runners.go:177] Created replication controller with name: rc-light-ctrl, namespace: e2e-tests-horizontal-pod-autoscaling-dklx7, replica count: 1
I0801 18:00:36.819542   31092 runners.go:177] rc-light-ctrl Pods: 1 out of 1 created, 1 running, 0 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
Aug  1 18:00:41.820: INFO: Waiting for amount of service:rc-light-ctrl endpoints to be 1
Aug  1 18:00:41.833: INFO: RC rc-light: consume 150 millicores in total
Aug  1 18:00:41.833: INFO: RC rc-light: setting consumption to 150 millicores in total
Aug  1 18:00:41.833: INFO: RC rc-light: consume 0 MB in total
Aug  1 18:00:41.833: INFO: RC rc-light: consume custom metric 0 in total
Aug  1 18:00:41.833: INFO: RC rc-light: setting bump of metric QPS to 0 in total
Aug  1 18:00:41.833: INFO: RC rc-light: sending request to consume 150 millicores
Aug  1 18:00:41.834: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug  1 18:00:41.834: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug  1 18:00:41.834: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  1 18:00:41.834: INFO: RC rc-light: setting consumption to 0 MB in total
Aug  1 18:00:41.835: INFO: RC rc-light: sending request to consume 0 MB
Aug  1 18:00:41.835: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  1 18:00:41.865: INFO: waiting for 2 replicas (current: 1)
Aug  1 18:01:01.875: INFO: waiting for 2 replicas (current: 1)
Aug  1 18:01:11.871: INFO: RC rc-light: sending request to consume 0 MB
Aug  1 18:01:11.871: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  1 18:01:11.871: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug  1 18:01:11.871: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  1 18:01:12.042: INFO: RC rc-light: sending request to consume 150 millicores
Aug  1 18:01:12.042: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug  1 18:01:21.870: INFO: waiting for 2 replicas (current: 1)
Aug  1 18:01:41.876: INFO: waiting for 2 replicas (current: 1)
Aug  1 18:01:41.894: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug  1 18:01:41.894: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  1 18:01:41.895: INFO: RC rc-light: sending request to consume 0 MB
Aug  1 18:01:41.895: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  1 18:01:42.241: INFO: RC rc-light: sending request to consume 150 millicores
Aug  1 18:01:42.241: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug  1 18:02:01.871: INFO: waiting for 2 replicas (current: 1)
Aug  1 18:02:11.923: INFO: RC rc-light: sending request to consume 0 of custom metric QPS
Aug  1 18:02:11.923: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  1 18:02:11.923: INFO: RC rc-light: sending request to consume 0 MB
Aug  1 18:02:11.923: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  1 18:02:12.442: INFO: RC rc-light: sending request to consume 150 millicores
Aug  1 18:02:12.442: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-dklx7/services/rc-light-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=150&requestSizeMillicores=20 }
Aug  1 18:02:21.878: INFO: waiting for 2 replicas (current: 1)
...
oomichi commented 6 years ago

First, investigate how to enable HPA. → This can be checked with kubectl get hpa, and the test environment appears to have it enabled.

Official docs: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ — "HPA is implemented as a control loop, and its sync interval is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag." The controller manager here is kube-controller-manager. Heapster also checks metrics periodically, but that is a separate mechanism.

"HPA is an API resource, defined under the autoscaling/v1 API. It can be operated entirely with kubectl: create an HPA with kubectl create, list HPAs with kubectl get hpa, inspect details with kubectl describe hpa, and delete one with kubectl delete hpa."
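
As a sketch, the declarative equivalent of the kubectl autoscale rc foo example quoted earlier is an autoscaling/v1 manifest roughly like the following (the name foo is taken from the docs example):

$ kubectl apply -f - <<EOF
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: foo
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
EOF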

Trying this on the environment where the problem occurs returns "No resources found.", but the API call itself succeeds with a 200, so the HPA API itself seems to exist:

$ kubectl get hpa -v=8
I0801 20:35:26.875828   28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.877550   28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.879167   28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.885239   28459 loader.go:359] Config loaded from file /home/ubuntu/admin.conf
I0801 20:35:26.886342   28459 round_trippers.go:383] GET https://192.168.1.108:6443/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers?limit=500
I0801 20:35:26.886372   28459 round_trippers.go:390] Request Headers:
I0801 20:35:26.886379   28459 round_trippers.go:393]     Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json
I0801 20:35:26.886385   28459 round_trippers.go:393]     User-Agent: kubectl/v1.11.1 (linux/amd64) kubernetes/b1b2997
I0801 20:35:26.899223   28459 round_trippers.go:408] Response Status: 200 OK in 12 milliseconds
I0801 20:35:26.899245   28459 round_trippers.go:411] Response Headers:
I0801 20:35:26.899256   28459 round_trippers.go:414]     Content-Type: application/json
I0801 20:35:26.899261   28459 round_trippers.go:414]     Content-Length: 2497
I0801 20:35:26.899265   28459 round_trippers.go:414]     Date: Wed, 01 Aug 2018 20:35:26 GMT
I0801 20:35:26.899464   28459 request.go:897] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1beta1","metadata":{"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers","resourceVersion":"1533485"},"columnDefinitions":[{"name":"Name","type":"string","format":"name","description":"Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/identifiers#names","priority":0},{"name":"Reference","type":"string","format":"","description":"scaleTargetRef points to the target resource to scale, and is used to the pods for which metrics should be collected, as well as to actually change the replica count.","priority":0},{"name":"Targets","type":"string","format":"","description":"metrics contains the specifications for which to use to calculate the desired replica count (the ma [truncated 1473 chars]
No resources found.

Fixed a documentation gap found in the course of this work: https://github.com/kubernetes/website/pull/9711

oomichi commented 6 years ago

Let's try HPA by hand.

  1. Create an nginx deployment with a single Pod
    $ kubectl run nginx --image=nginx
    $ kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    nginx-64f497f8fd-l2wnb   1/1       Running   0          21h
    $
  2. Identify the ReplicaSet of the created Deployment → kubectl autoscale can also target the Deployment directly, so this step may be unnecessary
    $ kubectl get rs
    NAME               DESIRED   CURRENT   READY     AGE
    nginx-64f497f8fd   1         1         1         21h
  3. Configure autoscaling with a minimum of 1 Pod and a maximum of 3, scaling when CPU utilization exceeds 80%
    $ kubectl autoscale rs nginx-64f497f8fd --min=1 --max=3 --cpu-percent=80
    horizontalpodautoscaler.autoscaling/nginx-64f497f8fd autoscaled
  4. Check the HPA settings → the <unknown> target is concerning
    $ kubectl get hpa
    NAME               REFERENCE                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    nginx-64f497f8fd   ReplicaSet/nginx-64f497f8fd   <unknown>/80%   1         3         1          4m
  5. Create a service for nginx so there is something to put load on
    $ kubectl expose deployment nginx --port 80
    service/nginx exposed
    $ kubectl get services
    NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
    nginx        ClusterIP   10.101.55.26   <none>        80/TCP    9s
  6. Confirm the service is reachable → endpoints are the URLs of individual pods, so the load target here is the service, which acts as a load balancer
    $ curl http://10.101.55.26:80
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    ...
  7. Apply load with httperf
    $ sudo apt-get install httperf
    $ httperf --hog --server 10.101.55.26 --port 80 --rate 300 --num-conn 30000 --num-call 5 --timeout 5

    In another console → the nginx load barely increases...

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    15607 ubuntu    20   0   16980   3040   2552 R  98.0  0.1   0:18.15 httperf
    10781 systemd+  20   0   33076   2968   1712 S  16.0  0.1   0:02.96 nginx
  8. Change the HPA CPU target to 50% and increase the httperf rate tenfold
    $ kubectl delete hpa nginx-64f497f8fd
    $ kubectl autoscale deployment nginx --min=1 --max=3 --cpu-percent=50
    $ httperf --hog --server 10.101.55.26 --port 80 --rate 3000 --num-conn 120000 --num-call 5 --timeout 5

    In another console

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    18466 ubuntu    20   0   18432   4324   2524 R  76.1  0.1   0:32.64 httperf
    10781 systemd+  20   0   33076   3292   1904 R  63.1  0.1   1:12.29 nginx

    CPU utilization stayed above 60% for more than 120 seconds, but no scale-out happened.
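
Since the TARGETS column stays <unknown> throughout, it is probably more useful to ask the HPA object itself why the metric cannot be read than to keep increasing the load; the conditions and events recorded on it usually give the reason. A sketch:

$ kubectl describe hpa nginx
$ kubectl get hpa nginx -o yaml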

oomichi commented 6 years ago

Is the Pod CPU utilization not being collected?

The kubectl help describes a related flag:

      --requests='': The resource requirement requests for this container.  For example, 'cpu=100m,memory=256Mi'.  Note that server side components may assign requests depending on the server configuration, such as limit ranges.
oomichi commented 6 years ago

Try again with the --requests flag.

First, clean up:

$ kubectl delete hpa nginx
$ kubectl delete service nginx
$ kubectl delete deployment nginx

Apply each setting again:

$ kubectl run nginx --image=nginx --requests='cpu=100m,memory=256Mi'
$ kubectl expose deployment nginx --port 80
$ kubectl autoscale deployment nginx --min=1 --max=3 --cpu-percent=50

Still <unknown>, as before:

$ kubectl get hpa
NAME      REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
nginx     Deployment/nginx   <unknown>/50%   1         3         0          7s

On the Deployment side, the resource requests are set correctly:

$ kubectl describe deployment nginx
...
Pod Template:
  Labels:  run=nginx
  Containers:
   nginx:
    Image:      nginx
    Port:       <none>
    Host Port:  <none>
    Requests:
      cpu:        100m
      memory:     256Mi
oomichi commented 6 years ago

Related log entries appeared in /var/log/containers/kube-controller-manager-k8s-master_kube-system_kube-controller-manager-ecad4d8d03053b6ce0626f9ca60fec563f8b11a34e2b7446c5b40343fedf6a70.log:

{"log":"I0801 23:49:34.858775       1 event.go:221] Event(v1.ObjectReference{
     Kind:\"HorizontalPodAutoscaler\",
     Namespace:\"default\",
     Name:\"nginx\",
     UID:\"cba19f8b-95e3-11e8-ae67-fa163e420595\",
     APIVersion:\"autoscaling/v2beta1\",
     ResourceVersion:\"1547695\", FieldPath:\"\"}):
  type: 'Warning'
  reason: 'FailedComputeMetricsReplicas' failed to get cpu utilization: unable to get metrics for
   resource cpu: unable to fetch metrics from resource metrics API: the server could not find
   the requested resource (get pods.metrics.k8s.io)\n",
   "stream":"stderr",
   "time":"2018-08-01T23:52:35.04907021Z"}

There were reports that a system reboot fixes this, so I rebooted. → No good; the log above is still emitted and <unknown> is still displayed.

oomichi commented 6 years ago

Investigate how Pod CPU utilization metrics are collected.

There is kubectl top, but it fails, presumably because heapster is not deployed:

$ kubectl top pod -v=8
..
I0802 01:02:00.135619   23579 round_trippers.go:383] GET https://192.168.1.108:6443/api/v1/namespaces/kube-system/services/http:heapster:/proxy/apis/metrics/v1alpha1/nodes?labelSelector=
..
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

The above API call returns 404. From the error message in the log, the error originates in controller/podautoscaler/replica_calculator.go:

 54 // GetResourceReplicas calculates the desired replica count based on a target resource utilization percentage
 55 // of the given resource for pods matching the given selector in the given namespace, and the current replica count
 56 func (c *ReplicaCalculator) GetResourceReplicas(currentReplicas int32, targetUtilization int32, resource v1.ResourceName, namespace string, selector labels.Selector) (replicaCount int32, utilization int32, rawUtilization int64, timestamp time.Time, err error) {
 57         metrics, timestamp, err := c.metricsClient.GetResourceMetric(resource, namespace, selector)
 58         if err != nil {
 59                 return 0, 0, 0, time.Time{}, fmt.Errorf("unable to get metrics for resource %s: %v", resource, err)
 60         }
oomichi commented 6 years ago

In k/k issue #57996 there is the following comment:

The error (get pods.metrics.k8s.io) is due to metric server, as you are not running metric server. In Kubernetes 1.9+, HPA needs metric server to fetch the CPU usage.

So judging from https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/#metrics-server, a separate Metrics Server apparently needs to be running... If so, isn't it odd that metrics-server lives in kubernetes-incubator? HPA is a stable API, yet one of its building blocks is still in the incubator...
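
One way to check whether the resource metrics API that HPA relies on is registered at all is to query the aggregated API directly; a sketch (metrics-server serves the metrics.k8s.io group):

$ kubectl get apiservices v1beta1.metrics.k8s.io
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods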

For now, deploy metrics-server as instructed:

$ git clone https://github.com/kubernetes-incubator/metrics-server
$ cd metrics-server/
$ kubectl create -f deploy/1.8+/
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

No change. Tried a reboot. → Didn't help. Tried again from the nginx deployment onward. → Same situation.

oomichi commented 6 years ago

It turns out metrics-server is emitting errors; investigate the cause.

$ kubectl log metrics-server-86bd9d7667-ltlpz -n kube-system
...
I0802 01:38:30.801336       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-5lqpt
I0802 01:38:30.801429       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-5lqpt
E0802 01:39:05.015074       1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
E0802 01:39:05.015334       1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:40:05.009938       1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:40:05.017074       1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
I0802 01:40:17.567445       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:40:17.567539       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
I0802 01:40:47.621436       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:40:47.621522       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
E0802 01:41:05.004503       1 summary.go:97] error while getting metrics summary from Kubelet k8s-node01(192.168.1.109:10255): Get http://192.168.1.109:10255/stats/summary/: dial tcp 192.168.1.109:10255: getsockopt: connection refused
E0802 01:41:05.021225       1 summary.go:97] error while getting metrics summary from Kubelet k8s-master(192.168.1.108:10255): Get http://192.168.1.108:10255/stats/summary/: dial tcp 192.168.1.108:10255: getsockopt: connection refused
I0802 01:41:17.655429       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 01:41:17.655552       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr

It looks like the kubelet configuration needs to be changed (kubespray/issues/2550):

This is due to the fact that the kublet read only port was disabled by default.
You can enable the read-only port in all.yml with: kube_read_only_port: 10255

The setting corresponds to this flag in cmd/kubelet/app/options/options.go:

452         fs.Int32Var(&c.ReadOnlyPort, "read-only-port", c.ReadOnlyPort, "The read-only port for the Kubelet to serve on with no authentication/authorization (set to 0 to disable)")
oomichi commented 6 years ago

Investigate how to set the read-only-port option on the kubelet.

Startup options of the existing kubelet process:

root      1002     1  8 01:34 ?        01:19:57 /usr/bin/kubelet
 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf
 --kubeconfig=/etc/kubernetes/kubelet.conf
 --config=/var/lib/kubelet/config.yaml
 --cgroup-driver=cgroupfs
 --cni-bin-dir=/opt/cni/bin
 --cni-conf-dir=/etc/cni/net.d
 --network-plugin=cni

Modify config.yaml as follows and reboot the system:

--- /var/lib/kubelet/config.yaml.orig    2018-08-02 16:57:23.865340698 +0000
+++ /var/lib/kubelet/config.yaml 2018-08-02 17:14:29.502421376 +0000
@@ -57,6 +57,7 @@
 oomScoreAdj: -999
 podPidsLimit: -1
 port: 10250
+readOnlyPort: 10255
 registryBurst: 10
 registryPullQPS: 5
 resolvConf: /etc/resolv.conf

After the reboot, check the listening ports → confirmed that 10255 is now listening:

$ sudo netstat -anp | grep 102 | grep kubelet
tcp        0      0 127.0.0.1:35881         0.0.0.0:*               LISTEN      1020/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      1020/kubelet
tcp6       0      0 :::10250                :::*                    LISTEN      1020/kubelet
tcp6       0      0 :::10255                :::*                    LISTEN      1020/kubelet

The metrics-server log no longer shows connection failures, but metrics are still reported as missing:

I0802 17:19:17.085093       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 17:19:17.085209       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr
I0802 17:19:47.202876       1 reststorage.go:140] No metrics for container nginx in pod default/nginx-5b669f8784-9m2tr
I0802 17:19:47.202939       1 reststorage.go:93] No metrics for pod default/nginx-5b669f8784-9m2tr

Is this because node01 has not been configured yet?

$ kubectl get pod -o=wide
NAME                     READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-5b669f8784-9m2tr   1/1       Running   0          15h       10.244.1.6   k8s-node01

After applying the same setting on node01 as well, metrics-server stopped emitting errors and the <unknown> target was resolved:

$ kubectl get hpa
NAME      REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx     Deployment/nginx   0%/50%    1         3         1          15h

Re-run the e2e test that was failing. → It passes now:

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=1\spod\sto\s2\spods"
..
• [SLOW TEST:133.923 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
  [sig-autoscaling] ReplicationController light
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
    Should scale from 1 pod to 2 pods
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:70
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug  2 17:33:13.282: INFO: Running AfterSuite actions on all node
Aug  2 17:33:13.282: INFO: Running AfterSuite actions on node 1

Ran 1 of 999 Specs in 134.000 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 998 Skipped PASS

Ginkgo ran 1 suite in 2m14.312258357s
Test Suite Passed
2018/08/02 17:33:13 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=1\spod\sto\s2\spods' finished in 2m14.533305342s
2018/08/02 17:33:13 e2e.go:83: Done

kubectl top works now as well:

$ kubectl top pod
NAME                     CPU(cores)   MEMORY(bytes)
nginx-5b669f8784-9m2tr   0m           3Mi
oomichi commented 6 years ago

Improvement points in Kubernetes itself related to this issue.

oomichi commented 6 years ago

No good: still only 1 of the 8 tests passes.

• Failure [960.002 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
  [sig-autoscaling] ReplicationController light
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
    Should scale from 2 pods to 1 pod [It]
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:82

    timeout waiting 15m0s for 1 replicas
    Expected error:
        <*errors.errorString | 0xc420085540>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    not to have occurred

    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug  2 20:34:37.432: INFO: Running AfterSuite actions on all node
Aug  2 20:34:37.432: INFO: Running AfterSuite actions on node 1

Summarizing 7 Failures:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 1 pod to 3 pods and from 3 to 5
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

Ran 8 of 999 Specs in 6821.750 seconds
FAIL! -- 1 Passed | 7 Failed | 0 Pending | 991 Skipped --- FAIL: TestE2E (6821.77s)
FAIL

Ginkgo ran 1 suite in 1h53m42.043036632s
Test Suite Failed
!!! Error in ./hack/ginkgo-e2e.sh:143
  Error in ./hack/ginkgo-e2e.sh:143. '"${ginkgo}" "${ginkgo_args[@]:+${ginkgo_args[@]}}" "${e2e_test}" -- "${auth_config[@]:+${auth_config[@]}}" --ginkgo.flakeAttempts="${FLAKE_ATTEMPTS}" --host="${KUBE_MASTER_URL}" --provider="${KUBERNETES_PROVIDER}" --gce-project="${PROJECT:-}" --gce-zone="${ZONE:-}" --gce-region="${REGION:-}" --gce-multizone="${MULTIZONE:-false}" --gke-cluster="${CLUSTER_NAME:-}" --kube-master="${KUBE_MASTER:-}" --cluster-tag="${CLUSTER_ID:-}" --cloud-config-file="${CLOUD_CONFIG:-}" --repo-root="${KUBE_ROOT}" --node-instance-group="${NODE_INSTANCE_GROUP:-}" --prefix="${KUBE_GCE_INSTANCE_PREFIX:-e2e}" --network="${KUBE_GCE_NETWORK:-${KUBE_GKE_NETWORK:-e2e}}" --node-tag="${NODE_TAG:-}" --master-tag="${MASTER_TAG:-}" --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} ${NUM_NODES:+"--num-nodes=${NUM_NODES}"} ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} "${@:-}"' exited with status 1
Call stack:
  1: ./hack/ginkgo-e2e.sh:143 main(...)
Exiting with status 1
2018/08/02 20:34:37 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]' finished in 1h53m42.253120653s
2018/08/02 20:34:37 main.go:309: Something went wrong: encountered 1 errors: [error during ./hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]: exit status 1]
2018/08/02 20:34:37 e2e.go:81: err: exit status 1
exit status 1
oomichi commented 6 years ago

Looking at the failed tests, are they the ones that involve scale-down? Run the simplest one, "Should scale from 2 pods to 1 pod", on its own:

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=2\spods\sto\s1\spod"
...
Aug  2 20:54:43.663: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 ht-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug  2 20:54:43.665: INFO: RC rc-light: sending request to consume 0 MB
Aug  2 20:54:43.665: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  2 20:54:50.470: INFO: RC rc-light: sending request to consume 50 millicores
Aug  2 20:54:50.470: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU  false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug  2 20:54:53.038: INFO: waiting for 1 replicas (current: 2)
Aug  2 20:55:13.041: INFO: waiting for 1 replicas (current: 2)
Aug  2 20:55:13.692: INFO: RC rc-light: sending request to consume 0 of custom m
Aug  2 20:55:13.692: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 ht-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug  2 20:55:13.696: INFO: RC rc-light: sending request to consume 0 MB
Aug  2 20:55:13.696: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  2 20:55:20.546: INFO: RC rc-light: sending request to consume 50 millicores
Aug  2 20:55:20.546: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU  false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug  2 20:55:33.043: INFO: waiting for 1 replicas (current: 2)
Aug  2 20:55:43.700: INFO: RC rc-light: sending request to consume 0 of custom m
Aug  2 20:55:43.700: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 ht-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMet
Aug  2 20:55:43.703: INFO: RC rc-light: sending request to consume 0 MB
Aug  2 20:55:43.703: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  2 20:55:50.630: INFO: RC rc-light: sending request to consume 50 millic
Aug  2 20:55:50.630: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/nroxy/ConsumeCPU  false durationSec=30&millicores=50&requestSizeMillicores=20 }
Aug  2 20:55:53.044: INFO: waiting for 1 replicas (current: 2)

Aug  2 20:56:13.042: INFO: waiting for 1 replicas (current: 2)
Aug  2 20:56:13.053: INFO: waiting for 1 replicas (current: 2)
Aug  2 20:56:13.053: INFO: Unexpected error occurred: timed out waiting for the condition
STEP: Removing consuming RC rc-light
Aug  2 20:56:13.076: INFO: RC rc-light: stopping CPU consumer
Aug  2 20:56:13.076: INFO: RC rc-light: stopping metric consumer
Aug  2 20:56:13.076: INFO: RC rc-light: stopping mem consumer
STEP: deleting { ReplicationController} rc-light in namespace e2e-tests-horizontal-pod-autoscaling-gk25p, will wait for the garbage collector to delete the pods
Aug  2 20:56:23.165: INFO: Deleting { ReplicationController} rc-light took: 27.919684ms
Aug  2 20:56:23.266: INFO: Terminating { ReplicationController} rc-light pods took: 100.554288ms
STEP: deleting { ReplicationController} rc-light-ctrl in namespace e2e-tests-horizontal-pod-autoscaling-gk25p, will wait for the garbage collector to delete the pods
Aug  2 20:56:31.089: INFO: Deleting { ReplicationController} rc-light-ctrl took: 7.49459ms
Aug  2 20:56:31.190: INFO: Terminating { ReplicationController} rc-light-ctrl pods took: 100.487127ms
[AfterEach] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:142
STEP: Collecting events from namespace "e2e-tests-horizontal-pod-autoscaling-gk25p".
STEP: Found 20 events.
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light: {replication-controller } SuccessfulCreate: Created pod: rc-light-s8q7k
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light: {replication-controller } SuccessfulCreate: Created pod: rc-light-8r7zs
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light-8r7zs: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-8r7zs to k8s-node01
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:47 +0000 UTC - event for rc-light-s8q7k: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-s8q7k to k8s-master
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:49 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3" already present on machine
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:49 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Created: Created container
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Started: Started container
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Started: Started container
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Created: Created container
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:50 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3" already present on machine
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:57 +0000 UTC - event for rc-light-ctrl: {replication-controller } SuccessfulCreate: Created pod: rc-light-ctrl-d76tm
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:40:57 +0000 UTC - event for rc-light-ctrl-d76tm: {default-scheduler } Scheduled: Successfully assigned e2e-tests-horizontal-pod-autoscaling-gk25p/rc-light-ctrl-d76tm to k8s-node01
Aug  2 20:56:33.178: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Started: Started container
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Created: Created container
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:41:00 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Pulled: Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0" already present on machine
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:41:43 +0000 UTC - event for rc-light: {horizontal-pod-autoscaler } FailedGetResourceMetric: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:41:43 +0000 UTC - event for rc-light: {horizontal-pod-autoscaler } FailedComputeMetricsReplicas: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:56:23 +0000 UTC - event for rc-light-8r7zs: {kubelet k8s-node01} Killing: Killing container with id docker://rc-light:Need to kill Pod
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:56:23 +0000 UTC - event for rc-light-s8q7k: {kubelet k8s-master} Killing: Killing container with id docker://rc-light:Need to kill Pod
Aug  2 20:56:33.179: INFO: At 2018-08-02 20:56:31 +0000 UTC - event for rc-light-ctrl-d76tm: {kubelet k8s-node01} Killing: Killing container with id docker://rc-light-ctrl:Need to kill Pod
Aug  2 20:56:33.191: INFO: POD                                 NODE        PHASE    GRACE  CONDITIONS
Aug  2 20:56:33.191: INFO: nginx-5b669f8784-9m2tr              k8s-node01  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:39:40 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:48 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:39:40 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: coredns-78fcdf6894-gt7zj            k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:22 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: coredns-78fcdf6894-tk6ch            k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:22 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:52 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: etcd-k8s-master                     k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-apiserver-k8s-master           k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-controller-manager-k8s-master  k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-flannel-ds-7df6r               k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:12:09 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:18 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-17 17:12:31 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-flannel-ds-kl4q4               k8s-node01  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:01:09 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:43 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:01:01 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-proxy-hxp7z                    k8s-node01  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:51 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:42 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:51 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-proxy-zwrl4                    k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:37 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:17 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-07-31 23:08:37 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: kube-scheduler-k8s-master           k8s-master  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:05 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:17:04 +0000 UTC  }]
Aug  2 20:56:33.191: INFO: metrics-server-86bd9d7667-ltlpz     k8s-node01  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:28:14 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 17:23:49 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2018-08-02 01:28:14 +0000 UTC  }]
Aug  2 20:56:33.191: INFO:
Aug  2 20:56:33.195: INFO:
Logging node info for node k8s-master
Aug  2 20:56:33.199: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:k8s-master,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/k8s-master,UID:94f19db7-89e3-11e8-b234-fa163e420595,ResourceVersion:1648390,Generation:0,CreationTimestamp:2018-07-17 17:05:18 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: k8s-master,node-role.kubernetes.io/master: ,},Annotations:map[string]string{flannel.alpha.coreos.com/backend-data: {"VtepMAC":"1a:9d:81:1e:9d:0f"},flannel.alpha.coreos.com/backend-type: vxlan,flannel.alpha.coreos.com/kube-subnet-manager: true,flannel.alpha.coreos.com/public-ip: 192.168.1.108,kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock,node.alpha.kubernetes.io/ttl: 0,volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NodeSpec{PodCIDR:10.244.0.0/24,DoNotUse_ExternalID:,ProviderID:,Unschedulable:false,Taints:[],ConfigSource:nil,},Status:NodeStatus{Capacity:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{41567956992 0} {<nil>} 40593708Ki BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4143394816 0} {<nil>} 4046284Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Allocatable:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{37411161231 0} {<nil>} 37411161231 DecimalSI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4038537216 0} {<nil>} 3943884Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2018-08-02 20:56:31 +0000 UTC 2018-07-17 17:05:14 +0000 UTC KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2018-08-02 20:56:31 +0000 UTC 2018-07-31 23:04:27 +0000 UTC KubeletReady kubelet is posting ready status. 
AppArmor enabled}],Addresses:[{InternalIP 192.168.1.108} {Hostname k8s-master}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:1db2c06c39a54cd3a93a4e0a44823fd6,SystemUUID:1DB2C06C-39A5-4CD3-A93A-4E0A44823FD6,BootID:b14bfb61-a0cc-45f4-8b29-42ee08c00ac6,KernelVersion:4.4.0-130-generic,OSImage:Ubuntu 16.04.5 LTS,ContainerRuntimeVersion:docker://1.11.2,KubeletVersion:v1.11.1,KubeProxyVersion:v1.11.1,OperatingSystem:linux,Architecture:amd64,},Images:[{[golang:1.10] 793901893} {[k8s.gcr.io/etcd-amd64:3.2.18] 218904307} {[k8s.gcr.io/kube-apiserver-amd64:v1.11.1] 186675825} {[k8s.gcr.io/kube-apiserver-amd64:v1.11.0] 186617744} {[k8s.gcr.io/kube-controller-manager-amd64:v1.11.1] 155252555} {[k8s.gcr.io/kube-controller-manager-amd64:v1.11.0] 155203118} {[nginx:latest] 108975101} {[k8s.gcr.io/kube-proxy-amd64:v1.11.1] 97776424} {[k8s.gcr.io/kube-proxy-amd64:v1.11.0] 97772373} {[k8s.gcr.io/kube-scheduler-amd64:v1.11.1] 56781436} {[k8s.gcr.io/kube-scheduler-amd64:v1.11.0] 56757023} {[quay.io/coreos/flannel:v0.9.1-amd64] 51338831} {[gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3] 49707607} {[k8s.gcr.io/coredns:1.1.3] 45587362} {[gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0] 5902947} {[k8s.gcr.io/pause-amd64:3.1 k8s.gcr.io/pause:3.1] 742472}],VolumesInUse:[],VolumesAttached:[],Config:nil,},}
Aug  2 20:56:33.199: INFO:
Logging kubelet events for node k8s-master
Aug  2 20:56:33.202: INFO:
Logging pods the kubelet thinks is on node k8s-master
Aug  2 20:56:33.211: INFO: kube-apiserver-k8s-master started at <nil> (0+0 container statuses recorded)
Aug  2 20:56:33.211: INFO: coredns-78fcdf6894-gt7zj started at 2018-07-17 17:12:52 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.212: INFO:      Container coredns ready: true, restart count 5
Aug  2 20:56:33.212: INFO: coredns-78fcdf6894-tk6ch started at 2018-07-17 17:12:52 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.212: INFO:      Container coredns ready: true, restart count 5
Aug  2 20:56:33.213: INFO: kube-flannel-ds-7df6r started at 2018-07-17 17:12:31 +0000 UTC (1+1 container statuses recorded)
Aug  2 20:56:33.213: INFO:      Init container install-cni ready: true, restart count 5
Aug  2 20:56:33.213: INFO:      Container kube-flannel ready: true, restart count 5
Aug  2 20:56:33.213: INFO: kube-scheduler-k8s-master started at <nil> (0+0 container statuses recorded)
Aug  2 20:56:33.214: INFO: etcd-k8s-master started at <nil> (0+0 container statuses recorded)
Aug  2 20:56:33.214: INFO: kube-proxy-zwrl4 started at 2018-07-31 23:08:37 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.214: INFO:      Container kube-proxy ready: true, restart count 5
Aug  2 20:56:33.214: INFO: kube-controller-manager-k8s-master started at <nil> (0+0 container statuses recorded)
Aug  2 20:56:33.243: INFO:
Latency metrics for node k8s-master
Aug  2 20:56:33.244: INFO:
Logging node info for node k8s-node01
Aug  2 20:56:33.247: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:k8s-node01,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/k8s-node01,UID:980d8d67-9515-11e8-a804-fa163e420595,ResourceVersion:1648376,Generation:0,CreationTimestamp:2018-07-31 23:01:01 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: k8s-node01,},Annotations:map[string]string{flannel.alpha.coreos.com/backend-data: {"VtepMAC":"a6:73:46:9f:a3:63"},flannel.alpha.coreos.com/backend-type: vxlan,flannel.alpha.coreos.com/kube-subnet-manager: true,flannel.alpha.coreos.com/public-ip: 192.168.1.109,kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock,node.alpha.kubernetes.io/ttl: 0,volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NodeSpec{PodCIDR:10.244.1.0/24,DoNotUse_ExternalID:,ProviderID:,Unschedulable:false,Taints:[],ConfigSource:nil,},Status:NodeStatus{Capacity:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{41567956992 0} {<nil>} 40593708Ki BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4143394816 0} {<nil>} 4046284Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Allocatable:ResourceList{cpu: {{2 0} {<nil>} 2 DecimalSI},ephemeral-storage: {{37411161231 0} {<nil>} 37411161231 DecimalSI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{4038537216 0} {<nil>} 3943884Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {PIDPressure False 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:01 +0000 UTC KubeletHasSufficientPID kubelet has sufficient PID available} {Ready True 2018-08-02 20:56:28 +0000 UTC 2018-07-31 23:01:11 +0000 UTC KubeletReady kubelet is posting ready status. AppArmor enabled}],Addresses:[{InternalIP 192.168.1.109} {Hostname k8s-node01}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:817a385b9de241668e47cd87cda24f47,SystemUUID:817A385B-9DE2-4166-8E47-CD87CDA24F47,BootID:f47e3b62-dd98-42c4-91e2-9b657f85dfd3,KernelVersion:4.4.0-130-generic,OSImage:Ubuntu 16.04.4 LTS,ContainerRuntimeVersion:docker://1.11.2,KubeletVersion:v1.11.1,KubeProxyVersion:v1.11.1,OperatingSystem:linux,Architecture:amd64,},Images:[{[nginx:latest] 108975101} {[k8s.gcr.io/kube-proxy-amd64:v1.11.1] 97776424} {[k8s.gcr.io/kube-proxy-amd64:v1.11.0] 97772373} {[quay.io/coreos/flannel:v0.9.1-amd64] 51338831} {[gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3] 49707607} {[gcr.io/google_containers/metrics-server-amd64:v0.2.1] 42541759} {[gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0] 5902947} {[k8s.gcr.io/pause:3.1] 742472}],VolumesInUse:[],VolumesAttached:[],Config:nil,},}
Aug  2 20:56:33.247: INFO:
Logging kubelet events for node k8s-node01
Aug  2 20:56:33.250: INFO:
Logging pods the kubelet thinks is on node k8s-node01
Aug  2 20:56:33.266: INFO: nginx-5b669f8784-9m2tr started at 2018-08-02 01:39:40 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.266: INFO:      Container nginx ready: true, restart count 1
Aug  2 20:56:33.267: INFO: kube-flannel-ds-kl4q4 started at 2018-07-31 23:01:01 +0000 UTC (1+1 container statuses recorded)
Aug  2 20:56:33.267: INFO:      Init container install-cni ready: true, restart count 1
Aug  2 20:56:33.267: INFO:      Container kube-flannel ready: true, restart count 2
Aug  2 20:56:33.267: INFO: kube-proxy-hxp7z started at 2018-07-31 23:08:51 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.268: INFO:      Container kube-proxy ready: true, restart count 1
Aug  2 20:56:33.268: INFO: metrics-server-86bd9d7667-ltlpz started at 2018-08-02 01:28:14 +0000 UTC (0+1 container statuses recorded)
Aug  2 20:56:33.268: INFO:      Container metrics-server ready: true, restart count 1
Aug  2 20:56:33.345: INFO:
Latency metrics for node k8s-node01
STEP: Dumping a list of prepulled images on each node...
Aug  2 20:56:33.363: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-horizontal-pod-autoscaling-gk25p" for this suite.

Aug  2 20:56:39.438: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug  2 20:56:39.533: INFO: namespace: e2e-tests-horizontal-pod-autoscaling-gk25p, resource: bindings, ignored listing per whitelist
Aug  2 20:56:39.577: INFO: namespace e2e-tests-horizontal-pod-autoscaling-gk25p deletion completed in 6.201374321s

• Failure [951.844 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
  [sig-autoscaling] ReplicationController light
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
    Should scale from 2 pods to 1 pod [It]
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:82

    timeout waiting 15m0s for 1 replicas
    Expected error:
        <*errors.errorString | 0xc4201050d0>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    not to have occurred

    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug  2 20:56:39.588: INFO: Running AfterSuite actions on all node
Aug  2 20:56:39.588: INFO: Running AfterSuite actions on node 1

Summarizing 1 Failure:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light [It] Should scale from 2 pods to 1 pod
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:123

Ran 1 of 999 Specs in 951.939 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 998 Skipped --- FAIL: TestE2E (951.96s)
FAIL

Ginkgo ran 1 suite in 15m52.197860762s
Test Suite Failed
!!! Error in ./hack/ginkgo-e2e.sh:143
  Error in ./hack/ginkgo-e2e.sh:143. '"${ginkgo}" "${ginkgo_args[@]:+${ginkgo_args[@]}}" "${e2e_test}" -- "${auth_config[@]:+${auth_config[@]}}" --ginkgo.flakeAttempts="${FLAKE_ATTEMPTS}" --host="${KUBE_MASTER_URL}" --provider="${KUBERNETES_PROVIDER}" --gce-project="${PROJECT:-}" --gce-zone="${ZONE:-}" --gce-region="${REGION:-}" --gce-multizone="${MULTIZONE:-false}" --gke-cluster="${CLUSTER_NAME:-}" --kube-master="${KUBE_MASTER:-}" --cluster-tag="${CLUSTER_ID:-}" --cloud-config-file="${CLOUD_CONFIG:-}" --repo-root="${KUBE_ROOT}" --node-instance-group="${NODE_INSTANCE_GROUP:-}" --prefix="${KUBE_GCE_INSTANCE_PREFIX:-e2e}" --network="${KUBE_GCE_NETWORK:-${KUBE_GKE_NETWORK:-e2e}}" --node-tag="${NODE_TAG:-}" --master-tag="${MASTER_TAG:-}" --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} ${NUM_NODES:+"--num-nodes=${NUM_NODES}"} ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} "${@:-}"' exited with status 1
Call stack:
  1: ./hack/ginkgo-e2e.sh:143 main(...)
Exiting with status 1
2018/08/02 20:56:39 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=2\spods\sto\s1\spod' finished in 15m52.41101064s
2018/08/02 20:56:39 main.go:309: Something went wrong: encountered 1 errors: [error during ./hack/ginkgo-e2e.sh --ginkgo.focus=2\spods\sto\s1\spod: exit status 1]
2018/08/02 20:56:39 e2e.go:81: err: exit status 1
exit status 1

Looking at the HPA while the test is running, its status is unknown:

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME       REFERENCE                        TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8   rc-light   ReplicationController/rc-light   <unknown>/50%   1         2         0          18s

Partway through the test, the metric became available:

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME       REFERENCE                        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8   rc-light   ReplicationController/rc-light   52%/50%   1         2         2          2m

The CPU utilization sits right at the point where it is unclear whether the HPA will scale down:

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME       REFERENCE                        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-6j9j8   rc-light   ReplicationController/rc-light   49%/50%   1         2         2          12m

HPA details:

$ kubectl describe hpa rc-light -n e2e-tests-horizontal-pod-autoscaling-6j9j8
Name:                                                  rc-light
Namespace:                                             e2e-tests-horizontal-pod-autoscaling-6j9j8
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 02 Aug 2018 20:59:22 +0000
Reference:                                             ReplicationController/rc-light
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  52% (105m) / 50%
Min replicas:                                          1
Max replicas:                                          2
ReplicationController pods:                            2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       3m    horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  3m    horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
oomichi commented 6 years ago

Investigate how the e2e tests apply CPU load (summary).

oomichi commented 6 years ago

Confirm whether the HPA is supposed to scale down when the CPU load equals the CPU utilization threshold.

The HPA algorithm: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/autoscaling/horizontal-pod-autoscaler.md#autoscaling-algorithm

The target number of pods is calculated from the following formula:

  TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)

With the formula above, when the CPU load sits right at the utilization threshold and the load tool's output fluctuates (some fluctuation is normal for a load tool), the HPA never scales down. Submitted as k/k pull/66950.

In the full run at https://github.com/oomichi/try-kubernetes/issues/36, this scale-down test passed; since the outcome depends on the load tool's marginal behavior, it can succeed some of the time.
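
To make the scale-down concern concrete, here is a minimal Go sketch of the formula above, plugged with the numbers from this run (a 50% target, two pods hovering between 49% and 52%). The helper and the sample utilizations are illustrative only, not taken from the autoscaler code.

package main

import (
	"fmt"
	"math"
)

// targetNumOfPods implements TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target).
func targetNumOfPods(podUtilization []float64, targetPercent float64) int {
	sum := 0.0
	for _, u := range podUtilization {
		sum += u
	}
	return int(math.Ceil(sum / targetPercent))
}

func main() {
	target := 50.0 // the light test's target CPU utilization (%)

	// Slightly above the threshold, as in the "52%/50%" reading: ceil(104/50) = 3,
	// which the test's maxPods of 2 clamps back down to 2.
	fmt.Println(targetNumOfPods([]float64{52, 52}, target))

	// Slightly below the threshold, as in the "49%/50%" reading: ceil(98/50) = 2,
	// so the HPA never reaches the expected 1 replica while the load hovers here.
	fmt.Println(targetNumOfPods([]float64{49, 49}, target))

	// Only when per-pod utilization falls well under half the target does it give 1.
	fmt.Println(targetNumOfPods([]float64{24, 24}, target))
}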

oomichi commented 6 years ago

The currently failing HPA test cases are the following:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 5 pods to 3 pods and from 3 to 1
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
oomichi commented 6 years ago

Try the simplest two-stage scale-out test:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
Aug  3 21:30:58.853: INFO: waiting for 3 replicas (current: 1)
Aug  3 21:31:18.859: INFO: waiting for 3 replicas (current: 1)
Aug  3 21:31:18.876: INFO: RC test-deployment: sending request to consume 0 MB
Aug  3 21:31:18.876: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  3 21:31:19.095: INFO: RC test-deployment: sending request to consume 250 millicores
Aug  3 21:31:19.095: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  3 21:31:19.095: INFO: RC test-deployment: sending request to consume 0 of custom metric QPS
Aug  3 21:31:19.096: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  3 21:31:38.855: INFO: waiting for 3 replicas (current: 4)
...
Aug  3 21:35:49.043: INFO: RC test-deployment: sending request to consume 0 MB
Aug  3 21:35:49.043: INFO: ConsumeMem URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeMem  false durationSec=30&megabytes=0&requestSizeMegabytes=100 }
Aug  3 21:35:49.380: INFO: RC test-deployment: sending request to consume 0 of custom metric QPS
Aug  3 21:35:49.380: INFO: ConsumeCustomMetric URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/BumpMetric  false delta=0&durationSec=30&metric=QPS&requestSizeMetrics=10 }
Aug  3 21:35:50.577: INFO: RC test-deployment: sending request to consume 250 millicores
Aug  3 21:35:50.577: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-t5rkx/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  3 21:35:58.861: INFO: waiting for 3 replicas (current: 5)
Aug  3 21:36:18.848: INFO: waiting for 3 replicas (current: 5)

In another console:

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-t5rkx   test-deployment   Deployment/test-deployment   95%/20%   1         5         4          2m

If utilization is 95% against a 20% target, it is normal for the pods not to settle at 3. Check the test code that applies the CPU load.

oomichi commented 6 years ago

e2e/autoscaling/horizontal_pod_autoscaling.go

 35         titleUp := "Should scale from 1 pod to 3 pods and from 3 to 5"
 36         titleDown := "Should scale from 5 pods to 3 pods and from 3 to 1"
 37
 38         SIGDescribe("[Serial] [Slow] Deployment", func() {
 39                 // CPU tests via deployments
 40                 It(titleUp, func() {
 41                         scaleUp("test-deployment", common.KindDeployment, false, rc, f)
 42                 })
...
133 func scaleUp(name string, kind schema.GroupVersionKind, checkStability bool, rc *common.ResourceConsumer, f *framework.Framework) {
134         stasis := 0 * time.Minute
135         if checkStability {
136                 stasis = 10 * time.Minute
137         }
138         scaleTest := &HPAScaleTest{
139                 initPods:                    1,
140                 totalInitialCPUUsage:        250,
141                 perPodCPURequest:            500,
142                 targetCPUUtilizationPercent: 20,
143                 minPods:                     1,
144                 maxPods:                     5,
145                 firstScale:                  3,
146                 firstScaleStasis:            stasis,
147                 cpuBurst:                    700,
148                 secondScale:                 5,
149         }
150         scaleTest.run(name, kind, rc, f)
151 }

Check how the CPU utilization appears:

ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:48:54 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   <unknown>/20%   1         5         1          1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:49:20 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   <unknown>/20%   1         5         1          1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:49:29 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   98%/20%   1         5         1          1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:49:34 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   98%/20%   1         5         1          1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:49:48 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   98%/20%   1         5         1          1m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:49:57 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   98%/20%   1         5         4          2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:50:13 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   98%/20%   1         5         4          2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:50:32 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   94%/20%   1         5         4          2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:50:45 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   94%/20%   1         5         4          2m
ubuntu@k8s-master:~$ date; kubectl get hpa --all-namespaces
Fri Aug  3 22:50:55 UTC 2018
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-qbghv   test-deployment   Deployment/test-deployment   94%/20%   1         5         4          3m

The test sets totalInitialCPUUsage: 250, so investigate how that relates to kubectl showing 98% (while there is one pod).

oomichi commented 6 years ago

totalInitialCPUUsage: 250 is passed to ConsumeCPU() as millicores. As the name suggests, 250 millicores is 0.25 cores (25%). Since the target is 20%, wouldn't that be 2 pods? But the test expects 3. Confusing... Look inside the load container. → The container image in question is resource-consumer; see test/utils/image/manifest.go

 25         e2eRegistry     = "gcr.io/kubernetes-e2e-test-images"
 ..
 91         ResourceConsumer    = ImageConfig{e2eRegistry, "resource-consumer", "1.3", true}

The source is at https://github.com/kubernetes/kubernetes/tree/master/test/images/resource-consumer. It seems the only option is to run this container image and observe how the CPU load comes out... The target Docker image can be run directly with the following command:

$ sudo docker run --name resource-c gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3
Unable to find image 'gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3' locally
1.3: Pulling from kubernetes-e2e-test-images/resource-consumer
d2519f41f710: Pull complete
7bb286ada61f: Pull complete
0878ec3e77c3: Pull complete
137c312171bf: Pull complete
Digest: sha256:38f1007b38bd5ceab098ac24278066068e1cdd9e3eb2a540233b7d1053be2696
Status: Downloaded newer image for gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3

https://github.com/kubernetes/kubernetes/tree/master/test/images/resource-consumer also describes how to run it with kubectl. The instructions are a bit dated, but run it in line with how the e2e tests do. ★ If the --requests parameter is not specified, the usage shows up as unknown and the HPA does not work.

$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl get pods -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP             NODE
resource-consumer-767f57866f-vh4v7   1/1       Running   0          10s       10.244.1.192   k8s-node01
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ kubectl get service
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
resource-consumer   ClusterIP   10.109.92.74   <none>        8080/TCP   38s
$
$ curl --data "millicores=250&durationSec=600" http://10.109.92.74:8080/ConsumeCPU
ConsumeCPU
250 millicores
600 durationSec

On iaas-cpu01, consume-cpu is running at around 25%, as specified:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29852 root      20   0    9368   6356   1344 S  25.9  0.2   0:20.92 consume-cpu

So why does kubectl show it at around 50%?

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS     REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   49%/20%   1         5           1          4m

It scaled out to 3 pods and the usage dropped to one third → in other words, the usage shown here is the average usage per pod (a small calculation follows after the output below).

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   16%/20%   1         5         3          11m
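
A tiny arithmetic sketch of that observation, using the numbers from the output above; the helper name is mine, not part of Kubernetes.

package main

import "fmt"

// avgUtilizationPercent mirrors what `kubectl get hpa` reports above: the total
// consumed CPU, averaged over the pods, as a percentage of each pod's CPU request.
func avgUtilizationPercent(totalMillicores, numPods, requestMillicores int) float64 {
	return float64(totalMillicores) / float64(numPods) / float64(requestMillicores) * 100
}

func main() {
	// 250m of load, 1 pod, 500m request: ~50%, matching the "49%/20%" reading above.
	fmt.Printf("%.0f%%\n", avgUtilizationPercent(250, 1, 500))
	// The same 250m spread over 3 pods: ~17%, matching the "16%/20%" reading above.
	fmt.Printf("%.0f%%\n", avgUtilizationPercent(250, 3, 500))
}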
oomichi commented 6 years ago

Cleanup

$ kubectl delete hpa resource-consumer; kubectl delete service resource-consumer; kubectl delete deployment resource-consumer

Run workload

$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "millicores=250&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU

Continue checking hpa condition

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   49%/20%   1         5         2          4m
$
$ kubectl get pods -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP             NODE
resource-consumer-767f57866f-8mb28   1/1       Running   0          1m        10.244.0.183   k8s-master
resource-consumer-767f57866f-w6drh   1/1       Running   0          5m        10.244.1.197   k8s-node01
$ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5594 root      20   0    9368   6560   1280 R  25.2  0.2   0:47.85 consume-cpu
oomichi commented 6 years ago

The actual process is at 25%, so why does kubectl get hpa show around 50%? 25% looks correct, since 250 millicores (0.25 cores) was specified. → It shows as 50% for a 250-millicore load because the Deployment was created with --requests='cpu=500m'.

When 500 millicores are specified, the process runs at 50%, so the load tool itself appears to be working correctly:

$ curl --data "millicores=500&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU
...
$ top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8085 root      20   0    9368   6612   1344 S  50.3  0.2   0:10.11 consume-cpu

In that case the HPA sees it as double, i.e. 100% (a short sketch of the request-denominator relationship follows after the output below):

$ kubectl get hpa
NAME                REFERENCE                      TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   100%/20%   1         5         1          2m
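
A minimal sketch of that relationship with the numbers from these manual runs; nothing here is taken from the HPA implementation.

package main

import "fmt"

// percentOfRequest is how the HPA reports a pod's CPU usage: consumed millicores
// divided by the pod's CPU request, not by the node's total capacity.
func percentOfRequest(consumedMillicores, requestMillicores int) int {
	return consumedMillicores * 100 / requestMillicores
}

func main() {
	fmt.Println(percentOfRequest(250, 500))  // 50  -> the ~50% readings above
	fmt.Println(percentOfRequest(500, 500))  // 100 -> the "100%/20%" reading above
	fmt.Println(percentOfRequest(250, 1000)) // 25  -> the 1000m-request experiment further below
}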
oomichi commented 6 years ago

Check the logic behind the CPU utilization that the HPA displays.

It is the third element of the rows.cells array returned by the /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers API:

$ kubectl get hpa -v=16
...
I0801 22:25:54.290584     379 request.go:897] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1beta1",
 "metadata":{"selfLink":"/apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers","resourceVersion":"1542106"},
 "columnDefinitions":[
 ...
 "rows":[
      {"cells":[
           "nginx",
           "Deployment/nginx",
           "\u003cunknown\u003e/50%",
           "1",
           3,
           1,
           "46m"],

pkg/registry/autoscaling/rest/storage_autoscaling.go

 28         horizontalpodautoscalerstore "k8s.io/kubernetes/pkg/registry/autoscaling/horizontalpodautoscaler/storage"
...
 33 func (p RESTStorageProvider) NewRESTStorage(apiResourceConfigSource serverstorage.APIResourceConfigSource, restOptionsGetter generic.RESTOptionsGetter) (    genericapiserver.APIGroupInfo, bool) {
 34         apiGroupInfo := genericapiserver.NewDefaultAPIGroupInfo(autoscaling.GroupName, legacyscheme.Scheme, legacyscheme.ParameterCodec, legacyscheme.Cod    ecs)
 35         // If you add a version here, be sure to add an entry in `k8s.io/kubernetes/cmd/kube-apiserver/app/aggregator.go with specific priorities.
 36         // TODO refactor the plumbing to provide the information in the APIGroupInfo
 37
 38         if apiResourceConfigSource.VersionEnabled(autoscalingapiv2beta1.SchemeGroupVersion) {
 39                 apiGroupInfo.VersionedResourcesStorageMap[autoscalingapiv2beta1.SchemeGroupVersion.Version] = p.v2beta1Storage(apiResourceConfigSource, r    estOptionsGetter)
 40         }
 41         if apiResourceConfigSource.VersionEnabled(autoscalingapiv1.SchemeGroupVersion) {
 42                 apiGroupInfo.VersionedResourcesStorageMap[autoscalingapiv1.SchemeGroupVersion.Version] = p.v1Storage(apiResourceConfigSource, restOptions    Getter)
 43         }
 44
 45         return apiGroupInfo, true
 46 }
 47
 48 func (p RESTStorageProvider) v1Storage(apiResourceConfigSource serverstorage.APIResourceConfigSource, restOptionsGetter generic.RESTOptionsGetter) map[st    ring]rest.Storage {
 49         storage := map[string]rest.Storage{}
 50         // horizontalpodautoscalers
 51         hpaStorage, hpaStatusStorage := horizontalpodautoscalerstore.NewREST(restOptionsGetter)
 52         storage["horizontalpodautoscalers"] = hpaStorage
 53         storage["horizontalpodautoscalers/status"] = hpaStatusStorage
 54
 55         return storage
 56 }

NOTE: the REST API implementation of each resource lives in a rest directory under pkg/registry

~/go/src/k8s.io/kubernetes/pkg/registry$ find . -name rest
./apps/rest
./settings/rest
./scheduling/rest
./rbac/rest
./batch/rest
./core/rest
./core/pod/rest
./core/node/rest
./extensions/rest
./policy/rest
./authentication/rest
./certificates/rest
./admissionregistration/rest
./networking/rest
./authorization/rest
./events/rest
./coordination/rest
./storage/rest
./autoscaling/rest

Tracing from there, for the HPA the detailed implementation is in pkg/registry/autoscaling/horizontalpodautoscaler/storage/storage.go

 38 // NewREST returns a RESTStorage object that will work against horizontal pod autoscalers.
 39 func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST) {
 40         store := &genericregistry.Store{
 41                 NewFunc:                  func() runtime.Object { return &autoscaling.HorizontalPodAutoscaler{} },
 42                 NewListFunc:              func() runtime.Object { return &autoscaling.HorizontalPodAutoscalerList{} },
 43                 DefaultQualifiedResource: autoscaling.Resource("horizontalpodautoscalers"),
 44
 45                 CreateStrategy: horizontalpodautoscaler.Strategy,
 46                 UpdateStrategy: horizontalpodautoscaler.Strategy,
 47                 DeleteStrategy: horizontalpodautoscaler.Strategy,
 48
 49                 TableConvertor: printerstorage.TableConvertor{TablePrinter: printers.NewTablePrinter().With(printersinternal.AddHandlers)},
 50         }
 51         options := &generic.StoreOptions{RESTOptions: optsGetter}
 52         if err := store.CompleteWithOptions(options); err != nil {
 53                 panic(err) // TODO: Propagate error up
 54         }
 55
 56         statusStore := *store
 57         statusStore.UpdateStrategy = horizontalpodautoscaler.StatusStrategy
 58         return &REST{store}, &StatusREST{store: &statusStore}
 59 }

What we want this time is the GET list of HPAs, so it is probably autoscaling.HorizontalPodAutoscalerList in k8s.io/kubernetes/pkg/apis/autoscaling

pkg/apis/autoscaling/types.go

413 // HorizontalPodAutoscalerList is a list of horizontal pod autoscaler objects.
414 type HorizontalPodAutoscalerList struct {
415         metav1.TypeMeta
416         // Metadata is the standard list metadata.
417         // +optional
418         metav1.ListMeta
419
420         // Items is the list of horizontal pod autoscaler objects.
421         Items []HorizontalPodAutoscaler
422 }
...
394 type HorizontalPodAutoscaler struct {
395         metav1.TypeMeta
396         // Metadata is the standard object metadata.
397         // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
398         // +optional
399         metav1.ObjectMeta
400
401         // Spec is the specification for the behaviour of the autoscaler.
402         // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
403         // +optional
404         Spec HorizontalPodAutoscalerSpec
405
406         // Status is the current information about the autoscaler.
407         // +optional
408         Status HorizontalPodAutoscalerStatus    <- this looks like it holds the CPU utilization
409 }
...
221 type HorizontalPodAutoscalerStatus struct {
222         // ObservedGeneration is the most recent generation observed by this autoscaler.
223         // +optional
224         ObservedGeneration *int64
225
226         // LastScaleTime is the last time the HorizontalPodAutoscaler scaled the number of pods,
227         // used by the autoscaler to control how often the number of pods is changed.
228         // +optional
229         LastScaleTime *metav1.Time
230
231         // CurrentReplicas is current number of replicas of pods managed by this autoscaler,
232         // as last seen by the autoscaler.
233         CurrentReplicas int32
234
235         // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler,
236         // as last calculated by the autoscaler.
237         DesiredReplicas int32
238
239         // CurrentMetrics is the last read state of the metrics used by this autoscaler.
240         CurrentMetrics []MetricStatus  <- this looks like it
241
242         // Conditions is the set of conditions required for this autoscaler to scale its target,
243         // and indicates whether or not those conditions are met.
244         Conditions []HorizontalPodAutoscalerCondition
245 }
...
296 // MetricStatus describes the last-read state of a single metric.
297 type MetricStatus struct {
298         // Type is the type of metric source.  It will be one of "Object",
299         // "Pods" or "Resource", each corresponds to a matching field in the object.
300         Type MetricSourceType
301
302         // Object refers to a metric describing a single kubernetes object
303         // (for example, hits-per-second on an Ingress object).
304         // +optional
305         Object *ObjectMetricStatus
306         // Pods refers to a metric describing each pod in the current scale target
307         // (for example, transactions-processed-per-second).  The values will be
308         // averaged together before being compared to the target value.
309         // +optional
310         Pods *PodsMetricStatus
311         // Resource refers to a resource metric (such as those specified in
312         // requests and limits) known to Kubernetes describing each pod in the
313         // current scale target (e.g. CPU or memory). Such metrics are built in to
314         // Kubernetes, and have special scaling options on top of those available
315         // to normal per-pod metrics using the "pods" source.
316         // +optional
317         Resource *ResourceMetricStatus   <- this looks like it
318         // External refers to a global metric that is not associated
319         // with any Kubernetes object. It allows autoscaling based on information
320         // coming from components running outside of cluster
321         // (for example length of queue in cloud messaging service, or
322         // QPS from loadbalancer running outside of cluster).
323         // +optional
324         External *ExternalMetricStatus
325 }

This looks like the answer:

349 // ResourceMetricStatus indicates the current value of a resource metric known to
350 // Kubernetes, as specified in requests and limits, describing each pod in the
351 // current scale target (e.g. CPU or memory).  Such metrics are built in to
352 // Kubernetes, and have special scaling options on top of those available to
353 // normal per-pod metrics using the "pods" source.
354 type ResourceMetricStatus struct {
355         // Name is the name of the resource in question.
356         Name api.ResourceName
357         // CurrentAverageUtilization is the current value of the average of the
358         // resource metric across all relevant pods, represented as a percentage of
359         // the requested value of the resource for the pods.  It will only be
360         // present if `targetAverageValue` was set in the corresponding metric
361         // specification.
362         // +optional
363         CurrentAverageUtilization *int32
364         // CurrentAverageValue is the current value of the average of the
365         // resource metric across all relevant pods, as a raw value (instead of as
366         // a percentage of the request), similar to the "pods" metric source type.
367         // It will always be set, regardless of the corresponding metric specification.
368         CurrentAverageValue resource.Quantity
369 }
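
For reference, the same figures can be read with client-go instead of kubectl. This is only a sketch under a few assumptions: a client-go release contemporary with v1.11 (pre-context method signatures), the external autoscaling/v1 API, where the value surfaces as Status.CurrentCPUUtilizationPercent rather than the internal CurrentMetrics field quoted above, and a placeholder kubeconfig path.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from a kubeconfig file (the path is a placeholder).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/ubuntu/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// List HPAs in the default namespace and print current/target utilization,
	// the same numbers `kubectl get hpa` shows in the TARGETS column.
	hpas, err := clientset.AutoscalingV1().HorizontalPodAutoscalers("default").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, hpa := range hpas.Items {
		if hpa.Status.CurrentCPUUtilizationPercent != nil && hpa.Spec.TargetCPUUtilizationPercent != nil {
			fmt.Printf("%s: %d%%/%d%%\n", hpa.Name, *hpa.Status.CurrentCPUUtilizationPercent, *hpa.Spec.TargetCPUUtilizationPercent)
		}
	}
}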
oomichi commented 6 years ago

Is it 50% for a 250-millicore load because the deployment was created with --requests='cpu=500m'? Check the HPA state when applying a 250-millicore load with --requests='cpu=1000m'. → The expectation is that the HPA shows 25% CPU utilization (with 500m it was 50%).

$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=1000m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "millicores=250&durationSec=600" http://${ADDRESS}:8080/ConsumeCPU

Check the HPA state → as expected. It should further scale to 2 pods and drop to 12%.

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   24%/20%   1         5         1          2m

Check the HPA state again after a while → as expected:

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   12%/20%   1         5         2          4m

As expected, k/kubernetes/pull/66982 really is necessary!

oomichi commented 6 years ago

With the knowledge above, re-examine the test conditions of the failing e2e test → the scaleUp parameters look reasonable (a small calculation follows after the excerpt below). test/e2e/autoscaling/horizontal_pod_autoscaling.go

133 func scaleUp(name string, kind schema.GroupVersionKind, checkStability bool, rc *common.ResourceConsumer, f *framework.Framework) {
134         stasis := 0 * time.Minute
135         if checkStability {
136                 stasis = 10 * time.Minute
137         }
138         scaleTest := &HPAScaleTest{
139                 initPods:                    1,    // initial number of pods
140                 totalInitialCPUUsage:        250,  // CPU load (millicores) the consumer generates for the first scale-out (25% of one core)
141                 perPodCPURequest:            500,  // CPU request per consumer pod (500m); the HPA therefore sees the load as 50% (25% / 50%)
142                 targetCPUUtilizationPercent: 20,  // target per-pod CPU utilization (20%); pods are added or removed so the average converges on this
143                 minPods:                     1,
144                 maxPods:                     5,
145                 firstScale:                  3,  // expected pod count after the first scale-out: against a 20% target the HPA sees 50%, so 3 pods bring the average down to about 17%. Reasonable.
146                 firstScaleStasis:            stasis,
147                 cpuBurst:                    700,  // CPU load for the second scale-out; with perPodCPURequest of 500, the HPA sees it as 140%
148                 secondScale:                 5,  // expected pod count after the second scale-out: a 20% target would call for 7 pods (=140/20), but maxPods caps it at 5
149         }
150         scaleTest.run(name, kind, rc, f)
151 }
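
A small sketch of the arithmetic implied by the annotations above; the helper is mine, not part of the e2e framework.

package main

import (
	"fmt"
	"math"
)

// expectedReplicas estimates the replica count the HPA should converge on for the
// parameters annotated above: total load and per-pod request in millicores, the
// target utilization in percent, and the min/max bounds.
func expectedReplicas(totalMillicores, perPodRequest, targetPercent, minPods, maxPods int) int {
	utilization := float64(totalMillicores) / float64(perPodRequest) * 100
	n := int(math.Ceil(utilization / float64(targetPercent)))
	if n < minPods {
		return minPods
	}
	if n > maxPods {
		return maxPods
	}
	return n
}

func main() {
	// First scale-out: 250m load / 500m request = 50%, ceil(50/20) = 3 -> firstScale.
	fmt.Println(expectedReplicas(250, 500, 20, 1, 5))
	// Second scale-out: 700m burst = 140%, ceil(140/20) = 7, capped at maxPods -> secondScale = 5.
	fmt.Println(expectedReplicas(700, 500, 20, 1, 5))
}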

Check the conditions during the problematic test run. Try the simplest two-stage scale-out test:

[Fail] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5

The problem is that there are 4 pods where 3 were expected:

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-t5rkx   test-deployment   Deployment/test-deployment   95%/20%   1         5         4          2m

The consumer pod requests 50% of a core (perPodCPURequest: 500), so the HPA should see the load as 50% (25% / 50%), but it shows 95% instead. That is the problem.

Check at the process level whether the container load is actually 50% (it might be running twice? also check whether the tests are run serially or in parallel):

$ kubectl get hpa --all-namespaces
NAMESPACE                                    NAME              REFERENCE                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
e2e-tests-horizontal-pod-autoscaling-fhrzw   test-deployment   Deployment/test-deployment   99%/20%   1         5         1          1m

There are lots of load processes:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1065 root      20   0  948596 139176  54948 S  14.2  3.4 676:29.88 kubelet
25666 root      20   0    9368   6044   1280 S   4.0  0.1   0:00.91 consume-cpu
25679 root      20   0    9368   5788   1284 S   4.0  0.1   0:00.89 consume-cpu
25775 root      20   0    9368   6248   1220 R   4.0  0.2   0:00.93 consume-cpu
25665 root      20   0    9432   6296   1280 S   3.6  0.2   0:00.91 consume-cpu
25667 root      20   0    9368   6352   1344 S   3.6  0.2   0:00.89 consume-cpu
25685 root      20   0    9368   5780   1280 R   3.6  0.1   0:00.90 consume-cpu
25694 root      20   0    9368   6048   1284 S   3.6  0.1   0:00.90 consume-cpu
25698 root      20   0    9368   6048   1280 S   3.6  0.1   0:00.89 consume-cpu
25702 root      20   0    9368   6044   1268 S   3.6  0.1   0:00.90 consume-cpu
25703 root      20   0    9368   6348   1344 S   3.6  0.2   0:00.89 consume-cpu
25741 root      20   0    9368   6092   1344 S   3.6  0.2   0:00.90 consume-cpu
25763 root      20   0    9368   6036   1280 S   3.6  0.1   0:00.91 consume-cpu
25675 root      20   0    9368   6368   1344 S   3.3  0.2   0:00.87 consume-cpu

There are 13 load-tool processes at around 3.6% each, for roughly 50% of CPU load in total (25% was intended). This differs from the manual run, where the load tool was a single process and the total load was different as well.

Extract the CPU-load-related entries from the test log:

Aug  6 23:34:50.709: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:35:20.901: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:35:50.987: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:36:21.086: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:36:51.189: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:37:21.367: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:37:51.484: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
Aug  6 23:38:21.617: INFO: ConsumeCPU URL: {https   192.168.1.108:6443 /api/v1/namespaces/e2e-tests-horizontal-pod-autoscaling-fhrzw/services/test-deployment-ctrl/proxy/ConsumeCPU  false durationSec=30&millicores=250&requestSizeMillicores=20 }
oomichi commented 6 years ago

Try the same load as in the e2e test log above:

$ kubectl run resource-consumer --image=gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3 --port 8080 --requests='cpu=500m,memory=256Mi'
$ kubectl autoscale deployment resource-consumer --min=1 --max=5 --cpu-percent=20
$ kubectl expose deployment resource-consumer
$ ADDRESS=`kubectl get service | grep resource-consumer | awk '{print $3}'`
$ curl --data "durationSec=30&millicores=250&requestSizeMillicores=20" http://${ADDRESS}:8080/ConsumeCPU

It does not become multiple processes:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29246 root      20   0    9304   5212   1220 S  24.8  0.1   0:01.90 consume-cpu

It scales to 3, as the test expects:

$ kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
resource-consumer   Deployment/resource-consumer   16%/20%   1         5         3          9m

Looking at the load tool's code, it is odd that it does not become multiple processes. In test/images/resource-consumer/controller/controller.go, 250 (millicores) / 20 (requestSizeMillicores) = 12 with integer division, plus one request for the remaining 10 millicores, i.e. 13 requests in total (a small calculation follows after the excerpt below):

 99         count := millicores / requestSizeInMillicores
100         rest := millicores - count*requestSizeInMillicores
101         fmt.Fprintf(w, "RC manager: sending %v requests to consume %v millicores each and 1 request to consume %v millicores\n", count, requestSizeInMill    icores, rest)
102         if count > 0 {
103                 handler.waitGroup.Add(count)
104                 handler.sendConsumeCPURequests(w, count, requestSizeInMillicores, durationSec)
105         }
106         if rest > 0 {
107                 handler.waitGroup.Add(1)
108                 go handler.sendOneConsumeCPURequest(w, rest, durationSec)
109         }
110         handler.waitGroup.Wait()
111 }
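
A standalone re-creation of that split so it can be run directly; the arithmetic is the same as in the excerpt, the helper name is mine.

package main

import "fmt"

// splitCPURequests mirrors the arithmetic in the controller.go excerpt above: the total
// load is split into `count` sub-requests of requestSizeInMillicores each plus one
// request for the remainder, and every sub-request ends up as its own consume-cpu process.
func splitCPURequests(millicores, requestSizeInMillicores int) (count, rest int) {
	count = millicores / requestSizeInMillicores
	rest = millicores - count*requestSizeInMillicores
	return count, rest
}

func main() {
	count, rest := splitCPURequests(250, 20)
	processes := count
	if rest > 0 {
		processes++
	}
	// 250/20 -> 12 requests of 20 millicores plus 1 request of 10 millicores,
	// i.e. 13 processes, matching the 13 consume-cpu entries in the `top` output above.
	fmt.Printf("count=%d rest=%d processes=%d\n", count, rest, processes)
}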
oomichi commented 6 years ago

Is the image not gcr.io/kubernetes-e2e-test-images/resource-consumer:1.3? With the latest code, omitting requestSizeMillicores should cause an error, yet the manual run does not error out.

The e2e test log does not show which image is being used.

$ kubectl describe deployment <deployment name>

Check with the command above. For the rs test it is:

    Image:      gcr.io/kubernetes-e2e-test-images/resource-consumer/controller-amd64:1.0

and for the deployment test it is:

   Image:      gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3

Details:

$ kubectl describe deployments test-deployment -n=e2e-tests-horizontal-pod-autoscaling-cndn6
Name:                   test-deployment
Namespace:              e2e-tests-horizontal-pod-autoscaling-cndn6
CreationTimestamp:      Tue, 07 Aug 2018 00:35:11 +0000
Labels:                 name=test-deployment
Annotations:            deployment.kubernetes.io/revision=1
Selector:               name=test-deployment
Replicas:               4 desired | 4 updated | 4 total | 4 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:  name=test-deployment
  Containers:
   test-deployment:
    Image:      gcr.io/kubernetes-e2e-test-images/resource-consumer-amd64:1.3
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     500m
      memory:  209715200
    Requests:
      cpu:        500m
      memory:     209715200
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   test-deployment-5fd8dc46f8 (4/4 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  2m    deployment-controller  Scaled up replica set test-deployment-5fd8dc46f8 to 1
  Normal  ScalingReplicaSet  34s   deployment-controller  Scaled up replica set test-deployment-5fd8dc46f8 to 4
$ kubectl describe hpa test-deployment -n=e2e-tests-horizontal-pod-autoscaling-cndn6
Name:                                                  test-deployment
Namespace:                                             e2e-tests-horizontal-pod-autoscaling-cndn6
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Tue, 07 Aug 2018 00:35:36 +0000
Reference:                                             Deployment/test-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  66% (334m) / 20%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       4 current / 4 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     False   BackoffBoth         the time since the previous scale is still within both the downscale and upscale forbidden windows
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age              From                       Message
  ----     ------                        ----             ----                       -------
  Warning  FailedGetResourceMetric       2m (x2 over 2m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  2m (x2 over 2m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Normal   SuccessfulRescale             1m               horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
oomichi commented 6 years ago

Ah, it is starting to make sense. There are two places that handle the HTTP query for the resource-consumer's CPU consumption:

controller/controller.go:               handler.handleConsumeCPU(w, req.Form)
controller/controller.go:func (handler *Controller) handleConsumeCPU(w http.ResponseWriter, query url.Values) {
resource_consumer_handler.go:           handler.handleConsumeCPU(w, req.Form)
resource_consumer_handler.go:func (handler *ResourceConsumerHandler) handleConsumeCPU(w http.ResponseWriter, query url.Values) {

The e2e test presumably sends its requests through controller/controller.go, which supports RequestSizeInMillicoresQuery and in turn calls resource_consumer_handler.go multiple times. The manual test called resource_consumer_handler.go directly, which is probably why RequestSizeInMillicoresQuery was not supported there.

oomichi commented 6 years ago

Judging from the output below, each process should only be generating about 2% of load, yet it is generating 4%?

durationSec=30&millicores=250&requestSizeMillicores=20
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1065 root      20   0  948596 139176  54948 S  14.2  3.4 676:29.88 kubelet
25666 root      20   0    9368   6044   1280 S   4.0  0.1   0:00.91 consume-cpu
25679 root      20   0    9368   5788   1284 S   4.0  0.1   0:00.89 consume-cpu
25775 root      20   0    9368   6248   1220 R   4.0  0.2   0:00.93 consume-cpu
25665 root      20   0    9432   6296   1280 S   3.6  0.2   0:00.91 consume-cpu
25667 root      20   0    9368   6352   1344 S   3.6  0.2   0:00.89 consume-cpu
25685 root      20   0    9368   5780   1280 R   3.6  0.1   0:00.90 consume-cpu
25694 root      20   0    9368   6048   1284 S   3.6  0.1   0:00.90 consume-cpu
25698 root      20   0    9368   6048   1280 S   3.6  0.1   0:00.89 consume-cpu
25702 root      20   0    9368   6044   1268 S   3.6  0.1   0:00.90 consume-cpu
25703 root      20   0    9368   6348   1344 S   3.6  0.2   0:00.89 consume-cpu
25741 root      20   0    9368   6092   1344 S   3.6  0.2   0:00.90 consume-cpu
25763 root      20   0    9368   6036   1280 S   3.6  0.1   0:00.91 consume-cpu
25675 root      20   0    9368   6368   1344 S   3.3  0.2   0:00.87 consume-cpu

Looks like bingo. Applying three loads of 2% each, the processes each run at more than 6%:

$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
$ curl --data "durationSec=300&millicores=20" http://${ADDRESS}:8080/ConsumeCPU
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15096 root      20   0    9368   6252   1220 S   6.9  0.2   0:06.50 consume-cpu
15156 root      20   0    9368   6252   1220 S   6.6  0.2   0:05.46 consume-cpu
15067 root      20   0    9368   6288   1280 S   6.3  0.2   0:07.69 consume-cpu

When the load is applied at 25%, it comes out as expected:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16724 root      20   0    8180   4492   1288 R  25.0  0.1   0:01.35 consume-cpu

Find the threshold below which the load stops matching the request. 10% is as expected:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15846 root      20   0    9368   6036   1280 R  10.0  0.1   0:01.92 consume-cpu

5% is not:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16200 root      20   0    9304   6368   1344 S   7.0  0.2   0:00.79 consume-cpu
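
A rough back-of-the-envelope estimate that ties these readings to the failing test. The ~2% per-process overhead is an assumption inferred from the top output above, not a measured constant.

package main

import "fmt"

func main() {
	// Assumed fixed overhead per consume-cpu process, inferred from the readings above
	// (a 20-millicore request burning roughly 3.6-4% of a core instead of 2%). This is
	// an illustration of the failure mode, not a measured constant.
	const overheadPercent = 2.0

	totalMillicores := 250      // the load the e2e test asks for
	requestSizeMillicores := 20 // dynamicRequestSizeInMillicores in the e2e utils

	processes := totalMillicores / requestSizeMillicores
	if totalMillicores%requestSizeMillicores != 0 {
		processes++ // one extra process for the remainder
	}

	intended := float64(totalMillicores) / 10.0 // 250 millicores = 25% of one core
	estimated := intended + float64(processes)*overheadPercent

	// 13 small processes add roughly 26% on top of the intended 25%: consistent with the
	// ~50% of a core seen in top, i.e. 95-99% of the pod's 500m request in the HPA view.
	fmt.Printf("processes=%d intended=%.0f%% estimated=%.0f%%\n", processes, intended, estimated)
}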
oomichi commented 6 years ago

The problematic code is in test/e2e/common/autoscaling_utils.go:

 46         dynamicRequestSizeInMillicores  = 20

Filed as k/kubernetes/issues/67051 and submitted a PR as k/kubernetes/pull/67053.

oomichi commented 6 years ago

Running with the --check-version-skew=false option, the autoscale from 3 to 5 pods failed. The cause was insufficient CPU resources → deployments, rcs, and rses left over from interrupted test runs were the culprit, so a cleanup was performed.

$ kubectl get pods -n=e2e-tests-horizontal-pod-autoscaling-fcq2t
NAME                               READY     STATUS    RESTARTS   AGE
test-deployment-5fd8dc46f8-9j59v   1/1       Running   0          9m
test-deployment-5fd8dc46f8-d5c42   0/1       Pending   0          5m
test-deployment-5fd8dc46f8-lxfbp   0/1       Pending   0          5m
test-deployment-5fd8dc46f8-nwltt   1/1       Running   0          10m
test-deployment-5fd8dc46f8-wz68c   1/1       Running   0          9m
$
$ kubectl describe pod test-deployment-5fd8dc46f8-d5c42 -n=e2e-tests-horizontal-pod-autoscaling-fcq2t
..
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  24s (x63 over 5m)  default-scheduler  0/2 nodes are available: 2 Insufficient cpu.

After applying k/kubernetes/pull/67053, the two HPA tests that had been failing now pass:

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=from\s3\sto\s5$" --check-version-skew=false
..
Ran 2 of 999 Specs in 752.086 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 997 Skipped PASS

Ginkgo ran 1 suite in 12m32.321622799s
Test Suite Passed
2018/08/07 17:54:46 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=from\s3\sto\s5$' finished in 12m32.529859891s
2018/08/07 17:54:46 e2e.go:83: Done
oomichi commented 6 years ago

Run all the HPA tests → they pass now!

$ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=\[HPA\]" --check-version-skew=false
...
S [SKIPPING] in Spec Setup (BeforeEach) [0.001 seconds]
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: Custom Metrics from Stackdriver)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/framework.go:22
  should scale down with External Metric with target value from Stackdriver [Feature:CustomMetricsAutoscaling] [BeforeEach]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/autoscaling/custom_metrics_stackdriver_autoscaling.go:85

  Aug  7 19:12:17.376: Only supported for providers [gce gke] (not skeleton)

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:305
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSAug  7 19:12:17.378: INFO: Running AfterSuite actions on all node
Aug  7 19:12:17.378: INFO: Running AfterSuite actions on node 1

Ran 8 of 999 Specs in 4484.970 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 991 Skipped PASS

Ginkgo ran 1 suite in 1h14m45.203539796s
Test Suite Passed
2018/08/07 19:12:17 process.go:155: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[HPA\]' finished in 1h14m45.413142012s
2018/08/07 19:12:17 e2e.go:83: Done
oomichi commented 6 years ago

Will close this once k/kubernetes/pull/67053 is merged.

oomichi commented 6 years ago

Close!!