Closed spiffxp closed 4 years ago
Opened https://github.com/kubernetes/k8s.io/pull/1172 to create an n1-highmem-8 nodepool
FYI @kubernetes/ci-signal
/wg k8s-infra
/priority critical-urgent
/sig testing
/sig release
/area prow
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Creation complete after 3m3s [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool3-20200824185905793700000001]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
$ for n in $(k get nodes -l cloud.google.com/gke-nodepool=pool2-20200821222921432600000001 -o=name); do k cordon $n; done
node/gke-prow-build-pool2-2020082122292143-47105c2b-2lbb cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-3jx0 cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-fxx0 cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-hl2r cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-mvfj cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-nwqh cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-vhf6 cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-w4z9 cordoned
node/gke-prow-build-pool2-2020082122292143-47105c2b-xxb5 cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-0js0 cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-6dbh cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-6mns cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-8bzr cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-cd0h cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-dtsf cordoned
node/gke-prow-build-pool2-2020082122292143-93e7de61-hs4g cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-15dl cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-64bd cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-cntz cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-qhz7 cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-rmzr cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-srfj cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-t9k4 cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-xlkt cordoned
node/gke-prow-build-pool2-2020082122292143-f349874b-zrg0 cordoned
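The cordon loop above can be wrapped in a small helper for reuse. This is only a sketch: `cordon_pool` is a hypothetical name, and it calls `kubectl` directly on the assumption that `k` in the log is an alias for it.

```shell
# Hypothetical helper: cordon every node in one GKE node pool.
cordon_pool() {
  pool=$1
  # GKE labels each node with the name of the node pool it belongs to.
  kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" -o name |
    while read -r node; do
      kubectl cordon "$node"
    done
}
```

Usage would be `cordon_pool pool2-20200821222921432600000001`. Cordoning only stops new pods from being scheduled; pods already running on the nodes are left alone.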
Disabled autoscaling for pool2
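The log does not say how autoscaling was disabled. One way to do it from the command line, as a sketch, is `gcloud container clusters update` with `--no-enable-autoscaling`. The project, cluster, and region below are taken from the resource id in the apply output; `disable_pool_autoscaling` is a hypothetical name.

```shell
# Hypothetical helper: turn off the cluster autoscaler for one node pool.
disable_pool_autoscaling() {
  pool=$1
  gcloud container clusters update prow-build \
    --project=k8s-infra-prow-build \
    --region=us-central1 \
    --node-pool="$pool" \
    --no-enable-autoscaling
}
```

A change made this way lives outside the Terraform config, so a later `terraform apply` may revert it.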
Realized pool3 was still configured to use n1-highmem-16, opened PR to modify https://github.com/kubernetes/k8s.io/pull/1174
$ terraform apply
# ...
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Creating...
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Modifying... [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool2-20200821222921432600000001]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [10s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [20s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 20s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [40s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [50s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m0s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m0s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m10s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m20s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m30s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [1m50s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 1m50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m0s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m10s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m10s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m30s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m40s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m40s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 2m50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [2m50s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 3m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still creating... [3m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Creation complete after 3m9s [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool3-20200824192452986800000001]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Destroying... [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool3-20200824185905793700000001]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 3m10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 10s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 3m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 20s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Still modifying... [id=projects/k8s-infra-prow-build/locations...Pools/pool2-20200821222921432600000001, 3m30s elapsed]
module.prow_build_nodepool_n1_highmem_16.google_container_node_pool.node_pool: Modifications complete after 3m32s [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool2-20200821222921432600000001]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 1m50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m30s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m40s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 2m50s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 3m0s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 3m10s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Still destroying... [id=projects/k8s-infra-prow-build/locations...Pools/pool3-20200824185905793700000001, 3m20s elapsed]
module.prow_build_nodepool_n1_highmem_8.google_container_node_pool.node_pool: Destruction complete after 3m25s
Apply complete! Resources: 1 added, 1 changed, 1 destroyed.
So the apply turned autoscaling back on for pool2...
Which I have disabled once again
(I re-cordoned pool2 nodes just to be sure nothing new had gotten launched while autoscaling was turned on.)
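The second apply reported "1 added, 1 changed, 1 destroyed": besides replacing pool3 it also modified pool2, which is presumably how autoscaling came back (this assumes the pool2 module still declared autoscaling in the config). One way to avoid touching pool2, sketched here, is to scope the apply to the pool3 module with terraform's `-target` flag. `apply_pool3_only` is a hypothetical name.

```shell
# Hypothetical helper: apply only the n1-highmem-8 node pool module.
apply_pool3_only() {
  # Module address copied from the apply log above; extra args pass through.
  terraform apply -target=module.prow_build_nodepool_n1_highmem_8 "$@"
}
```

Terraform's documentation describes `-target` as a tool for exceptional situations rather than routine use, so the durable fix is still to put the desired pool2 settings into the config.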
Now we wait
I updated the prow-build dashboard with a pods-per-node and a pods-per-nodepool graph
$ date -u; k get pods -n test-pods --field-selector=status.phase=Running -o=json | jq -r '.items | map(select(.spec.nodeName | match("pool2")))[] | "\(.status.startTime) \(.metadata.labels["prow.k8s.io/job"]) \(.metadata.name)\(.spec.nodeName)"'
Mon 24 Aug 2020 07:37:40 PM UTC
2020-08-24T17:55:42Z ci-kubernetes-gce-conformance-latest-1-16 0237e48d-e633-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-qhz7
2020-08-24T19:00:42Z ci-kubernetes-e2e-gci-gce-alpha-features 1717848d-e63c-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-64bd
2020-08-24T19:00:42Z ci-kubernetes-e2e-gci-gce-ingress 1750da53-e63c-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-qhz7
2020-08-24T19:00:42Z ci-kubernetes-e2e-gci-gce-scalability 17ac6e98-e63c-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-47105c2b-xxb5
2020-08-24T18:53:42Z ci-kubernetes-e2e-gce-cos-k8sbeta-alphafeatures 1c8f13e8-e63b-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-qhz7
2020-08-24T19:01:12Z pull-kubernetes-integration 1f6992c8-e63c-11ea-802b-aeceeb3e3b15gke-prow-build-pool2-2020082122292143-93e7de61-0js0
2020-08-24T16:30:42Z ci-kubernetes-e2e-gce-cos-k8sstable1-serial 22933108-e627-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-93e7de61-hs4g
2020-08-24T16:52:41Z ci-kubernetes-e2e-gci-gce-serial 3512e68b-e62a-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-qhz7
2020-08-24T19:10:41Z ci-kubernetes-e2e-gce-cos-k8sbeta-reboot 7ca80a00-e63d-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-zrg0
2020-08-24T18:20:42Z ci-kubernetes-e2e-gce-cos-k8sbeta-serial 8079127b-e636-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-47105c2b-hl2r
2020-08-24T18:13:42Z ci-kubernetes-gce-conformance-latest-kubetest2 85ef75e9-e635-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-47105c2b-3jx0
2020-08-24T18:07:12Z pull-kubernetes-verify 92cec915-e634-11ea-9dde-eadcf4df1900gke-prow-build-pool2-2020082122292143-47105c2b-nwqh
2020-08-24T18:21:42Z ci-kubernetes-verify-stable3 a446fe75-e636-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-rmzr
2020-08-24T19:04:42Z ci-kubernetes-gce-conformance-latest-1-18 a6018377-e63c-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-93e7de61-hs4g
2020-08-24T18:43:42Z ci-kubernetes-gce-conformance-latest-1-17 b6dc3638-e639-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-f349874b-15dl
2020-08-22T01:05:38Z null boskos-564f5594dd-nq62tgke-prow-build-pool2-2020082122292143-93e7de61-hs4g
2020-08-22T01:04:12Z null boskos-janitor-58c6d75dc9-r7vgsgke-prow-build-pool2-2020082122292143-47105c2b-mvfj
2020-08-24T12:51:45Z null boskos-janitor-58c6d75dc9-zffrcgke-prow-build-pool2-2020082122292143-f349874b-rmzr
2020-08-22T01:05:17Z null boskos-reaper-56b467f9d8-b4gz2gke-prow-build-pool2-2020082122292143-93e7de61-hs4g
2020-08-24T18:23:42Z ci-kubernetes-gce-conformance-latest ebd9b141-e636-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-47105c2b-hl2r
2020-08-24T18:59:42Z ci-kubernetes-e2e-gce-cos-k8sbeta-slow f31b58e4-e63b-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-93e7de61-0js0
2020-08-24T18:45:42Z ci-kubernetes-e2e-gci-gce-ingress-canary fe9d0138-e639-11ea-99eb-2ebeded86955gke-prow-build-pool2-2020082122292143-93e7de61-0js0
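The waiting step can be sketched as a poll on how many Running pods are still on the old pool. This is a minimal sketch, not what was actually run: `count_pool_pods` and `wait_for_pool_empty` are hypothetical names, and it uses `custom-columns` output instead of `jq` so it needs nothing beyond `kubectl` and `grep`.

```shell
# Hypothetical helper: count Running test-pods pods on nodes of a given pool.
count_pool_pods() {
  pool=$1
  # Print one node name per Running pod, then count the ones on the pool.
  # grep -c prints 0 but exits non-zero on no match, hence the "|| true".
  kubectl get pods -n test-pods --field-selector=status.phase=Running \
    -o custom-columns=NODE:.spec.nodeName --no-headers |
    grep -c -- "-${pool}-" || true
}

# Hypothetical helper: block until no Running pods remain on the pool.
wait_for_pool_empty() {
  pool=$1
  interval=${2:-60}   # seconds between polls
  while [ "$(count_pool_pods "$pool")" -gt 0 ]; do
    sleep "$interval"
  done
}
```

Long-running pods such as the boskos ones in the listing above never finish on their own, so a loop like this only terminates once they have been moved off the pool.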
Manually deleted some of the pool2 nodes that no longer had any pods on them to make room for more pool3 nodes
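Finding which nodes are safe to delete can be sketched as a set difference between the nodes in the pool and the nodes that still host pods. The helper below is hypothetical (`empty_pool_nodes` is not from the log) and assumes that only pods in the `test-pods` namespace matter, since system DaemonSet pods run on every node.

```shell
# Hypothetical helper: list nodes in a pool that host no test-pods pods.
empty_pool_nodes() {
  pool=$1
  # Unique names of nodes that currently have a pod in test-pods.
  busy=$(kubectl get pods -n test-pods \
    -o custom-columns=NODE:.spec.nodeName --no-headers | sort -u)
  kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" -o name |
    sed 's|^node/||' |
    while read -r node; do
      # Print the node only if it is not in the busy list.
      printf '%s\n' "$busy" | grep -qxF -- "$node" || printf '%s\n' "$node"
    done
}
```

This only lists candidates. How the nodes themselves were deleted is not recorded in this thread.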
Forced boskos pods to move (note to self: I think next time a k delete pod -n test-pods -lapp=boskos will be fine)
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ date; k delete -n test-pods pod boskos-564f5594dd-nq62t; date
Mon 24 Aug 2020 08:19:28 PM UTC
pod "boskos-564f5594dd-nq62t" deleted
Mon 24 Aug 2020 08:19:36 PM UTC
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ date; k delete -n test-pods pod boskos-reaper-56b467f9d8-b4gz2; date
Mon 24 Aug 2020 08:19:48 PM UTC
pod "boskos-reaper-56b467f9d8-b4gz2" deleted
Mon 24 Aug 2020 08:19:56 PM UTC
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ date; k delete -n test-pods pod boskos-janitor-58c6d75dc9-r7vgs; date
Mon 24 Aug 2020 08:20:10 PM UTC
pod "boskos-janitor-58c6d75dc9-r7vgs" deleted
Mon 24 Aug 2020 08:25:21 PM UTC
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ date; k delete -n test-pods pod boskos-janitor-58c6d75dc9-zffrc; date
Mon 24 Aug 2020 08:58:25 PM UTC
pod "boskos-janitor-58c6d75dc9-zffrc" deleted
Mon 24 Aug 2020 09:03:28 PM UTC
Manually deleted some more empty nodes
Now waiting on two serial jobs
spiffxp@cloudshell:~ (k8s-infra-prow-build)$ date -u; k get pods -n test-pods --field-selector=status.phase=Running -o=json | jq -r '.items | map(select(.spec.nodeName | match("pool2")))[] | "\(.status.startTime) \(.metadata.labels["prow.k8s.io/job"]) \(.metadata.name) \(.spec.nodeName)"'
Mon 24 Aug 2020 09:56:55 PM UTC
2020-08-24T16:52:41Z ci-kubernetes-e2e-gci-gce-serial 3512e68b-e62a-11ea-99eb-2ebeded86955 gke-prow-build-pool2-2020082122292143-f349874b-qhz7
2020-08-24T18:20:42Z ci-kubernetes-e2e-gce-cos-k8sbeta-serial 8079127b-e636-11ea-99eb-2ebeded86955 gke-prow-build-pool2-2020082122292143-47105c2b-hl2r
Opened https://github.com/kubernetes/k8s.io/pull/1177 to remove the old n1-highmem-16 pool for good
/close Old nodepool was deleted
@spiffxp: Closing this issue.
(And we did confirm this made the integration and verify jobs much happier)
So if nothing else, throttled read ops correspond to unhappy integration and verify jobs
periodic-kubernetes-bazel-test-master was also unhappy when running on n1-highmem-16's and much happier when we switched back
here's integration-master for comparison
This tracks rolling back from https://github.com/kubernetes/k8s.io/issues/1168 which migrated to an n1-highmem-16 nodepool.
Certain jobs have become flakier since 2020-08-21.
Per https://github.com/kubernetes/k8s.io/issues/1168#issuecomment-678548785, jobs scheduled after 2020-08-21 4pm PT would have been scheduled onto the new n1-highmem-16 nodepool. There is concern that using n1-highmem-16's may be causing more per-node contention over resources that the scheduler cannot account for, such as IOPS.
We don't have definitive proof of this, but rolling back should give us a useful data point.
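The contention concern can be illustrated with some rough arithmetic. Only the vCPU counts (8 for n1-highmem-8, 16 for n1-highmem-16) are facts; the 4-CPU pod request and the 15000 IOPS per-node budget are made-up illustrative inputs, and the sketch assumes the disk budget is the same for both machine types.

```shell
# Hypothetical helper: pods that fit on a node by CPU request, and the share
# of the node's disk IOPS each pod gets if they all do I/O at the same time.
iops_share() {
  node_cpus=$1
  pod_cpus=$2
  node_iops=$3
  pods=$((node_cpus / pod_cpus))
  echo "$pods pods/node, $((node_iops / pods)) IOPS/pod"
}

iops_share 8 4 15000    # n1-highmem-8:  2 pods/node, 7500 IOPS/pod
iops_share 16 4 15000   # n1-highmem-16: 4 pods/node, 3750 IOPS/pod
```

Under these assumptions the scheduler sees both layouts as equally valid, because CPU and memory requests are satisfied either way, while each pod's share of disk throughput is halved on the larger nodes.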