kubernetes / test-infra

Test infrastructure for the Kubernetes project.
Apache License 2.0

Make critical jobs Guaranteed Pod QOS: ci-kubernetes-e2e-device-plugin-gpu #18579

Closed: spiffxp closed 4 years ago

spiffxp commented 4 years ago

What should be cleaned up or changed:

This is part of https://github.com/kubernetes/test-infra/issues/18530

The following jobs should be Guaranteed Pod QOS, meaning they should have CPU and memory resource limits, and matching resource requests:

These jobs run on (google.com only) k8s-prow-build, so @spiffxp has provided the following guess:

General steps to follow:

/sig testing
/sig release
/area jobs
/area release-eng

helenfeng737 commented 4 years ago

/assign

spiffxp commented 4 years ago

/remove-help since @ZhiFeng1993 has it

helenfeng737 commented 4 years ago

Job status:

- https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-device-plugin-gpu
- https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-device-plugin-gpu-beta
- https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-device-plugin-gpu-stable1
- https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-device-plugin-gpu-stable2
- https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-device-plugin-gpu-stable3

Job history:

- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu-beta
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu-stable1
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu-stable2
- https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-device-plugin-gpu-stable3

spiffxp commented 4 years ago

Per https://github.com/kubernetes/test-infra/issues/18650, these jobs now run in k8s-infra-prow-build, so their metrics are visible to k8s-infra-prow-viewers@kubernetes.io

helenfeng737 commented 4 years ago

Memory usage dashboard: https://console.cloud.google.com/monitoring/metrics-explorer?project=k8s-infra-prow-build&timeDomain=6h&pageState=%7B%22xyChart%22:%7B%22dataSets%22:%5B%7B%22timeSeriesFilter%22:%7B%22filter%22:%22metric.type%3D%5C%22kubernetes.io%2Fcontainer%2Fmemory%2Fused_bytes%5C%22%20resource.type%3D%5C%22k8s_container%5C%22%20metadata.user_labels.%5C%22prow.k8s.io%2Fjob%5C%22%3Dmonitoring.regex.full_match(%5C%22ci-kubernetes-e2e-gce-device-plugin-gpu.*%5C%22)%22,%22minAlignmentPeriod%22:%223600s%22,%22aggregations%22:%5B%7B%22perSeriesAligner%22:%22ALIGN_MEAN%22,%22crossSeriesReducer%22:%22REDUCE_SUM%22,%22groupByFields%22:%5B%22metadata.user_labels.%5C%22prow.k8s.io%2Fjob%5C%22%22%5D%7D,%7B%22crossSeriesReducer%22:%22REDUCE_NONE%22%7D%5D%7D,%22targetAxis%22:%22Y1%22,%22plotType%22:%22LINE%22%7D%5D,%22options%22:%7B%22mode%22:%22COLOR%22%7D,%22constantLines%22:%5B%5D,%22timeshiftDuration%22:%220s%22,%22y1Axis%22:%7B%22label%22:%22y1Axis%22,%22scale%22:%22LINEAR%22%7D%7D,%22isAutoRefresh%22:true,%22timeSelection%22:%7B%22timeRange%22:%221w%22%7D%7D

Looks like memory usage is mostly below 2.5Gi. We are setting 3Gi here, so that should be OK.
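For reference, the Guaranteed Pod QOS condition from the issue description (CPU and memory limits, with matching requests) can be sketched as a small check. This is a minimal illustration, not Prow or Kubernetes code: the function name `qos_is_guaranteed` and the dict layout are assumptions, the `cpu` value is hypothetical (the thread does not state it), and only the `3Gi` memory figure comes from this thread. It compares quantity strings literally, so `"1"` and `"1000m"` would be treated as different even though Kubernetes considers them equal.

```python
def qos_is_guaranteed(containers):
    """Return True if every container sets CPU and memory limits
    and its requests (explicit or defaulted) match those limits."""
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # Kubernetes defaults a missing request to the corresponding limit.
        requests = {**limits, **resources.get("requests", {})}
        for name in ("cpu", "memory"):
            if name not in limits:
                return False  # no limit set, so not Guaranteed
            if requests[name] != limits[name]:
                return False  # request differs from limit (string comparison)
    return True


# Hypothetical container spec: memory is the 3Gi discussed in this thread,
# the cpu value is a placeholder chosen only for illustration.
job_container = {
    "resources": {
        "limits": {"cpu": "2", "memory": "3Gi"},
        "requests": {"cpu": "2", "memory": "3Gi"},
    }
}

if __name__ == "__main__":
    print(qos_is_guaranteed([job_container]))
```

A spec that sets only a memory limit, or whose requests are lower than its limits, fails this check and would not be Guaranteed.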

helenfeng737 commented 4 years ago

/close

I think it's safe to close this one. cc. @RobertKielty

k8s-ci-robot commented 4 years ago

@ZhiFeng1993: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/test-infra/issues/18579#issuecomment-672396952):

>/close
>
>I think it's safe to close this one. cc. @RobertKielty

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

spiffxp commented 4 years ago

/close
Thanks @ZhiFeng1993

k8s-ci-robot commented 4 years ago

@spiffxp: Closing this issue.

In response to [this](https://github.com/kubernetes/test-infra/issues/18579#issuecomment-673185832):

>/close
>Thanks @ZhiFeng1993

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.