Apply resource quotas to user namespaces that don't yet have one

HumairAK commented 3 years ago

As the title suggests, we should track down and reach out to teams that have not yet implemented a resource quota, and apply one.

tumido commented 3 years ago

$ grep -rL "limitranges" cluster-scope/base/namespaces/*/kustomization.yaml | awk -F/ '{print $4}'

acm
apicurio-apicurio-registry
argocd
as-pushgateway
b4mad-minecraft
cnv-testing
ds-black-flake
ds-example-project
ds-ml-workflows-ws
fde-audio-decoder-demo
hostpath-provisioner
kubeflow
lab-cicd-1-jump-app-cicd
lab-cicd-1-jump-app-dev
lab-cicd-2-jump-app-cicd
lab-cicd-2-jump-app-dev
lab-cicd-3-jump-app-cicd
lab-cicd-3-jump-app-dev
lab-cicd-4-jump-app-cicd
lab-cicd-4-jump-app-dev
local-storage
m4d-blueprints
m4d-system
mesh-for-data
metallb-system
observatorium-operator
open-aiops
opendatahub-operator
openshift-cnv
openshift-logging
openshift-metering
openshift-monitoring
openshift-operators
openshift-operators-redhat
openshift-serverless
openshift-storage
opf-ci-pipelines
opf-ci-prow
opf-jupyterhub
opf-jupyterhub-stage
opf-kafka
opf-monitoring
opf-observatorium
pulp-operator
ray-odh-integration
sa-dach-anwendertreffen
sa-dach-openshift-examples
sdap
tekton-pipelines
tufts-dcc-6
uky-hpc-workload-generator
ws-ml-prague

opf-ci-pipelines and opf-ci-prow are the most pressing ones; they cause trouble all the time.
I'm not sure whether we need to, or should, set it for the openshift-* namespaces and other operator-related ones such as local-storage.
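The grep pipeline above lists every namespace whose kustomization.yaml does not mention `limitranges`. Below is a minimal sketch of the same check as a reusable Bash function, assuming the same `cluster-scope/base/namespaces/<name>/kustomization.yaml` layout, that also skips the operator-related namespaces questioned here. The function name `list_missing`, the exclusion list, and passing a different pattern (for example, whatever string the quota components are referenced by) are illustrative assumptions, not part of the repository.

```shell
# Hypothetical helper: print the namespaces under $1 whose kustomization.yaml
# does not contain the pattern $2 (the thread's grep used "limitranges").
list_missing() {
  local base="$1" pattern="$2"
  local kfile ns
  for kfile in "$base"/*/kustomization.yaml; do
    [ -e "$kfile" ] || continue           # glob matched nothing
    # Skip namespaces that already reference the pattern.
    if grep -q -- "$pattern" "$kfile"; then
      continue
    fi
    ns=$(basename "$(dirname "$kfile")")  # directory name is the namespace
    case "$ns" in
      # Assumed exclusions: platform/operator namespaces, adjust as needed.
      openshift-*|local-storage|metallb-system|hostpath-provisioner) continue ;;
    esac
    printf '%s\n' "$ns"
  done
}
```

For example, `list_missing cluster-scope/base/namespaces limitranges`. Deriving the namespace with `basename`/`dirname` instead of `awk -F/ '{print $4}'` keeps the output correct even if the base path has a different depth.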

HumairAK commented 3 years ago

> opf-ci-pipelines and opf-ci-prow are the most pressing ones; they cause trouble all the time.

I'll make a PR for those.

@harshad16, any suggestions on which tiers we should set for these?

harshad16 commented 3 years ago

> opf-ci-pipelines and opf-ci-prow are the most pressing ones; they cause trouble all the time.

@tumido, what kind of trouble are you facing? Can you please elaborate? As the maintainer of these, I would like to work on them, since you said these troubles happen all the time.

> Any suggestions on which tiers we should set for these?

@HumairAK, these need custom tiers; I will open the PR. Note: these apps serve a large number of people and organizations. Because they are CIs, they require a large limit, but they release the resources within 3-6 hours. Thank you for tagging me.

tumido commented 3 years ago

> @tumido, what kind of trouble are you facing? Can you please elaborate? As the maintainer of these, I would like to work on them, since you said these troubles happen all the time.

Hey @harshad16, sorry, I may have exaggerated that a bit. It's mainly the CPU reservations on these, as we discussed in https://github.com/operate-first/support/issues/159 and https://github.com/operate-first/support/issues/196. That was mostly solved by the Prow pruning (for opf-ci-prow) and your thoth quota.

Yesterday I had to manually clean up completed and failed pods in the opf-ci-pipelines namespace to unstick the cluster upgrade process, since the completed pods were reserving a lot of CPU again.

Currently it's usually not a problem unless we have a degraded node or have to drain nodes for an upgrade.
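The thread doesn't record which commands were used for the manual cleanup. One way to do it, sketched here as an assumption, is to delete pods by phase with `kubectl delete pods --field-selector`, which supports `status.phase` for pods; Succeeded and Failed need separate calls because comma-separated field selectors are ANDed. The function name `cleanup_finished_pods` and the `KUBECTL` override (for `oc`, or `echo` to preview) are hypothetical.

```shell
# Hypothetical helper: delete completed (Succeeded) and Failed pods in one namespace.
cleanup_finished_pods() {
  local ns="$1"
  local kubectl="${KUBECTL:-kubectl}"   # e.g. KUBECTL=oc on OpenShift
  local phase
  # One call per phase: a selector like "a==x,a==y" would match nothing.
  for phase in Succeeded Failed; do
    "$kubectl" delete pods -n "$ns" --field-selector="status.phase==$phase"
  done
}
```

For example, `KUBECTL=oc cleanup_finished_pods opf-ci-pipelines`. Deleting Failed pods also discards their container logs unless a logging stack has already collected them, so it may be worth inspecting them first.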

HumairAK commented 3 years ago

Let's set a deadline for users to request quotas for their namespaces; after that, let's add a reasonable quota to all of the ones still lacking one.
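For namespaces whose owners never request a quota, a default could be rendered as a manifest and committed next to the namespace's kustomization. A minimal sketch, assuming `kubectl create quota` with `--dry-run=client -o yaml` is used to generate the YAML; the function name `render_quota`, the quota name `<namespace>-quota`, and the CPU, memory, and pod values are placeholders, not the tiers the team actually settled on.

```shell
# Hypothetical helper: print a ResourceQuota manifest for namespace $1
# with CPU requests $2, memory requests $3 and a pod count limit $4.
render_quota() {
  local ns="$1" cpu="$2" memory="$3" pods="$4"
  local kubectl="${KUBECTL:-kubectl}"
  # --dry-run=client -o yaml prints the object instead of creating it.
  "$kubectl" create quota "${ns}-quota" -n "$ns" \
    --hard="requests.cpu=${cpu},requests.memory=${memory},pods=${pods}" \
    --dry-run=client -o yaml
}
```

For example, `render_quota team-a 2 4Gi 20 > resourcequota.yaml`, where `team-a` and the values are placeholders. Rendering to a file rather than applying directly keeps the quota in Git alongside the rest of the namespace configuration.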

HumairAK commented 3 years ago

I'll send out an announcement once this is merged: https://github.com/operate-first/apps/pull/648

That way I can link to it in the announcement, in case anyone wants to submit their own quota.

HumairAK commented 3 years ago

We can close this now. Only the kubeflow namespace is left, and there is a separate issue for that: https://github.com/operate-first/support/issues/222

operate-first / apps

Apply resource quotas to user namespaces that don't yet have one #574