Closed by HumairAK 3 years ago

As the title suggests, we should hunt down/reach out to teams that have not yet implemented a resource quota and apply one.
# Namespaces whose kustomization.yaml does not reference "limitranges"
# (a sketch of the missing wiring follows the list below):
$ grep -rL "limitranges" cluster-scope/base/namespaces/*/kustomization.yaml | awk -F/ '{print $4}'
acm
apicurio-apicurio-registry
argocd
as-pushgateway
b4mad-minecraft
cnv-testing
ds-black-flake
ds-example-project
ds-ml-workflows-ws
fde-audio-decoder-demo
hostpath-provisioner
kubeflow
lab-cicd-1-jump-app-cicd
lab-cicd-1-jump-app-dev
lab-cicd-2-jump-app-cicd
lab-cicd-2-jump-app-dev
lab-cicd-3-jump-app-cicd
lab-cicd-3-jump-app-dev
lab-cicd-4-jump-app-cicd
lab-cicd-4-jump-app-dev
local-storage
m4d-blueprints
m4d-system
mesh-for-data
metallb-system
observatorium-operator
open-aiops
opendatahub-operator
openshift-cnv
openshift-logging
openshift-metering
openshift-monitoring
openshift-operators
openshift-operators-redhat
openshift-serverless
openshift-storage
opf-ci-pipelines
opf-ci-prow
opf-jupyterhub
opf-jupyterhub-stage
opf-kafka
opf-monitoring
opf-observatorium
pulp-operator
ray-odh-integration
sa-dach-anwendertreffen
sa-dach-openshift-examples
sdap
tekton-pipelines
tufts-dcc-6
uky-hpc-workload-generator
ws-ml-prague
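For context, taking a namespace off this list roughly means adding a LimitRange manifest and referencing it from that namespace's kustomization.yaml. A minimal sketch, assuming the directory layout implied by the grep above; the namespace name, file name, and limit values are hypothetical, not taken from the repo:

NS=opf-example   # hypothetical; substitute a namespace from the list
mkdir -p "cluster-scope/base/namespaces/$NS/limitranges"

cat > "cluster-scope/base/namespaces/$NS/limitranges/limitrange.yaml" <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container sets no requests
        cpu: 100m
        memory: 256Mi
      default:             # applied when a container sets no limits
        cpu: "1"
        memory: 1Gi
EOF

# ...then reference it from the namespace's kustomization.yaml so the
# grep above no longer flags it:
#   resources:
#     - limitranges/limitrange.yaml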
Most of these are openshift-* namespaces and other operator-related ones like local-storage etc. opf-ci-pipelines and opf-ci-prow are the most pressing ones - those cause troubles all the time. I'll make a PR for those.
@harshad16 any suggestions on what tiers we should set for these?
> opf-ci-pipelines and opf-ci-prow are the most pressing ones - those cause troubles all the time.
@tumido, what kind of trouble is being faced? Can you please elaborate? As a maintainer of these I will work on them, since, as you stated, these troubles are faced all the time.
> any suggestions on what tiers we should set for these?
@HumairAK, these need custom tiers; I will open the PR. Note: these apps serve a large number of people and organizations. As these are CIs they would require a large limit, but they release the resources within 3-6 hours. Thank you for tagging me.
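Such a custom tier ultimately boils down to a ResourceQuota on the namespace. A sketch under assumed values - the namespace is the real one discussed here, but the quota name and numbers are placeholders, not the ones from harshad16's PR:

cat <<'EOF' | oc apply -n opf-ci-pipelines -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: custom-ci-quota     # hypothetical name
spec:
  hard:
    requests.cpu: "32"      # generous, since CI runs come in bursts...
    requests.memory: 64Gi
    limits.cpu: "64"
    limits.memory: 128Gi
    pods: "200"             # ...but the resources are released within hours
EOF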
> @tumido, what kind of trouble is being faced? Can you please elaborate? As a maintainer of these I will work on them, since, as you stated, these troubles are faced all the time.
Hey @harshad16, sorry, I may have exaggerated that a bit. It's mainly the CPU reservations on these, as we discussed in https://github.com/operate-first/support/issues/159 and https://github.com/operate-first/support/issues/196. It got mostly solved by the Prow pruning (for opf-ci-prow) and your Thoth quota.

Yesterday I had to do a manual cleanup of completed + failed pods in the opf-ci-pipelines namespace to unstick the cluster upgrade process, since the completed pods were reserving a lot of CPU again...

Currently it's usually not a problem unless we have a degraded node or have to drain nodes for an upgrade...
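For the record, the manual cleanup mentioned above is roughly the following; filtering on status.phase is the standard way to target terminated pods with oc/kubectl:

# Delete Succeeded/Failed pods that were still reserving CPU during the drain:
oc delete pods -n opf-ci-pipelines --field-selector=status.phase==Succeeded
oc delete pods -n opf-ci-pipelines --field-selector=status.phase==Failed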
Let's create a deadline for users to request quotas for their namespaces; otherwise, let's add a reasonable quota to all of the ones lacking one.

I'll send an announcement out once this is in: https://github.com/operate-first/apps/pull/648 - so that I can link it as part of the announcement, in case anyone wants to submit their own quota.
We can close this now. Only the kubeflow namespace is left, and there is a separate issue for that: https://github.com/operate-first/support/issues/222