Closed lukasboettcher closed 1 month ago
Name | Link |
---|---|
Latest commit | 324382873e0e1185fa00bed552f3b5327c94618e |
Latest deploy log | https://app.netlify.com/sites/capsule-documentation/deploys/6650d95142c5530008d7cd3f |
@lukasboettcher Thanks! Out of curiosity, did you use `resourcequotaindex` rather than, e.g., `namespace` as a label in order to reduce cardinality?
Since it is possible to create multiple resourcequotas for the same resource, I was facing a problem where the metrics were being overwritten. In the example from the PR description, the third entry, `requests.memory: 6Gi`, would overwrite the second entry, `requests.memory: 4Gi`, in the metrics if we don't account for the index of the quota. Kubernetes itself always enforces the lowest quota, so we need to keep metrics for all `tnt.spec.resourceQuotas.items.*`. I did not use `namespace` as a label for the metrics because they are tenant scoped and should be independent of the namespaces. Metrics for the individual resourcequotas are already computed by, e.g., kube-state-metrics.
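To make the overwrite problem concrete, here is a minimal sketch in plain Python. It is not Capsule's implementation: the function names are hypothetical, the series are modelled as dictionaries keyed by label values, and the "lowest quota wins" rule is taken from the explanation above rather than verified against Kubernetes.

```python
# Hypothetical illustration only; names and data layout are assumptions.
GI = 1024 ** 3

# Mirrors spec.resourceQuotas.items of the example Tenant "test".
ITEMS = [
    {"pods": 100},
    {"limits.memory": 4 * GI, "requests.memory": 4 * GI},
    {"requests.memory": 6 * GI},
]


def series_without_index(tenant, items):
    """Key each series by (tenant, resource) only."""
    series = {}
    for hard in items:
        for resource, value in hard.items():
            # A later item silently replaces an earlier one for the same resource.
            series[(tenant, resource)] = value
    return series


def series_with_index(tenant, items):
    """Key each series by (tenant, resource, index), like resourcequotaindex."""
    series = {}
    for index, hard in enumerate(items):
        for resource, value in hard.items():
            series[(tenant, resource, str(index))] = value
    return series


def effective_limit(series, tenant, resource):
    """Lowest limit across all quota items, assuming the lowest quota is enforced."""
    return min(
        value
        for (t, r, _index), value in series.items()
        if t == tenant and r == resource
    )


flat = series_without_index("test", ITEMS)
indexed = series_with_index("test", ITEMS)
```

Without the index, only the 6Gi value for `requests.memory` survives; with the index, both series are kept and the lower 4Gi limit can still be recovered.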
Thanks. Just FYI, we are also working on improving the observability of the tenant resource quota and on some kind of mechanism to avoid race conditions. One measure is to expose the usage on the tenant status:
status:
  namespaces:
  - green-prod
  - green-test
  quota:
    hard:
      limits.cpu: "2"
      limits.memory: 2Gi
      pods: "6"
      requests.cpu: "1"
      requests.memory: 1Gi
    used:
      limits.cpu: 400m
      limits.memory: 1Gi
      pods: "2"
      requests.cpu: 200m
      requests.memory: 256Mi
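As a rough illustration of how such a status block could be consumed, the sketch below computes the remaining headroom per resource from the `hard` and `used` values. It assumes the `quota.hard` / `quota.used` layout shown above, and `parse_quantity` is a hypothetical helper that handles only the few Kubernetes quantity suffixes appearing in this example, not the full quantity grammar.

```python
from fractions import Fraction

# Minimal subset of Kubernetes quantity suffixes (assumption: enough for this example).
SUFFIXES = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "m": Fraction(1, 1000),  # milli-units, e.g. 400m CPU
}


def parse_quantity(text):
    """Convert a quantity string such as '400m', '2Gi' or '6' to an exact Fraction."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def headroom(quota):
    """Return hard minus used for every resource present in both maps."""
    hard, used = quota["hard"], quota["used"]
    return {
        resource: parse_quantity(hard[resource]) - parse_quantity(used[resource])
        for resource in hard
        if resource in used
    }


QUOTA = {
    "hard": {
        "limits.cpu": "2",
        "limits.memory": "2Gi",
        "pods": "6",
        "requests.cpu": "1",
        "requests.memory": "1Gi",
    },
    "used": {
        "limits.cpu": "400m",
        "limits.memory": "1Gi",
        "pods": "2",
        "requests.cpu": "200m",
        "requests.memory": "256Mi",
    },
}

remaining = headroom(QUOTA)
```

Using `Fraction` keeps milli-CPU values exact, which avoids floating-point noise when comparing against thresholds.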
Description
This PR adds two custom metrics for capsule tenants:
capsule_tenant_resource_limit{resource="<resource>",resourcequotaindex="<index>",tenant="<tenant>"}
capsule_tenant_resource_usage{resource="<resource>",resourcequotaindex="<index>",tenant="<tenant>"}
Use case
When resourcequotas are configured via Capsule at the Tenant scope, capacity planning with Prometheus metrics from, e.g., kube-state-metrics is difficult, since the sum of the per-namespace resourcequotas is not what is actually being enforced. Instead, we can provide metrics that expose the aggregated resource limits and usage for a tenant.
Example metrics
Tenant Resource
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: test
spec:
  owners:
  - name: alice
    kind: User
  namespaceOptions:
    quota: 10
  resourceQuotas:
    scope: Tenant
    items:
    - hard:
        pods: 100
    - hard:
        limits.memory: 4Gi
        requests.memory: 4Gi
    - hard:
        requests.memory: 6Gi
```
Metrics
```yaml
# HELP capsule_tenant_resource_limit Current resource limit for a given resource in a tenant
# TYPE capsule_tenant_resource_limit gauge
capsule_tenant_resource_limit{resource="limits.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="namespaces",resourcequotaindex="",tenant="test"} 10
capsule_tenant_resource_limit{resource="pods",resourcequotaindex="0",tenant="test"} 100
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="2",tenant="test"} 6.442450944e+09
# HELP capsule_tenant_resource_usage Current resource usage for a given resource in a tenant
# TYPE capsule_tenant_resource_usage gauge
capsule_tenant_resource_usage{resource="limits.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="namespaces",resourcequotaindex="",tenant="test"} 4
capsule_tenant_resource_usage{resource="pods",resourcequotaindex="0",tenant="test"} 20
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="2",tenant="test"} 2.68435456e+09
```
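For capacity planning, the two metrics can be combined into a utilization ratio per tenant, resource, and quota index. The following is a minimal sketch that parses a few of the sample lines above with regular expressions and divides usage by limit. The parser is a simplification written for this example (it does not handle escaped quotes in label values or the full exposition format), and the function names are hypothetical.

```python
import re

# Subset of the sample output above.
SAMPLE = """
# TYPE capsule_tenant_resource_limit gauge
capsule_tenant_resource_limit{resource="pods",resourcequotaindex="0",tenant="test"} 100
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="1",tenant="test"} 4.294967296e+09
capsule_tenant_resource_limit{resource="requests.memory",resourcequotaindex="2",tenant="test"} 6.442450944e+09
# TYPE capsule_tenant_resource_usage gauge
capsule_tenant_resource_usage{resource="pods",resourcequotaindex="0",tenant="test"} 20
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="1",tenant="test"} 2.68435456e+09
capsule_tenant_resource_usage{resource="requests.memory",resourcequotaindex="2",tenant="test"} 2.68435456e+09
"""

SAMPLE_LINE = re.compile(r"^(\w+)\{(.*)\}\s+(\S+)$")
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return {metric_name: {sorted label tuple: value}} for labelled sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        name, labels, value = match.groups()
        key = tuple(sorted(LABEL_PAIR.findall(labels)))
        samples.setdefault(name, {})[key] = float(value)
    return samples


def utilization(samples):
    """Divide usage by limit for every label set present in both metrics."""
    limits = samples["capsule_tenant_resource_limit"]
    usage = samples["capsule_tenant_resource_usage"]
    return {
        key: usage[key] / limit
        for key, limit in limits.items()
        if key in usage and limit > 0
    }


ratios = utilization(parse_samples(SAMPLE))
for key, ratio in sorted(ratios.items()):
    print(dict(key), round(ratio, 3))
```

In a real setup the same ratio would more likely be computed in Prometheus itself; the Python version is only meant to show how the label sets of the two metrics line up.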