kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA doesn't provide any recommendations when CronJob is in OOMKill crash loop #6236

Open FilimonovEugene opened 8 months ago

FilimonovEugene commented 8 months ago

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.0.0

What k8s version are you using (kubectl version)?:

v1.25.14-eks-f8587cb

What environment is this in?:

AWS EKS

What did you expect to happen?:

VPA should track CronJob OOMKilled events and adjust resource requests and limits.

What happened instead?:

VPA doesn't react to the Job's OOMKilled events and provides no recommendation.

How to reproduce it (as minimally and precisely as possible):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: oomkilled
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  jobTemplate:
    metadata:
      labels:
        app: oomkilled
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - image: gcr.io/google-containers/stress:v1
            name: stress
            command: ["/stress"]
            args:
              - "--mem-total"
              - "10485800000"
              - "--logtostderr"
              - "--mem-alloc-size"
              - "100000"
            resources:
              requests:
                memory: 100Mi
                cpu: 5m
              limits:
                memory: 1Gi

---

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: oomkilled
spec:
  targetRef:
    apiVersion: "batch/v1"
    kind: CronJob
    name: oomkilled
  updatePolicy:
    updateMode: "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 10Mi
        maxAllowed:
          cpu: 1
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
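
A quick sanity check on the numbers in the manifests above, as a sketch only. It assumes that the stress image tries to allocate `--mem-total` bytes in total (that is my reading of the flag, not something stated in this issue) and ignores process overhead. Under that assumption the container always exceeds its 1Gi limit, and the target is also above the VPA's `maxAllowed` of 2Gi, so even a capped recommendation would not let this particular job finish.

```python
# Values copied from the manifests above; the helper names are illustrative.
MEM_TOTAL_BYTES = 10_485_800_000      # --mem-total passed to /stress
LIMIT_BYTES = 1 * 1024 ** 3           # resources.limits.memory: 1Gi
VPA_MAX_ALLOWED_BYTES = 2 * 1024 ** 3 # VPA maxAllowed.memory: 2Gi


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


def exceeds(target_bytes: int, cap_bytes: int) -> bool:
    """True if the allocation target is larger than the given cap."""
    return target_bytes > cap_bytes


if __name__ == "__main__":
    print(f"stress target: {to_gib(MEM_TOTAL_BYTES):.2f} GiB")
    print("exceeds container limit:", exceeds(MEM_TOTAL_BYTES, LIMIT_BYTES))
    print("exceeds VPA maxAllowed:", exceeds(MEM_TOTAL_BYTES, VPA_MAX_ALLOWED_BYTES))
```

For reproducing the missing-recommendation behaviour this does not matter, since the point is only to trigger OOMKilled on every run.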

wu0407 commented 8 months ago

Perhaps the pod is so short-lived that the metrics server never obtained monitoring data for it. Are there any related logs, or can you share more detail?

andrii-litvinov commented 8 months ago

That can definitely be a reason. I would also expect VPA to react to OOMKilled events and increase the memory target recommendation for the next pod start.
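
To make the expected behaviour concrete, here is a rough sketch of the kind of bump one would hope to see after an OOM. It follows my understanding of how the recommender treats OOM events: take the larger of "usage plus a minimum bump" and "usage times a ratio". The defaults of 1.2 and 100 MiB are assumptions based on the recommender's `--oom-bump-up-ratio` and `--oom-min-bump-up-bytes` flags and should be verified against the VPA version in use. The function and parameter names are hypothetical.

```python
MIB = 1024 ** 2


def oom_bumped_memory(
    memory_used_bytes: int,
    bump_up_ratio: float = 1.2,            # assumed default, verify for your version
    min_bump_up_bytes: int = 100 * MIB,    # assumed default, verify for your version
) -> int:
    """Memory sample to record after an OOM: the larger of an absolute and a relative bump."""
    absolute = memory_used_bytes + min_bump_up_bytes
    relative = int(memory_used_bytes * bump_up_ratio)
    return max(absolute, relative)


def clamp(value: int, min_allowed: int, max_allowed: int) -> int:
    """Apply minAllowed/maxAllowed from the VPA containerPolicies."""
    return max(min_allowed, min(value, max_allowed))


if __name__ == "__main__":
    # Starting from the 100Mi request in the CronJob above.
    bumped = oom_bumped_memory(100 * MIB)
    print(bumped // MIB, "MiB before clamping")
    print(clamp(bumped, 10 * MIB, 2048 * MIB) // MIB, "MiB after clamping")
```

With the 100Mi request from the reproduction, one OOM would under these assumptions yield a 200 MiB sample, well inside the 10Mi to 2Gi policy range, so a missing recommendation cannot be explained by the policy bounds alone.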

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

andrii-litvinov commented 5 months ago

/remove-lifecycle stale

k8s-triage-robot commented 2 months ago

/lifecycle stale

andrii-litvinov commented 2 months ago

/remove-lifecycle stale