kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Workloads corresponding to Jobs deleted with --cascade=orphan still consume resources #1789

Open mimowo opened 5 months ago

mimowo commented 5 months ago

What happened:

Job deleted with --cascade=orphan continues to reserve ClusterQueue resources.

Note that this is a known issue and a follow-up to https://github.com/kubernetes-sigs/kueue/issues/1726

What you expected to happen:

Jobs deleted with --cascade=orphan should free the cluster resources.

How to reproduce it (as minimally and precisely as possible):

  1. Create a job with

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: sample-job
      namespace: default
      labels:
        kueue.x-k8s.io/queue-name: user-queue
    spec:
      parallelism: 3
      completions: 3
      suspend: true
      template:
        spec:
          containers:

Issue: The deleted job continues to reserve cluster queue resources:

kubectl get clusterqueue -oyaml returns:

...
  status:
    admittedWorkloads: 1
    conditions:
    - lastTransitionTime: "2024-03-04T10:41:42Z"
      message: Can admit new workloads
      reason: Ready
      status: "True"
      type: Active
    flavorsReservation:
    - name: default-flavor
      resources:
      - borrowed: "0"
        name: cpu
        total: "3"
      - borrowed: "0"
        name: memory
        total: 600Mi
    flavorsUsage:
    - name: default-flavor
      resources:
      - borrowed: "0"
        name: cpu
        total: "3"
      - borrowed: "0"
        name: memory
        total: 600Mi
    pendingWorkloads: 0
    reservingWorkloads: 1
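
The status above can also be checked programmatically. The following is a minimal Python sketch, assuming the relevant status fields have already been loaded into a plain dict (here copied by hand from the output above). The helper names `reserved_resources` and `looks_leaked` are hypothetical and not part of Kueue; the heuristic only illustrates the symptom and is not how Kueue itself accounts for quota.

```python
# Status excerpt copied from the ClusterQueue output above, as a plain dict.
# In practice it could be loaded from `kubectl get clusterqueue -o json`.
status = {
    "admittedWorkloads": 1,
    "pendingWorkloads": 0,
    "reservingWorkloads": 1,
    "flavorsReservation": [
        {
            "name": "default-flavor",
            "resources": [
                {"name": "cpu", "borrowed": "0", "total": "3"},
                {"name": "memory", "borrowed": "0", "total": "600Mi"},
            ],
        }
    ],
}


def reserved_resources(status):
    """Return {(flavor, resource): total} for every non-zero reservation."""
    reserved = {}
    for flavor in status.get("flavorsReservation", []):
        for res in flavor.get("resources", []):
            # Totals are quantity strings such as "3" or "600Mi"; "0" means unused.
            if str(res["total"]) != "0":
                reserved[(flavor["name"], res["name"])] = res["total"]
    return reserved


def looks_leaked(status):
    """Hypothetical heuristic: quota is still held by a reserving workload.

    This is only meaningful when the caller knows that every Job in the
    queue has already been deleted, as in the reproduction above.
    """
    return status.get("reservingWorkloads", 0) > 0 and bool(reserved_resources(status))


if __name__ == "__main__":
    for (flavor, resource), total in reserved_resources(status).items():
        print(f"{flavor}/{resource}: {total} still reserved")
    print("possible leaked reservation:", looks_leaked(status))
```

With the status shown in this issue, both `cpu` and `memory` of `default-flavor` are reported as still reserved even though the Job no longer exists.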

Anything else we need to know?:

Environment:

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tenzen-y commented 2 months ago

/remove-lifecycle stale

I think this is still a valid, known issue, and it would be worth mentioning in the documentation.