kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0
1.47k stars 262 forks source link

TAS: Update cache on delete Topology. #3615

Open mbobrovskyi opened 3 days ago

mbobrovskyi commented 3 days ago

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixed bug that doesn't allow to update cache on delete Topology.

Which issue(s) this PR fixes:

Fixes #3614

Special notes for your reviewer:

Does this PR introduce a user-facing change?

TAS: Fixed bug that doesn't allow to update cache on delete Topology.
k8s-ci-robot commented 3 days ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot commented 3 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbobrovskyi Once this PR has been reviewed and has the lgtm label, please assign mimowo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/kubernetes-sigs/kueue/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
mbobrovskyi commented 3 days ago

/test all

netlify[bot] commented 3 days ago

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
Latest commit 3c8e77f5f0aa806a968c79dd933cf97fb5c8b14e
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67444c77e83ec70008a24c53
mimowo commented 12 hours ago

@mbobrovskyi please split this PR into two: for delete and update. I think for delete we are almost good to go, for update I need to rethink if we need this complication. And I'm not sure about race conditions for the workload usage, so maybe we just block updating.

mimowo commented 11 hours ago

Then, we could cherry-pick probably handling of delete.

As for updates I think it is better to block updates for now to reflect the status quo that updates are not supported. Then, we could support updates and relax validation when we are confident about the solution.

So I propose:

  1. minimal PR fix for deletes (can be cherry-picked cause deleting is broken anyway)
  2. PR to block topology updates for .spec.levels field by validation on the topology API (can be considered for cherry-picking, because updating is broken anyway)
  3. PR to relax validation for the spec.levels field to allow validation when we are confident the solution works (would be a new feature)
mbobrovskyi commented 11 hours ago

/test all

mbobrovskyi commented 10 hours ago

/retest

Due to #3626