kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Support MPIJob managedBy feature for the MultiKueue #3257

Open tenzen-y opened 1 day ago

tenzen-y commented 1 day ago

What would you like to be added: To use the current MPIJob MultiKueue feature, we need to uninstall the mpi-operator from the management cluster. However, since v0.6.0 the mpi-operator supports the managedBy feature, similar to the batch/v1 Job.

Hence, we want to support it by implementing the following:

  1. Implement the IsJobManagedByKueue: https://github.com/kubernetes-sigs/kueue/blob/4199c9dd9ce89636eb4e72f5cebb3c9adfba3f0c/pkg/controller/jobs/mpijob/mpijob_multikueue_adapter.go#L89-L91
  2. Implement the defaulting webhooks similar to Job and JobSet:

And any other implementation needed to support it, if there is any.
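
As a rough illustration of the two steps above, here is a minimal, self-contained sketch of what the managedBy check and the defaulting could look like. It is not the Kueue implementation: the `MPIJob`, `MPIJobSpec`, and `RunPolicy` types are simplified stand-ins for the mpi-operator API, the helper names and the `queueUsesMultiKueue` flag are hypothetical, and both the field location (`spec.runPolicy.managedBy`) and the controller name value are assumptions that should be verified against the mpi-operator and Kueue sources.

```go
package main

import "fmt"

// Assumed value of Kueue's MultiKueue controller name; verify against the Kueue source.
const multiKueueControllerName = "kueue.x-k8s.io/multikueue"

// RunPolicy, MPIJobSpec and MPIJob are simplified stand-ins (hypothetical) for
// the mpi-operator types. Only the field relevant to managedBy is modelled.
type RunPolicy struct {
	ManagedBy *string // assumed to live under spec.runPolicy.managedBy
}

type MPIJobSpec struct {
	RunPolicy RunPolicy
}

type MPIJob struct {
	Name string
	Spec MPIJobSpec
}

// isJobManagedByKueue reports whether managedBy points at the MultiKueue
// controller. When it does not, the second return value explains why.
func isJobManagedByKueue(job *MPIJob) (bool, string) {
	managedBy := ""
	if job.Spec.RunPolicy.ManagedBy != nil {
		managedBy = *job.Spec.RunPolicy.ManagedBy
	}
	if managedBy != multiKueueControllerName {
		return false, fmt.Sprintf("Expecting spec.runPolicy.managedBy to be %q not %q",
			multiKueueControllerName, managedBy)
	}
	return true, ""
}

// defaultManagedBy mimics a defaulting webhook: it sets managedBy only when the
// field is unset and the job's queue is (hypothetically) backed by MultiKueue.
// An explicit user-provided value is never overwritten.
func defaultManagedBy(job *MPIJob, queueUsesMultiKueue bool) {
	if queueUsesMultiKueue && job.Spec.RunPolicy.ManagedBy == nil {
		name := multiKueueControllerName
		job.Spec.RunPolicy.ManagedBy = &name
	}
}

func main() {
	job := &MPIJob{Name: "pi"}

	// Before defaulting, the job is not managed by MultiKueue.
	ok, reason := isJobManagedByKueue(job)
	fmt.Println(ok, reason)

	// After defaulting, the check passes.
	defaultManagedBy(job, true)
	ok, _ = isJobManagedByKueue(job)
	fmt.Println(ok)
}
```

In a real adapter the job would be fetched through the controller-runtime client rather than passed in directly, and the decision of whether a queue uses MultiKueue would come from the admission checks configured on its ClusterQueue.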

Why is this needed: The MPIJob managedBy feature allows us to install the mpi-operator on both the management and worker clusters, so that we can easily introduce multi-cluster MPIJob dispatching and also make it possible to run MPIJobs in the management cluster.

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

tenzen-y commented 1 day ago

cc: @mimowo @mszadkow

mszadkow commented 1 day ago

/assign