Closed jtorrex closed 1 year ago
It looks like batch/job is dropped. Can you update controller_manager_config like this?
```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
health:
  healthProbeBindAddress: :8081
metrics:
  bindAddress: :8080
webhook:
  port: 9443
leaderElection:
  leaderElect: true
  resourceName: c1f6bfd2.kueue.x-k8s.io
controller:
  groupKindConcurrency:
    Job.batch: 5
    LocalQueue.kueue.x-k8s.io: 1
    ClusterQueue.kueue.x-k8s.io: 1
    ResourceFlavor.kueue.x-k8s.io: 1
    Workload.kueue.x-k8s.io: 1
clientConnection:
  qps: 50
  burst: 100
#waitForPodsReady:
#  enable: true
#manageJobsWithoutQueueName: true
#namespace: ""
#internalCertManagement:
#  enable: false
#  webhookServiceName: ""
#  webhookSecretName: ""
integrations:
  frameworks:
  - "kubeflow.org/mpijob"
+ - "batch/job"
```
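For context on why the missing integration matters: once `batch/job` is back in `integrations.frameworks`, a plain batch Job opts in to Kueue via the queue-name label. A minimal sketch, assuming a LocalQueue named `user-queue` (hypothetical name) already exists in the Job's namespace:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # hypothetical LocalQueue name
spec:
  suspend: true  # Kueue unsuspends the Job once the Workload is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sleep", "5"]
        resources:
          requests:
            cpu: "1"
```

Without the `batch/job` framework enabled, Kueue never reconciles this Job, so it stays suspended and no flavor `nodeLabels` are injected.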
/close user error :)
@alculquicondor: Closing this issue.
What happened:

`nodeLabels` defined in each ResourceFlavor should be propagated to the `.spec.template.spec.nodeSelector` field of any Pod that is executed in the same queue. However, with a ResourceFlavor that has `nodeLabels` defined and is linked to the ClusterQueue, and jobs running in this queue, the `nodeLabels` do not seem to be propagated to the `.spec.template.spec.nodeSelector` of each Pod.

What you expected to happen:

`nodeLabels` defined on the ResourceFlavor should appear on the `nodeSelector` field of the Pods.

How to reproduce it (as minimally and precisely as possible):
<none>
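Although no reproducer was attached, the setup being described can be sketched as follows. Names and label values here are hypothetical, and note that the ResourceFlavor field layout has changed across Kueue releases (recent v1beta1 nests `nodeLabels` under `spec`):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: on-demand  # hypothetical flavor name
spec:
  nodeLabels:
    # On admission, this label is expected to be copied into the
    # Job's .spec.template.spec.nodeSelector (and hence each Pod's nodeSelector)
    instance-type: on-demand
```

The reported symptom is that Pods of jobs submitted to a queue backed by this flavor come up without the corresponding `nodeSelector` entry.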
Anything else we need to know?:
Environment:

- Kubernetes version (use `kubectl version`):
- Kueue version (use `git describe --tags --dirty --always`): Tested both on https://github.com/kubernetes-sigs/kueue/releases/tag/v0.3.2 and https://github.com/kubernetes-sigs/kueue/releases/tag/v0.3.1
- Cloud provider or hardware configuration: AWS EKS
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g: `uname -a`): Kernel Version: 5.4.226-129.415.amzn2.x86_64
- Controller configuration modified to allow the MPIJob framework (controller_manager_config.yaml):
Karpenter patch to avoid interfering with other namespaces when the cluster downscales (kueue-karpenter-patch.yaml):