ClusterConfiguration should support tolerations

cfchase commented 10 months ago

ClusterConfiguration should support tolerations

Run Ray clusters (especially the worker pods) on tainted nodes.

Description of Problem the Feature Should Solve

You cannot create a Ray cluster with tolerations using the CodeFlare SDK: cluster = Cluster(ClusterConfiguration(...)) offers no way to specify them.

Nodes are often tainted to keep unwanted workloads off them. This is especially common for GPU nodes. In addition, different nodes may have GPUs of different sizes, and taints are also used to make sure each worker lands on the right kind of node.

You might also want to add a toleration to the headGroupSpec.

Describe the Solution You Would Like to See

Add worker_tolerations and head_tolerations as optional parameters for ClusterConfiguration:

cluster = Cluster(ClusterConfiguration(
    head_tolerations=[key, operator, effect],
    worker_tolerations=[key, operator, effect]
))

Describe Alternatives You Have Considered

Editing the YAML file and using KubeRay directly. Currently, you can manually edit the AppWrapper YAML to include a toleration for these taints:

workerGroupSpecs:
  - template:
      spec:
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
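The manual edit above can also be scripted once the YAML has been loaded into Python. The following is a minimal sketch, not part of the CodeFlare SDK: add_worker_tolerations is a hypothetical helper, and it assumes each worker group keeps its Pod spec under template.spec. It searches for workerGroupSpecs anywhere in the document, so it makes no assumption about where the RayCluster sits inside the AppWrapper.

```python
from typing import Any, Dict, List

GPU_TOLERATION = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}


def add_worker_tolerations(doc: Any, tolerations: List[Dict[str, Any]]) -> int:
    """Add tolerations to every worker group found anywhere in `doc`.

    `doc` is a nested dict/list structure, e.g. a YAML file loaded into Python.
    The document is modified in place. Returns the number of worker groups
    that were visited.
    """
    patched = 0
    if isinstance(doc, dict):
        for key, value in doc.items():
            if key == "workerGroupSpecs" and isinstance(value, list):
                for group in value:
                    # Assumption: the Pod spec lives under template.spec.
                    pod_spec = group.setdefault("template", {}).setdefault("spec", {})
                    existing = pod_spec.setdefault("tolerations", [])
                    for toleration in tolerations:
                        # Skip tolerations already present so reruns are harmless.
                        if toleration not in existing:
                            existing.append(dict(toleration))
                    patched += 1
            else:
                patched += add_worker_tolerations(value, tolerations)
    elif isinstance(doc, list):
        for item in doc:
            patched += add_worker_tolerations(item, tolerations)
    return patched
```

Calling add_worker_tolerations(loaded_yaml, [GPU_TOLERATION]) before writing the file back would reproduce the manual edit for every worker group.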

cfchase commented 3 months ago

Since I submitted this, I've come across the need to add other fields to my Ray clusters, such as nodeSelector, schedulerName, and annotations. Since I don't think we can anticipate every user's needs, I think we should build more flexibility into cluster creation. I'd like to be able to read the Ray cluster spec and submit a Python dictionary to update it.

For example:

update_dict = {
    "spec": {
        "workerGroupSpecs": [
            {
                "groupName": "small-group-raycluster",
                "template": {
                    "spec": {
                        "tolerations": [
                            {
                                "key": "nvidia.com/gpu",
                                "operator": "Exists",
                                "effect": "NoSchedule"
                            }
                        ]                        
                    }
                }
            }
        ]
    }
}

That way I could update not just tolerations, but any other field I need to change, such as nodeSelector or schedulerName.
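One way such an update could be applied is a recursive merge that matches worker groups by groupName. This is only a sketch under stated assumptions: merge_update is a hypothetical function, not an existing SDK call, and the matching rule (merge dicts key by key, match list entries on groupName, replace everything else) is one possible design among several.

```python
import copy
from typing import Any


def merge_update(base: Any, update: Any) -> Any:
    """Return a copy of `base` with `update` merged in.

    - dicts are merged key by key;
    - lists whose entries all carry "groupName" are matched on that name,
      and unmatched entries are appended;
    - anything else in `update` replaces the value in `base`.
    """
    if isinstance(base, dict) and isinstance(update, dict):
        merged = copy.deepcopy(base)
        for key, value in update.items():
            if key in merged:
                merged[key] = merge_update(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if (
        isinstance(base, list)
        and isinstance(update, list)
        and all(isinstance(i, dict) and "groupName" in i for i in base + update)
    ):
        merged = copy.deepcopy(base)
        index = {item["groupName"]: pos for pos, item in enumerate(merged)}
        for item in update:
            name = item["groupName"]
            if name in index:
                merged[index[name]] = merge_update(merged[index[name]], item)
            else:
                merged.append(copy.deepcopy(item))
        return merged
    return copy.deepcopy(update)


# Illustrative cluster spec (made up for this example).
ray_cluster = {
    "spec": {
        "workerGroupSpecs": [
            {
                "groupName": "small-group-raycluster",
                "replicas": 2,
                "template": {"spec": {"containers": [{"name": "ray-worker"}]}},
            }
        ]
    }
}
update = {
    "spec": {
        "workerGroupSpecs": [
            {
                "groupName": "small-group-raycluster",
                "template": {
                    "spec": {
                        "schedulerName": "my-scheduler",
                        "tolerations": [
                            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
                        ],
                    }
                },
            }
        ]
    }
}
patched = merge_update(ray_cluster, update)
```

Here patched keeps replicas and containers from the original group and gains schedulerName and tolerations, while ray_cluster itself is left unchanged.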

astefanutti commented 3 months ago

I totally agree the current API design is not flexible enough.

I really like your suggestion to offer the option for users to provide their own patch.

We could try to align with the patching mechanisms that are already well known in Kubernetes: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/
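Of the mechanisms described on that page, JSON merge patch is the simplest to reproduce client-side. The sketch below is a plain-Python rendering of the RFC 7396 algorithm, shown only to illustrate its semantics; json_merge_patch is a name chosen for this example, not an SDK or Kubernetes client function. Note that under these rules a list in the patch replaces the target list wholesale, so patching a single entry of workerGroupSpecs this way would require resending the whole list.

```python
from typing import Any


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON merge patch (RFC 7396) and return the result.

    The top-level target is not modified; values not touched by the patch
    are shared with it rather than copied.
    """
    if not isinstance(patch, dict):
        # Scalars and lists in the patch replace the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value in the patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Illustrative head group fragment (made up for this example).
head_group = {
    "template": {
        "spec": {
            "schedulerName": "default-scheduler",
            "nodeSelector": {"node-type": "cpu"},
            "tolerations": [{"key": "old-taint", "operator": "Exists"}],
        }
    }
}
patch = {
    "template": {
        "spec": {
            "schedulerName": None,  # remove the field
            "tolerations": [  # replaces the existing list entirely
                {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
            ],
        }
    }
}
patched_head = json_merge_patch(head_group, patch)
```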

astefanutti commented 3 months ago

We could add two keyword arguments, one to provide the Pod template for the worker nodes, another one for the head node Pod template, using official Kubernetes Python client types so auto-complete works, e.g.:

from kubernetes.client import V1PodTemplateSpec, V1PodSpec, V1Toleration

cluster = Cluster(ClusterConfiguration(
    num_workers=N,
    worker_template=V1PodTemplateSpec(
        spec=V1PodSpec(
            tolerations=[V1Toleration(
                key="nvidia.com/gpu",
                operator="Exists",
                effect="NoSchedule",
            )],
            node_selector={
                "nvidia.com/gpu.present": "true",
            },
        )
    ),
    head_template=V1PodTemplateSpec(...),
))

These types from https://github.com/kubernetes-client/python provide a to_dict method that we could use to merge the user-provided templates.
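One detail to watch when merging: as far as I know, to_dict() on the generated client models keeps the Python attribute names (for example node_selector rather than nodeSelector) and includes every unset field as None, so a naive merge could overwrite existing values with None. The client's ApiClient.sanitize_for_serialization handles both points, but a minimal stdlib-only sketch of the None-pruning step looks like this. drop_unset is a hypothetical helper, and template_dict is a hand-written stand-in for to_dict() output, not real library output.

```python
from typing import Any


def drop_unset(value: Any) -> Any:
    """Recursively remove None values from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: drop_unset(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_unset(v) for v in value if v is not None]
    return value


# Hand-written stand-in for what V1PodTemplateSpec(...).to_dict() might return.
template_dict = {
    "metadata": None,
    "spec": {
        "containers": None,
        "scheduler_name": None,
        "node_selector": {"nvidia.com/gpu.present": "true"},
        "tolerations": [
            {
                "key": "nvidia.com/gpu",
                "operator": "Exists",
                "effect": "NoSchedule",
                "toleration_seconds": None,
                "value": None,
            }
        ],
    },
}
user_fields = drop_unset(template_dict)
```

After pruning, user_fields contains only the fields the user actually set, which makes it safer to merge into the generated template. The key names would still need converting to the camelCase used in the manifest.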

@cfchase let us know if that's close enough to what you had in mind.

project-codeflare / codeflare-sdk