ironcore-dev / ironcore

Cloud Native Infrastructure as a Service
https://ironcore-dev.github.io/ironcore
Apache License 2.0

Enable Extensible and Policy-rich Scheduling for Machines #645

Open hardikdr opened 1 year ago

hardikdr commented 1 year ago

Summary

This issue proposes extensible, policy-rich scheduling for VirtualMachines and MetalMachines in the onmetal-api. The approach is to allow custom schedulers to be plugged into and removed from the cluster, with each scheduler acting on a specific set of VirtualMachines.

There are two main objectives:

  1. Introduce the Machine.Spec.SchedulerName API. This lets users specify which scheduler should dispatch a particular machine. If the field is not set, the default scheduler is used. As with the Pod.Spec.SchedulerName API in Kubernetes, this gives more flexibility and room for customization when scheduling VirtualMachines.

  2. Enhance the MachinePool status to include utilization information about the host, such as CPU, memory, and hugepages. The information will be similar to what the Node API exposes in Kubernetes. Custom schedulers can use it to make scheduling decisions that better match the needs of the applications running on the VirtualMachines. For example, a custom scheduler could place VirtualMachines on the hosts with the most available resources, so that the applications running on them are not resource-starved.
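The fallback behavior described for Machine.Spec.SchedulerName could be sketched roughly as follows. This is only an illustration under assumptions: the `Machine` and `MachineSpec` types are cut down to the single proposed field, and `DefaultSchedulerName`, `effectiveSchedulerName`, `isResponsible`, and the `gpu-scheduler` name are hypothetical and not part of the onmetal-api.

```go
package main

import "fmt"

// DefaultSchedulerName is an assumed name for the built-in scheduler,
// taken from the example value used in this issue.
const DefaultSchedulerName = "default-scheduler"

// MachineSpec is reduced to the one field proposed in this issue.
type MachineSpec struct {
	SchedulerName string
}

// Machine is a simplified stand-in for the real Machine API type.
type Machine struct {
	Name string
	Spec MachineSpec
}

// effectiveSchedulerName returns the scheduler that should dispatch m,
// falling back to the default scheduler when the field is empty.
func effectiveSchedulerName(m Machine) string {
	if m.Spec.SchedulerName == "" {
		return DefaultSchedulerName
	}
	return m.Spec.SchedulerName
}

// isResponsible reports whether the scheduler named self should act on m.
// Each scheduler would skip machines for which this returns false.
func isResponsible(self string, m Machine) bool {
	return effectiveSchedulerName(m) == self
}

func main() {
	machines := []Machine{
		{Name: "machine-hd4"}, // no scheduler set, goes to the default
		{Name: "machine-gpu", Spec: MachineSpec{SchedulerName: "gpu-scheduler"}},
	}
	for _, m := range machines {
		fmt.Printf("%s -> %s\n", m.Name, effectiveSchedulerName(m))
	}
}
```

With a check like `isResponsible`, several schedulers can watch the same Machines without competing, since each one ignores machines addressed to another scheduler.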

Basic example

  1. Enhance Machine API:

    apiVersion: compute.api.onmetal.de/v1alpha1
    kind: Machine
    metadata:
      name: machine-hd4
    spec:
      schedulerName: default-scheduler

  2. Enhance MachinePool API:

    status:
      allocatable:
        cpu: "48"
        ephemeral-storage: "1416167347928"
        hugepages-1Gi: 400Gi
        hugepages-2Mi: "0"
      capacity:
        cpu: "48"
        ephemeral-storage: 1536639920Ki
        hugepages-1Gi: 400Gi
        hugepages-2Mi: "0"
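As a rough illustration of how a custom scheduler might consume the `allocatable` values, the sketch below filters out pools that cannot satisfy a request and then prefers the pool with the most allocatable CPU. Everything here is an assumption made for the example: `ResourceList` uses plain `int64` base units instead of Kubernetes quantities, and `MachinePool`, `fits`, `pickPool`, and the pool names are hypothetical simplifications rather than the actual API.

```go
package main

import "fmt"

// ResourceList maps a resource name (for example "cpu") to a quantity
// in base units. Real APIs would likely use Kubernetes quantities.
type ResourceList map[string]int64

// MachinePool is a simplified stand-in carrying only status.allocatable.
type MachinePool struct {
	Name        string
	Allocatable ResourceList
}

// fits reports whether every requested resource is covered by allocatable.
// A resource missing from allocatable counts as zero.
func fits(request, allocatable ResourceList) bool {
	for name, want := range request {
		if allocatable[name] < want {
			return false
		}
	}
	return true
}

// pickPool returns the fitting pool with the most allocatable CPU.
// The boolean is false when no pool can satisfy the request.
func pickPool(request ResourceList, pools []MachinePool) (MachinePool, bool) {
	var best MachinePool
	found := false
	for _, p := range pools {
		if !fits(request, p.Allocatable) {
			continue
		}
		if !found || p.Allocatable["cpu"] > best.Allocatable["cpu"] {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	pools := []MachinePool{
		{Name: "pool-a", Allocatable: ResourceList{"cpu": 48, "hugepages-1Gi": 400 << 30}},
		{Name: "pool-b", Allocatable: ResourceList{"cpu": 16, "hugepages-1Gi": 0}},
	}
	request := ResourceList{"cpu": 8, "hugepages-1Gi": 4 << 30}
	if p, ok := pickPool(request, pools); ok {
		fmt.Println("scheduled on", p.Name)
	} else {
		fmt.Println("no pool fits")
	}
}
```

Ranking by CPU alone is a deliberately simple policy. A real scheduler would probably weigh several resources and subtract what is already in use on each host.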

Motivation

To enable more efficient scheduling of Machines.

hardikdr commented 1 year ago

cc @gehoern @adracus