microsoft / pai

Resource scheduling and cluster management for AI
https://openpai.readthedocs.io
MIT License
2.64k stars · 548 forks

support different types of computing hardware #5138

Open hzy46 opened 3 years ago

hzy46 commented 3 years ago

Motivation

Currently, OpenPAI supports the most widely used computing devices: NVIDIA GPU, AMD GPU, and CPU. In addition, it has the potential to support other types of devices, e.g., AI computing chips (NPUs).

Goal

Decouple OpenPAI services from specific hardware types. One OpenPAI service container should be able to support a list of hardware types.

Requirements

For every type of computing device, the vendor should guarantee:

MVP with default scheduler

Assuming that there is only one type of computing device in a cluster, we could build a minimum viable product (MVP) with the default scheduler by:

  1. configure ComputeDevice (default is nvidia.com/gpu) in deployment and record it in a configmap
  2. add an option to turn off the HiveD scheduler in quick start
  3. bypass (or adjust) pre-checks according to ComputeDevice in quick start
  4. change nvidia.com/gpu to ComputeDevice in rest server
  5. change the vc resource information when using the default scheduler

https://github.com/microsoft/pai/blob/2fb370a59387f7df5e6cec9d30d194f3af19e2d9/src/rest-server/src/models/v2/job/k8s.js#L483-L487
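Step 4 above could look roughly like the following sketch: the extended-resource name becomes configurable instead of being hard-coded to `nvidia.com/gpu`. The names `computeDevice` and `getResourceSpec` are illustrative, not the actual rest-server code.

```javascript
// Read the configured device type (an assumption: in practice this would
// come from the configmap written in step 1, not an env var).
const computeDevice = process.env.COMPUTE_DEVICE || 'nvidia.com/gpu';

function getResourceSpec(cpu, memoryMB, deviceCount) {
  const resources = {
    cpu: cpu,
    memory: `${memoryMB}Mi`,
  };
  // Only request the extended resource when the task actually needs devices,
  // so CPU-only tasks schedule on machines without a device plugin.
  if (deviceCount > 0) {
    resources[computeDevice] = deviceCount;
  }
  return {requests: resources, limits: {...resources}};
}
```

Kubernetes treats any `vendor.domain/resource` name as an extended resource, so the same code path serves NVIDIA, AMD, or NPU devices once the corresponding device plugin is deployed.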

Besides the necessary work, we (the pai-dev team and the device vendor) could provide better support by

Perfect support with HiveD

By enabling HiveD, we could get better support.

Some extra effort is needed to achieve this:

  1. offer a container runtime for every device type. A container runtime here is a modified version of runc that adds a custom pre-start hook to all containers. Two examples are nvidia-container-runtime and the runtime for AMD Radeon Open Compute.
  2. describe machines and devices in layout.yaml #5151
  3. make sure HiveD config generation is independent of computing devices
  4. add appropriate environment variables in rest-server when generating the pod spec, in addition to NVIDIA_VISIBLE_DEVICES and PAI_AMD_VISIBLE_DEVICES.

https://github.com/microsoft/pai/blob/2fb370a59387f7df5e6cec9d30d194f3af19e2d9/src/rest-server/src/models/v2/job/k8s.js#L656-L676
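Item 4 above could be sketched as follows: instead of always emitting both NVIDIA_VISIBLE_DEVICES and PAI_AMD_VISIBLE_DEVICES, the rest-server would look up the visibility variable from the configured device type. The mapping table, `getDeviceEnv`, and the `'all'`/`'none'` value handling are illustrative assumptions, not the actual implementation.

```javascript
// Hypothetical mapping from extended-resource name to the environment
// variable the device's container runtime understands. A new vendor would
// register its own entry here.
const visibilityEnvName = {
  'nvidia.com/gpu': 'NVIDIA_VISIBLE_DEVICES',
  'amd.com/gpu': 'PAI_AMD_VISIBLE_DEVICES',
};

function getDeviceEnv(computeDevice, deviceCount) {
  const envName = visibilityEnvName[computeDevice];
  if (envName === undefined) {
    return []; // unknown device types get no visibility variable
  }
  // Hide all devices from containers that request none; the concrete
  // device indices for deviceCount > 0 would come from the scheduler.
  return [{name: envName, value: deviceCount > 0 ? 'all' : 'none'}];
}
```

This keeps the pod-spec generation device-agnostic: only the mapping table knows about specific vendors.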

Some optional work items include

hzy46 commented 3 years ago

Detailed Work Items for this issue:

If all P0 items are done, we can support different hardware types with the default scheduler. If all P1 items are done, we can support different hardware types with the HiveD scheduler. P2 items are nice-to-have.

hzy46 commented 3 years ago

Test cases for rest-server:

1. Default Scheduler: Test that the resource requirements are correctly specified in the pod definition.

machine-sku:
  master-machine: # define a machine sku
    # the resource requirements for all the machines of this sku
    # We use the same memory format as Kubernetes, e.g. Gi, Mi
    # Reference: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory
    mem: 60Gi
    cpu:
      # the number of CPU vcores
      vcore: 24
  gpu-machine:
    computing-device:
      type: a.b.com/c
      model: faked
      count: 4
    mem: 220Gi
    cpu:
      vcore: 24

machine-list:
  - hostname: pai-master # name of the machine, **do not** use upper case alphabet letters for hostname
    hostip: 10.0.0.1
    machine-type: master-machine # only one master-machine supported
    pai-master: "true"
  - hostname: pai-worker1
    hostip: 10.0.0.2
    machine-type: gpu-machine
    pai-worker: "true"
  - hostname: pai-worker2
    hostip: 10.0.0.3
    machine-type: gpu-machine
    pai-worker: "true"
………………
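A test for case 1 might assert that a pod generated for the `gpu-machine` sku above carries the custom device name as an extended resource. The helper `skuToPodResources` is hypothetical, used only to make the expected mapping concrete:

```javascript
// Hypothetical helper: turn a machine-sku entry (as in the layout.yaml
// above) into Kubernetes resource limits for a whole-machine task.
function skuToPodResources(sku) {
  const limits = {cpu: sku.cpu.vcore, memory: sku.mem};
  if (sku['computing-device']) {
    // The sku's device type becomes the extended-resource name.
    limits[sku['computing-device'].type] = sku['computing-device'].count;
  }
  return limits;
}

// The gpu-machine sku from the layout.yaml example.
const gpuMachine = {
  'computing-device': {type: 'a.b.com/c', model: 'faked', count: 4},
  mem: '220Gi',
  cpu: {vcore: 24},
};
```

The test would then check that `skuToPodResources(gpuMachine)` yields a limit of 4 under the key `a.b.com/c`, and that a CPU-only sku like `master-machine` produces no extended-resource entry at all.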

2. HiveD Scheduler: Test that the environment variables are set in the pod spec.
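A sketch of this check, assuming a hypothetical pod-spec shape and a made-up visibility variable name for the faked device type (the real variable name would be defined by the vendor's container runtime):

```javascript
// A pod spec fragment as the rest-server might emit it (assumed shape).
const podSpec = {
  containers: [{
    name: 'app',
    env: [{name: 'PAI_C_VISIBLE_DEVICES', value: '0,1,2,3'}],
  }],
};

// Return true if any container in the spec defines the given env variable.
function hasDeviceEnv(spec, envName) {
  return spec.containers.some(
    (c) => c.env.some((e) => e.name === envName));
}
```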