numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

Is it possible to support DRA(Dynamic Resource Allocation)? #2007

Closed sesame0224 closed 2 weeks ago

sesame0224 commented 3 weeks ago

Summary

In recent years, accelerators have become essential for data processing infrastructure. Is there a plan to support accelerators in the numaflow project?

Recently, DRA has been proposed as a mechanism to easily use accelerators in Kubernetes pods. DRA allocates accelerator resources to pods using a manifest file.

In the following example, a fictional resource driver is defined as a "DeviceClass," and the spec and some metadata for the fictional resources (e.g., GPU, FPGA) managed by this driver are defined as a "ResourceClaimTemplate."

Then, in the pod manifest, a request for the allocation of fictional accelerator resources is made via a "ResourceClaim."

apiVersion: resource.k8s.io/v1alpha3
kind: DeviceClass
metadata:
  name: resource.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "resource-driver.example.com"
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
    devices:
      requests:
      - name: req-0
        deviceClassName: resource.example.com
        selectors:
        - cel:
            expression: |-
              device.attributes["resource-driver.example.com"].color == "black" &&
              device.attributes["resource-driver.example.com"].size == "large"
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  containers:
  - name: container0
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0
  - name: container1
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-1
  resourceClaims:
  - name: cat-0
    resourceClaimTemplateName: large-black-cat-claim-template
  - name: cat-1
    resourceClaimTemplateName: large-black-cat-claim-template

In this case, resources are allocated to the pod if the device's attributes match, such as having a black color and a large size.

A more practical example can be found here.

With the use of DRA, is it possible that UDFs in numaflow will support the use of accelerator resources?

Use Cases

My team researches and develops software that controls "data processing pipelines" leveraging accelerators, with the goal of an efficient data processing infrastructure across several use cases.

Our data processing pipeline is the entire process that the user wants to execute (e.g., preprocessing → inference → postprocessing), and this pipeline is composed of individual processes (pods using Kubernetes CRDs) executed on accelerators.

Our software control technology requires defining the accelerators for each process in the pipeline and the connections between them with a manifest.

We are considering leveraging numaflow because it seems to have a high affinity with our technology: numaflow defines the entire process as a pipeline, a series of connected pods, in a user-friendly way.


whynowy commented 3 weeks ago

Will be supported in the coming 1.3.1 release.

whynowy commented 2 weeks ago

@sesame0224 - just one thing to be aware of: resourceClaimTemplateName in resourceClaims is too new (it was just released in k8s v1.31 2 weeks ago). Our strategy is to stay a couple of versions behind (at least one; v1.29 at this moment), so you will not be able to use resourceClaimTemplateName for now, but you should be able to use the other properties defined in v1.29, such as name and source.
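
For reference, the difference described above can be sketched on the pod from the original example. This is a minimal sketch, assuming the v1.29 shape of the pod-level resourceClaims entry, where the template reference is nested under source instead of sitting directly on the entry. The pod, container, and template names are reused from the issue description; nothing here is Numaflow-specific.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  containers:
  - name: container0
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0          # must match an entry in spec.resourceClaims
  resourceClaims:
  - name: cat-0
    # v1.29 form: the template reference is nested under "source".
    source:
      resourceClaimTemplateName: large-black-cat-claim-template
    # v1.31 form (not usable yet, per the comment above) would instead be:
    # resourceClaimTemplateName: large-black-cat-claim-template
```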

sesame0224 commented 2 weeks ago

@whynowy Thank you, I saw #2009. Thanks to this, I understand that it is possible to use DRA in v1.29 with "numaflow", but resources such as ResourceClass, ClaimParameters, and ResourceClaimTemplate need to be deployed separately. Is that correct?

whynowy commented 2 weeks ago

This is right.
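
A rough sketch of what "deployed separately" could look like follows. The first two documents assume the v1.29 API group resource.k8s.io/v1alpha2 and reuse the fictional driver and template names from the issue description; the driver-specific ClaimParameters object is omitted because its schema depends on the driver. The Pipeline part is an assumption, not confirmed by this thread: it supposes that a vertex accepts a pod-level resourceClaims list and that the UDF container accepts resources.claims. The pipeline name, vertex names, and UDF image are hypothetical.

```yaml
# Deployed separately from the pipeline (v1.29 / v1alpha2 shape).
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: resource.example.com
driverName: resource-driver.example.com
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
    resourceClassName: resource.example.com
---
# ASSUMPTION: field placement on the vertex is illustrative only;
# check the Numaflow API reference for the release you run.
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: dra-sketch                 # hypothetical name
spec:
  vertices:
  - name: in
    source:
      generator:
        rpu: 5
        duration: 1s
  - name: infer
    resourceClaims:                # assumed pod-level field on the vertex
    - name: cat-0
      source:
        resourceClaimTemplateName: large-black-cat-claim-template
    udf:
      container:
        image: registry.example.com/infer-udf:latest   # hypothetical image
        resources:
          claims:
          - name: cat-0            # refers to the resourceClaims entry above
  - name: out
    sink:
      log: {}
  edges:
  - from: in
    to: infer
  - from: infer
    to: out
```

The point of the split is that the cluster-scoped and namespaced DRA objects are owned by whoever operates the resource driver, while the pipeline only names the claim template it wants.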

sesame0224 commented 2 weeks ago

@whynowy Thank you. May I ask one more question about the release plan? Could I assume that numaflow will support DRA as of v1.31 around the time K8s v1.33 is released, which would be April next year? I ask because K8s v1.31 was released three weeks ago and the release interval is approximately four months.

whynowy commented 2 weeks ago

Hopefully it will be earlier than that; we will upgrade to support v1.31 before v1.33 is released.

sesame0224 commented 2 weeks ago

I appreciate your reply.