kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[feature] Support dataSource Field for PVC Creation in KFP Python SDK #11420

Open leseb opened 15 hours ago

leseb commented 15 hours ago

Feature Area

/area backend /area sdk

What feature would you like to see?

Please add support for the dataSource field when creating a PersistentVolumeClaim (PVC) with the CreatePVC container component in the KFP Python SDK. This would let users create PVCs that are pre-populated with data, matching the Kubernetes capability to clone an existing volume or restore one from a snapshot.

What is the use case or pain point?

The dataSource field is essential for workflows that depend on pre-initialized volumes, for example restoring a snapshot before processing it, or cloning an existing volume so that parallel workflows each get their own copy.

Is there a workaround currently?

Use a custom component task that invokes the kubernetes library and creates a PVC with a dataSource from a VolumeSnapshot.
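For illustration, here is a minimal sketch of what that workaround could look like. It assumes "the kubernetes library" means the official Kubernetes Python client, that the code runs in a pod whose service account is allowed to create PVCs, and that the VolumeSnapshot already exists in the same namespace. The function names build_pvc_manifest and create_pvc_from_snapshot are hypothetical and not part of KFP.

```python
from typing import Any, Dict, List, Optional


def build_pvc_manifest(
    name: str,
    size: str,
    access_modes: List[str],
    snapshot_name: str,
    storage_class_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a PVC manifest whose dataSource points at a VolumeSnapshot."""
    spec: Dict[str, Any] = {
        "accessModes": list(access_modes),
        "resources": {"requests": {"storage": size}},
        # dataSource tells Kubernetes to pre-populate the new volume.
        "dataSource": {
            "apiGroup": "snapshot.storage.k8s.io",
            "kind": "VolumeSnapshot",
            "name": snapshot_name,
        },
    }
    if storage_class_name is not None:
        spec["storageClassName"] = storage_class_name
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


def create_pvc_from_snapshot(namespace: str, manifest: Dict[str, Any]) -> None:
    """Submit the manifest with the Kubernetes Python client."""
    # Imported lazily so the manifest builder works without the client installed.
    from kubernetes import client, config

    # Assumption: running inside the cluster with permission to create PVCs.
    config.load_incluster_config()
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace=namespace, body=manifest
    )
```

In a pipeline, both functions would live inside a custom Python component that installs the kubernetes package, and downstream tasks would mount the resulting PVC by name.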

Proposed changes

Extend the PVC creation API in the KFP Python SDK with an optional data_source parameter that mirrors the dataSource field of the Kubernetes PersistentVolumeClaimSpec.

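from kfp import kubernetes

# Called inside a pipeline definition.
pvc = kubernetes.CreatePVC(
    pvc_name_suffix="foo",
    access_modes=["ReadWriteOnce"],
    size="100Gi",
    # Proposed new parameter; CreatePVC does not accept it today.
    data_source={"api_group": "snapshot.storage.k8s.io", "kind": "VolumeSnapshot", "name": "my-snap"},
)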
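To make the mapping concrete, here is a rough sketch of how the proposed snake_case dict could be translated into the Kubernetes dataSource object (a TypedLocalObjectReference, where kind and name are required and apiGroup may be omitted for core kinds such as PersistentVolumeClaim). The helper to_k8s_data_source is hypothetical and only illustrates one possible translation, not how KFP would implement it.

```python
from typing import Dict, Optional


def to_k8s_data_source(data_source: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Convert the proposed data_source dict to a Kubernetes dataSource object."""
    # kind and name are required by TypedLocalObjectReference.
    missing = [key for key in ("kind", "name") if not data_source.get(key)]
    if missing:
        raise ValueError(f"data_source is missing required keys: {missing}")

    result: Dict[str, str] = {
        "kind": str(data_source["kind"]),
        "name": str(data_source["name"]),
    }
    # apiGroup is only needed for non-core kinds such as VolumeSnapshot.
    api_group = data_source.get("api_group")
    if api_group:
        result["apiGroup"] = api_group
    return result
```

Restoring from a snapshot would pass api_group="snapshot.storage.k8s.io" with kind "VolumeSnapshot", while cloning an existing volume would pass only kind "PersistentVolumeClaim" and the source claim's name.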

Love this idea? Give it a 👍.

leseb commented 9 hours ago

@HumairAK FYI