kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Pipeline with volume does not work with "k8sapi" argo executor #4326

Closed nakfour closed 2 years ago

nakfour commented 4 years ago

What steps did you take:

As part of Open Data Hub we are trying to get KFP 1.0 working on OpenShift 4.x. We must use the "k8sapi" executor since Docker is not present. The provided KFP Python examples work as long as no parameters are passed. We are trying to convert the KFP examples that pass parameters to use volumes so that they work on OCP. We tried one of the simple volume creation examples, as pointed to here (https://github.com/kubeflow/pipelines/blob/master/samples/core/volume_ops/volume_ops.py). This caused an error stating that we need to use volumes or emptyDir. Looking at the generated YAML file, we noticed these lines:

outputs:
      parameters:
      - name: create-pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: create-pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: create-pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}

These need to be placed in emptyDir or volumes. We modified the generated YAML file to exclude the outputs, and that worked.

Is there a way we can use volumes or emptyDir in KFP Python without generating these output parameters?

Environment:

KFP 1.0 on OCP 4.4

Bobgy commented 4 years ago

/assign @Ark-kun @neuromage @chensun @numerology

Ark-kun commented 4 years ago

I think this is a known upstream issue: https://github.com/argoproj/argo/issues/2679

We must use the "k8sapi" executor since docker is not present.

The PNS executor might be the more future-proof way.

As a workaround: all VolumeOp does is pass the volume YAML to Argo, and Argo just calls kubectl create using that YAML. So you could create a custom component that uses kubectl create to create the volume. This way you control the outputs.
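
For illustration, a rough sketch of such a component (everything here is an assumption: the helper name, the kubectl image, the manifest, and the requirement that the pipeline's service account can create PVCs):

from kfp import dsl

def create_pvc_op(pvc_name: str, size: str = '1Gi'):
    # Build the PVC manifest inline and create it with kubectl instead of VolumeOp,
    # so the resulting template has no resource outputs for the executor to handle.
    manifest = f"""
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {pvc_name}
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: {size}
"""
    return dsl.ContainerOp(
        name='create-pvc',
        image='bitnami/kubectl:1.18',  # example image; any image that ships kubectl works
        command=['sh', '-c'],
        arguments=[f"echo '{manifest}' | kubectl create -f -"],
    )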

nakfour commented 4 years ago

@Ark-kun we documented why we are not using the PNS executor here: https://github.com/opendatahub-io/odh-manifests/issues/117. The issue is with the DSL-generated volume YAML, which includes these lines, as mentioned before:

outputs:
      parameters:
      - name: create-pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: create-pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: create-pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}

These lines are not generated by Argo; they are generated by the Python DSL SDK when creating a volume. Is there a way to create a volume in Python without any outputs that require emptyDir?

Ark-kun commented 4 years ago

You might want to make sure your Kubernetes environment supports Argo. I was under the impression that several of our users have used OpenShift semi-successfully.

The provided KFP Python examples work as long as no parameters are passed.

I think this is a standard feature of Argo. Can you please check whether any of Argo's parameter or artifact passing examples work for your setup? Data passing is a core feature of Argo and KFP. Without making it work, the product might not be usable.

The issue is with the DSL-generated volume YAML, which includes these lines, as mentioned before

I think this is a standard feature of Argo resource templates. You might want to make sure your Kubernetes environment supports Argo.

These need to be placed in emptyDir or volumes.

Can you explain what needs to be placed in the volumes?

We must use the "k8sapi" executor since docker is not present.

I think several of our users have used KFP in non-Docker environments. The Argo developers propose the PNS executor as the optimal one for the future.

We tried one of the simple volume creation examples

I'd consider the VolumeOp to be an experimental community-driven feature. Modifying global state (such as creating Kubernetes objects) is somewhat frowned upon compared to pure components that only use their inputs.

Is there a way to create a volume

You can use kubectl create .... You can create a container component that runs kubectl create .... In fact, that's what VolumeOp and Argo do.

without any outputs that require emptyDir

I'm not sure I understand how emptyDir comes into play here. Can you please help me understand?

Ark-kun commented 4 years ago

I think the best course of action is to first try to make the underlying orchestrator (Argo) work for you.

Ark-kun commented 4 years ago

I think it might be best for you to avoid VolumeOp for now. Try making the containers and data passing work. If the system throws an error asking you to create emptyDir volumes under the output paths, then it might be possible to post-process the compiled workflow to add them. We've suggested to the Argo developers that they auto-mount the emptyDir volumes if Argo needs them for operation.
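
A rough sketch of that post-processing, assuming the compiled workflow is in pipeline.yaml and that the templates write their outputs under /tmp/outputs (both the file name and the path are just examples):

import yaml

with open('pipeline.yaml') as f:
    workflow = yaml.safe_load(f)

# Mount an emptyDir volume under the assumed output directory of every container template.
for template in workflow['spec']['templates']:
    container = template.get('container')
    if not container:
        continue
    template.setdefault('volumes', []).append({'name': 'outputs', 'emptyDir': {}})
    container.setdefault('volumeMounts', []).append(
        {'name': 'outputs', 'mountPath': '/tmp/outputs'})

with open('pipeline-patched.yaml', 'w') as f:
    yaml.dump(workflow, f)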

nakfour commented 4 years ago

@Ark-kun yes, Argo works; we have actually been using it in our production cluster for quite some time. The key is to use volumes when passing parameters; once that is done, it all works. We are trying to find a way to make KFP Python code work by allowing users to create volumes in the Python code, instead of telling them to run oc create pv first and then write their Python KFP code.

nakfour commented 4 years ago

@Ark-kun is it possible to even use emptyDir from python KFP?

nakfour commented 4 years ago

@Ark-kun thanks for the suggestion in the KFP weekly meeting about a component to create a volume from Python. Is it possible to collaborate on creating this feature?

eterna2 commented 4 years ago

Hey @nakfour

I am trying to set up Kubeflow in our OpenShift cluster too, and I wouldn't mind helping out.

It would be cool if you could also tag me on any OpenShift-related issues (so I can watch out for them, as I am still at a very early stage of getting KF to work on our cluster).

Ark-kun commented 4 years ago

yes Argo works

I was interested in whether Argo as a whole works (e.g. whether the artifact and parameter passing examples work). Data passing is a core Argo feature that distinguishes it from many other orchestrators.

We are trying to find a way to make KFP Python code work by allowing users to create volumes in the Python code, instead of telling them to run oc create pv first and then write their Python KFP code.

We somewhat discourage volumes because they often make components non-reusable (they make them depend on the cluster state). Also, volumes are global (like global variables), which makes components less pure and less reproducible. We also discourage volume creation as part of the pipeline, because it affects global state: if some pipeline task deletes a volume that a future task still needs, there will be problems.

is it possible to even use emptyDir from python KFP?

Every KFP task has task.add_volume() and task.add_volume_mount(). They support all Kubernetes volumes including emptyDir.
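
For example, a minimal sketch using these methods (the step, volume name, and mount path are just placeholders):

from kfp import dsl
from kubernetes import client as k8s_client

@dsl.pipeline(name='emptydir-example')
def emptydir_pipeline():
    task = dsl.ContainerOp(
        name='step',
        image='alpine:3.12',
        command=['sh', '-c', 'echo hello > /tmp/outputs/msg.txt'],
    )
    # Back the output directory with an emptyDir volume.
    task.add_volume(k8s_client.V1Volume(
        name='outputs',
        empty_dir=k8s_client.V1EmptyDirVolumeSource(),
    ))
    task.add_volume_mount(k8s_client.V1VolumeMount(
        name='outputs',
        mount_path='/tmp/outputs',
    ))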

The key is to use volumes when passing parameters; once that is done, it all works.

AFAIK, for container components/templates mounting emptyDir volumes under the outputs should work around this Argo issue.

BTW, we've developed an experimental way to switch KFP's data passing from Argo's artifact repository to volumes. Please check https://github.com/kubeflow/pipelines/blob/237795539f7b85bac77435e2464367226ee19391/sdk/python/kfp/dsl/_pipeline.py#L157

        from kfp.dsl import PipelineConf, data_passing_methods
        from kubernetes.client.models import V1Volume, V1PersistentVolumeClaim
        pipeline_conf = PipelineConf()
        pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
            volume=V1Volume(
                name='data',
                persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
            ),
            path_prefix='artifact_data/',
        )
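
The pipeline_conf is then handed to the compiler when building the workflow; a minimal sketch, assuming a pipeline function named my_pipeline:

        import kfp.compiler
        kfp.compiler.Compiler().compile(my_pipeline, 'pipeline.yaml', pipeline_conf=pipeline_conf)
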
nakfour commented 4 years ago

@Ark-kun I wanted to see if there is any update on this issue. Thanks

elikatsis commented 4 years ago

Hi @nakfour, this issue had totally slipped my mind!

If your only problem using VolumeOps is the outputs, then you can disable them in your DSL code as follows:

vop = kfp.dsl.VolumeOp(...)
vop.outputs = {}

If this helps you, we could implement a constructor argument to disable outputs, so all that would be required is:

vop = kfp.dsl.VolumeOp(..., disable_outputs=True)

We also discourage volume creation as part of the pipeline, because it affects global state: if some pipeline task deletes a volume that a future task still needs, there will be problems.

Creating resources as part of the pipeline is essential to ML, and there are components and samples that actually do similar things. If I'm not wrong, the taxi cab example creates deployments and services using pure components, doesn't it? The bad thing about those components is that users don't really have a say in the resources created :smile: Creating volumes is similar to that and comes with lots of benefits (I'll omit them since this issue isn't the place to discuss them).

Regarding the experimental data passing method, this code is wrong:

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
    ),
    path_prefix='artifact_data/',
)

The correct way to do it is

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name='data-volume'),
    ),
    path_prefix='artifact_data/',
)

But all of this is too much boilerplate code. You can always use a PipelineVolume, which makes things much easier since it is a V1Volume with some extra magic:

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=kfp.dsl.PipelineVolume(pvc="data-volume"),
    path_prefix='artifact_data/',
)

However, this data passing method is not easy to maintain. All your pipelines will be using the same PVC, or you will have to ask your users which PVC to use every time they want to run a pipeline. And what size should these PVCs be? How can you know how much data your users will be passing around? Creating PVCs using VolumeOps at the beginning of the pipeline and (optionally) deleting them at the end seems like the easiest, fastest, and most sane way to go.
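
For illustration, a minimal sketch of that pattern (the component, size, and mount path are placeholders; an optional final task that deletes the PVC is omitted here):

from kfp import dsl
from kfp.components import func_to_container_op

def process(msg: str):
    print(msg)

@dsl.pipeline(name='volumeop-example')
def volumeop_pipeline():
    # Create the PVC at the start of the pipeline...
    vop = dsl.VolumeOp(
        name='create-volume',
        resource_name='data-volume',
        size='1Gi',
        modes=dsl.VOLUME_MODE_RWO,
    )
    # ...and mount it into the steps that pass data through it.
    process_op = func_to_container_op(process)
    process_op('hello').add_pvolumes({'/mnt/data': vop.volume})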

zsaladin commented 4 years ago

In my case I need the outputs.parameters, because other containers depend on the PVC name.

So I changed my code.

Old code

volume = dsl.VolumeOp(
    name="volume",
    resource_name="volume",
    modes=dsl.VOLUME_MODE_RWO,
    size="1Gi"
)
merge_op = func_to_container_op(merge)
merge_task = merge_op()
merge_task.apply(
    mount_pvc(
        pvc_name=volume.volume.persistent_volume_claim.claim_name,
        volume_mount_path="/mnt"
    )
)

New code

volume = dsl.VolumeOp(
    name="volume",
    resource_name="volume",
    modes=dsl.VOLUME_MODE_RWO,
    size="1Gi"
)
volume.outputs = {}  # To exclude outputs

merge_op = func_to_container_op(merge)
merge_task = merge_op()
merge_task.apply(
    mount_pvc(
        pvc_name="{{workflow.name}}-volume",  # Hard coded name instead of outputs.parameter 
        volume_mount_path="/mnt"
    )
)

The name of the PVC consists of the workflow name and the volume name. This one works.

nakfour commented 3 years ago

Thanks @elikatsis, I will try your suggestion of disabling the outputs; if that works and we can create volumes with VolumeOp, then we should be good. @zsaladin thanks for the code example as well.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.