kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.61k stars 1.63k forks

[Question] Any way to automatically (by default) add secret to pipeline's pods? #3812

Closed anatolii-zolotar closed 4 years ago

anatolii-zolotar commented 4 years ago

Hi. I need someone's advice on solving our problem with access to minio artifacts from pipeline pods. I was about to post a question about a particular technical workaround, but it's probably better to describe the whole case since it seems to me that I might be thinking in a completely wrong direction.

We want to be able to experiment with individual components of a pipeline and provide existing artifacts as input (without running the previous steps, or with any specific artifact). Right now, as far as I understand (I'm describing this part on behalf of my colleague), the files produced by other components are typed as InputPath. When we want to provide an artifact URL as a parameter for a component, KFP creates a temp file and writes this value into it. In the component's script we have a method which can figure out whether the input file contains data or a minio URL.
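For readers wondering what such a "data or URL" check might look like, here is a minimal sketch using only the Python standard library. The function name `classify_input`, the set of accepted schemes, and the 2048-byte probe limit are all assumptions made for illustration; the thread does not show the actual method.

```python
from urllib.parse import urlparse

# Hypothetical set of URL schemes treated as "reference to an artifact".
ARTIFACT_SCHEMES = {"minio", "s3", "http", "https"}


def classify_input(path, max_probe_bytes=2048):
    """Return ("url", <url>) if the file holds one artifact URL, else ("data", None)."""
    # Read one byte more than the limit so oversized files can be detected cheaply.
    with open(path, "rb") as handle:
        raw = handle.read(max_probe_bytes + 1)
    if len(raw) > max_probe_bytes:
        return ("data", None)  # too long to be a lone URL
    try:
        text = raw.decode("utf-8").strip()
    except UnicodeDecodeError:
        return ("data", None)  # binary payload
    # A lone URL has no inner whitespace; multi-line or multi-word text is data.
    if not text or any(ch.isspace() for ch in text):
        return ("data", None)
    try:
        parsed = urlparse(text)
    except ValueError:
        return ("data", None)  # malformed URL-like text
    if parsed.scheme in ARTIFACT_SCHEMES and parsed.netloc:
        return ("url", text)
    return ("data", None)
```

A heuristic like this cannot tell a real one-line data file that happens to contain a URL from an artifact reference, so it only works if the component never legitimately receives such data.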

We want to read minio artifacts, but first we need to get the access/secret keys from a Secret. This can be done in pipeline code by adding the line kfp.dsl.get_pipeline_conf().add_op_transformer(use_minio_secret()), where we get the keys from the Secret and add them to env variables.

The main concern here is that we would like to make working with pipelines smoother for our users, and it would be nice not to force them to use the command above, but to add the minio secrets by default.

I can see two possible solutions: 1) automatically modify all pipeline pods to add the minio keys from the secret by default; 2) use custom labels on pods plus PodDefault (see https://github.com/kubeflow/kubeflow/blob/master/components/admission-webhook/README.md#how-this-works). The latter already works for Jupyter Notebooks, where custom labels can be added to the pod manually.

In both cases, whether we want to add env variables from a secret or a custom label to the pod, we need to do it for all pipeline pods by default. Does anyone have any idea how it can be done? I couldn't figure out whether it's possible to kustomize some manifest to achieve that.
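To make option 2 more concrete, the sketch below builds a PodDefault manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts. The label key `add-minio-secret`, the env variable names, and the Secret name `mlpipeline-minio-artifact` with keys `accesskey` and `secretkey` are assumptions for illustration; check them against your own deployment. It only helps once the pipeline pods actually carry the matching label, which is the open part of the question.

```python
import json


def minio_pod_default(namespace,
                      label_key="add-minio-secret",             # hypothetical label
                      secret_name="mlpipeline-minio-artifact"):  # assumed Secret name
    """Build a PodDefault that injects Minio credentials into labelled pods."""

    def secret_env(env_name, secret_key):
        # Standard Kubernetes env entry that reads one key from a Secret.
        return {
            "name": env_name,
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha1",
        "kind": "PodDefault",
        "metadata": {"name": "add-minio-secret", "namespace": namespace},
        "spec": {
            # Only pods carrying this label are mutated by the admission webhook.
            "selector": {"matchLabels": {label_key: "true"}},
            "desc": "Inject Minio credentials from a Secret",
            "env": [
                secret_env("MINIO_ACCESS_KEY", "accesskey"),  # assumed key names
                secret_env("MINIO_SECRET_KEY", "secretkey"),
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(minio_pod_default("kubeflow"), indent=2))
```

PodDefault resources are namespaced, so one such manifest would be needed in each namespace where pipeline pods run.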

KFP version: 0.5.0

/kind question

Bobgy commented 4 years ago

@Ark-kun @numerology probably know more. As far as I know, this is not possible now. The KFP SDK is responsible for converting a pipeline definition to a runnable Argo workflow spec; the spec then just gets run without modification.

One idea to make that possible: let all pods from KFP get a special label added (maybe we can expose configuration to users), so that you can use the PodDefault admission webhook to modify all of them.
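Until something like that exists in KFP itself, the labelling half of the idea could in principle be approximated by post-processing the compiled workflow before submitting it. The sketch below is only an illustration under assumptions: it takes an already-parsed Argo Workflow as a Python dict (parsing the compiled YAML is not shown), and the function name and label are hypothetical. It relies on Argo applying a template's `metadata.labels` to the pods created from that template.

```python
import copy


def add_label_to_templates(workflow, key, value):
    """Return a copy of an Argo Workflow dict with a label on every template."""
    patched = copy.deepcopy(workflow)  # leave the caller's spec untouched
    for template in patched.get("spec", {}).get("templates", []):
        labels = template.setdefault("metadata", {}).setdefault("labels", {})
        # Do not overwrite a value the pipeline author set explicitly.
        labels.setdefault(key, value)
    return patched
```

For example, `add_label_to_templates(spec, "add-minio-secret", "true")` would stamp the hypothetical label on every template. This is still a per-submission step rather than a cluster-level default, so it does not fully answer the original question.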

We want to be able to experiment with individual components of a pipeline and provide existing artifacts as input (without running the previous steps, or with any specific artifact). Right now, as far as I understand (I'm describing this part on behalf of my colleague), the files produced by other components are typed as InputPath. When we want to provide an artifact URL as a parameter for a component, KFP creates a temp file and writes this value into it. In the component's script we have a method which can figure out whether the input file contains data or a minio URL.

However, going back to your original use case, I think you'd want to check out https://www.kubeflow.org/docs/pipelines/caching/ first. It is a new feature in KFP.

Ark-kun commented 4 years ago

it's probably better to describe the whole case since it seems to me that I might be thinking in a completely wrong direction.

Thank you. Having the full context is crucial.

without running the previous steps

We think that the caching feature should make this a non-issue. The previous steps are just skipped when you re-submit a pipeline.

In the component's script we have a method which can figure out whether the input file contains data or a minio URL.

I think this can be pretty fragile. The artifact repository is considered to be a black box and some users configure it to use GCS, S3 or OSS instead of Minio. I'm not sure it's a good idea to access it from components.

PodDefault

It might also be possible to use a PodPreset for this.

//to be continued...

Ark-kun commented 4 years ago

The main concern here is that we would like to make working with pipelines smoother for our users, and it would be nice not to force them to use the command above, but to add the minio secrets by default. In both cases, whether we want to add env variables from a secret or a custom label to the pod, we need to do it for all pipeline pods by default.

Let me try to understand the issue better. If I understand correctly, you know how to solve the issue by making pipeline-level customization, but you'd like to move that to, say, cluster-level. Is that correct? Maybe the PodPreset can help here.

A fully supported workaround for passing an explicit artifact to a component is to first upload that artifact to external storage (e.g. GCS) using a component like "Upload to GCS"; you can then use "Download from GCS" to download the artifact and pass it to the component you're testing. The caching feature will make it so that the downloader only runs once and is skipped on subsequent runs. This way you do not need to do any manual downloading from Minio. I think this solution is significantly easier than what you're doing with Minio.

P.S. We had some thoughts about adding support for passing existing artifacts as inputs, but we have concerns about exposing the artifacts and about portability of the approach.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.