kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Standalone spark application as a component in a pipeline #4224

Closed Hmr-ramzi closed 4 years ago

Hmr-ramzi commented 4 years ago

What steps did you take:

A standalone Spark installation was wrapped in a Docker image and used as a component in one of the steps of a pipeline.

What happened:

The pipeline step failed with the following error: secrets is forbidden: User "system:serviceaccount:kubeflow:pipeline-runner" cannot create resource "secrets" in API group "" in the namespace "kubeflow".

What did you expect to happen:

The step should be able to create secrets. (Maybe this is related to https://github.com/kubeflow/pipelines/issues/2623?)

Environment:

How did you deploy Kubeflow Pipelines (KFP)? As part of a Kubeflow deployment on Kubernetes.

KFP version:

KFP SDK version:

Anything else you would like to add:

A general discussion on how to use a Spark application as one of the steps in a pipeline.


/kind bug

Bobgy commented 4 years ago

Thanks for the report!

However, KFP isn't opinionated about which permissions you give to pipeline-runner. The permission to create secrets has never been added to its role: https://github.com/kubeflow/pipelines/commits/master/manifests/kustomize/base/pipeline/pipeline-runner-role.yaml.

I recommend adding more RBAC permissions to the pipeline-runner service account; see https://kubernetes.io/docs/reference/access-authn-authz/rbac/.

Hmr-ramzi commented 4 years ago

@Bobgy Thank you for your reply. Do you know whether it is possible, or whether there is a workaround, to use Spark from within a pipeline in Kubeflow?

Bobgy commented 4 years ago

Sorry, I'm not sure. /cc @Ark-kun @numerology @hongye-sun Do you know if there's an existing component for using spark?

@Hmr-ramzi even if there isn't already one, you can easily componentize your usual commands to start a spark job into a KFP component.
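As a rough illustration of what "componentizing" a Spark launch command could look like, here is a minimal sketch that builds a KFP component specification as a Python dictionary and prints it as JSON. Everything specific in it is an assumption made for illustration: the image name `example.com/my-spark:latest`, the input names `master` and `application`, and the presence of `spark-submit` on the image's `PATH`. It is not an existing KFP component.

```python
import json


def spark_submit_component_spec(image="example.com/my-spark:latest"):
    """Build a component spec that runs spark-submit inside a container.

    The image name and the input names are hypothetical placeholders;
    the image is assumed to ship Spark with spark-submit on its PATH.
    """
    return {
        "name": "Spark submit",
        "description": "Runs spark-submit from a user-provided Spark image.",
        "inputs": [
            # Spark master URL, passed to --master.
            {"name": "master", "type": "String"},
            # Path or URL of the application jar or Python file.
            {"name": "application", "type": "String"},
        ],
        "implementation": {
            "container": {
                "image": image,
                "command": ["spark-submit"],
                # {"inputValue": ...} placeholders are replaced with the
                # actual input values when the pipeline step runs.
                "args": [
                    "--master", {"inputValue": "master"},
                    {"inputValue": "application"},
                ],
            }
        },
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so the output can be saved as component.yaml.
    print(json.dumps(spark_submit_component_spec(), indent=2))
```

Saving the printed output to a file should make it loadable with `kfp.components.load_component_from_file`, after which it can be used like any other pipeline step. Whether the step then needs extra permissions depends on what the Spark job does inside the cluster.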

Hmr-ramzi commented 4 years ago

That's what I tried to do, and it led to the error I opened this issue for (pipeline-runner does not have the rights to create secrets). Can you point me to what you have in mind for starting a Spark job in a component?

Bobgy commented 4 years ago

@Hmr-ramzi you can fix the error yourself:

I recommend adding more RBAC permissions to the pipeline-runner service account; see https://kubernetes.io/docs/reference/access-authn-authz/rbac/.

These are cluster-specific permission settings.

numerology commented 4 years ago

Do you know if there's an existing component for using spark?

Not for a standalone Spark installation AFAIK, but we do have one that runs Spark on Dataproc.

https://github.com/kubeflow/pipelines/tree/master/components/gcp/dataproc

Ark-kun commented 4 years ago

the error related to the pipeline-runner does not have rights to create secrets

You need to give that permission to pipeline-runner by modifying the Kubernetes Role: https://github.com/kubeflow/pipelines/blob/0b8d2e12d1bb79e8fc2b3ea2cfc69c99fadae2a0/manifests/kustomize/base/pipeline/pipeline-runner-role.yaml
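One way to grant that permission without editing the shipped manifest is to add a separate Role and RoleBinding. The sketch below generates both as JSON, which `kubectl apply` accepts. The namespace `kubeflow` and the service account `pipeline-runner` come from the error message above; the role name `pipeline-runner-secrets` and the choice to grant only the `create` verb are assumptions, so adjust them to your cluster's policy.

```python
import json


def secrets_role_manifests(namespace="kubeflow",
                           service_account="pipeline-runner",
                           role_name="pipeline-runner-secrets",
                           verbs=("create",)):
    """Return a List of a Role and a RoleBinding granting access to secrets.

    role_name is a hypothetical name chosen for this example; verbs
    defaults to the single verb named in the error message.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group, where secrets live
            "resources": ["secrets"],
            "verbs": list(verbs),
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(secrets_role_manifests(), indent=2))
```

Assuming the script is saved as `make_role.py` (a hypothetical file name), the output can be piped straight into the cluster with `python make_role.py | kubectl apply -f -`. Because Role and RoleBinding are namespaced, this only affects secrets in the chosen namespace.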

Hmr-ramzi commented 4 years ago

@Ark-kun Thank you for your reply. Do you think it would make sense for me to contribute a variable in the manifest that switches the pipeline runner between permissive and restrictive permissions?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.