kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.6k stars 1.62k forks source link

[feature] Improved User Isolation in Kubeflow Pipelines #8406

Open DomFleischmann opened 2 years ago

DomFleischmann commented 2 years ago

Feature Area

/area frontend /area backend /area sdk

What feature would you like to see?

Authenticated and Authorized Users should be isolated by namespaces and should not have access to other users artifacts, unless authorized. The solution should be handled in frontend, backend, object storage and sdk.

What is the use case or pain point?

The current implementation allows users to access other users artifacts, this is a big security risk and a feature that limits enterprise adoption.

Is there a workaround currently?

Distributions are doing their own workarounds or enterprise customers need to deploy separate clusters for different users, which is unefficient.

This is a Roadmap Item for Kubeflow 1.7 requested by the 1.7 Release Team.

@zijianjoy @juliusvonkohout @StefanoFioravanzo @jbottum @annajung @kimwnasptd

Love this idea? Give it a 👍.

jbottum commented 2 years ago

/priority p1

juliusvonkohout commented 2 years ago

I think the there are three main tasks. From here https://github.com/kubeflow/kubeflow/issues/6662 the listed main problems are

  1. "The pipeline UI allows reading other peoples artifacts. The artifact proxy in the user namespace is insecure and obsolete. In the UI you can just get the artifact link from another user, remove the ?namespace=xxx parameter at the end and the UI server will fake the corresponding user for you. So if you know the S3/GCS path you can read other guys artifacts."

  2. https://github.com/kubeflow/pipelines/pull/7725#issuecomment-1277334000

  3. The namespaced pipeline definitions will be implemented by Arrikto.

StefanoFioravanzo commented 2 years ago

Thanks for starting this issue! Looping in @elikatsis from our side as well

subasathees commented 1 year ago

@StefanoFioravanzo , @juliusvonkohout @DomFleischmann Hi Team, Any update on this feature, in kf v1.7.0 also we can see this is not implemented. any workaround for the same available now.

thesuperzapper commented 1 year ago

@subasathees artifacts are correctly isolated when using Kubeflow Pipelines on deployKF which is my new Kubeflow distribution that includes Kubeflow Pipelines.

deployKF achieves this isolation by using object prefixes with profile/namespace at the beginning, and assigning a unique IAM role for each profile.

There is also some crazy stuff going on to ensure the isolation of KFP V2 artifacts, but it all boils down to creating the ConfigMap/kfp-launcher in each profile namespace so that the defaultPipelineRoot is set to a different value for each profile.

However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace).


Interestingly, the ?namespace= bypass described in https://github.com/kubeflow/pipelines/issues/8406#issuecomment-1299781389 does not work in deployKF because of a few factors:

thesuperzapper commented 1 year ago

@zijianjoy @james-jwu we really need to fix the ?namespace= parameter bypass described in https://github.com/kubeflow/pipelines/issues/8406#issuecomment-1299781389.

The bypass is that artifact auth is ignored when no namespace parameter is set. This is because when no namespace parameter is set, it uses the ml-pipeline-ui pod from the kubeflow namespace, rather than proxying to ml-pipeline-artifact in the profile namespaces (to which istio will control access based on the user-id header, with the AuthorizationPolicy).

I think the best option is to have the ml-pipeline-ui (KFP frontend pod), reject artifact requests that don't specify ?namespace=.

To do this, we would need to update this code to reject when no namespace parameter is found:

https://github.com/kubeflow/pipelines/blob/79d31db90610e1965b702b258805939962b9a773/frontend/server/handlers/artifacts.ts#L342-L378

juliusvonkohout commented 1 year ago

@zijianjoy @james-jwu we really need to fix the ?namespace= parameter bypass described in #8406 (comment).

The bypass is that artifact auth is ignored when no namespace parameter is set. This is because when no namespace parameter is set, it uses the ml-pipeline-ui pod from the kubeflow namespace, rather than proxying to ml-pipeline-artifact in the profile namespaces (to which istio will control access based on the user-id header, with the AuthorizationPolicy).

I think the best option is to have the ml-pipeline-ui (KFP frontend pod), reject artifact requests that don't specify ?namespace=.

To do this, we would need to update this code to reject when no namespace parameter is found:

https://github.com/kubeflow/pipelines/blob/79d31db90610e1965b702b258805939962b9a773/frontend/server/handlers/artifacts.ts#L342-L378

Can you create a PR?

juliusvonkohout commented 1 year ago

@thesuperzapper "However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace)." https://github.com/kubeflow/pipelines/pull/7725 also fixes the /pipelines minio access. Users should anyway not access that path. Although the PR is outdated and might still have too much permissions to make the minio UI work more user friendly. But one can easily fix that.

@subasathees The namespaced pipeline definitions should be in 1.8 including the UI part. They are partially in 1.7.

All of this must be upstream. Having partial workarounds in downstream distributions is not a solution.

subasathees commented 1 year ago

@juliusvonkohout @thesuperzapper , Thanks for your detailed information, this will help.

thesuperzapper commented 1 year ago

@juliusvonkohout I am pretty focused on deployKF right now, so don't have much time.

The change to reject artifact requests without ?namespace should be relatively straightforward, but it could break stuff so we need to test carefully.

juliusvonkohout commented 1 year ago

@thesuperzapper we can also get rid of the per namespace artifact proxy and visualization server when doing this change. This would allow us to have zero overhead user namespaces. We just enforce ?namespace and use the already implemented direct way of ml-pipeline-ui to fetch artifacts from minio. Removing ?namespace from your query just uses that direct path by the way.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

juliusvonkohout commented 11 months ago

Definitely not stale

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

juliusvonkohout commented 8 months ago

/hold

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

juliusvonkohout commented 6 months ago

Not stale

juliusvonkohout commented 5 months ago

@zijianjoy @rimolive can you freeze the lifecycle of the Issue? It is still relevant.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

juliusvonkohout commented 3 months ago

/lifecycle frozen