Open DomFleischmann opened 2 years ago
/priority p1
I think there are three main tasks. From https://github.com/kubeflow/kubeflow/issues/6662, the main problems listed are:
"The pipeline UI allows reading other peoples artifacts. The artifact proxy in the user namespace is insecure and obsolete. In the UI you can just get the artifact link from another user, remove the ?namespace=xxx parameter at the end and the UI server will fake the corresponding user for you. So if you know the S3/GCS path you can read other guys artifacts."
https://github.com/kubeflow/pipelines/pull/7725#issuecomment-1277334000
The namespaced pipeline definitions will be implemented by Arrikto.
Thanks for starting this issue! Looping in @elikatsis from our side as well
@StefanoFioravanzo @juliusvonkohout @DomFleischmann Hi team, is there any update on this feature? In KF v1.7.0 we can see that it is still not implemented. Is there any workaround available now?
@subasathees artifacts are correctly isolated when using Kubeflow Pipelines on deployKF which is my new Kubeflow distribution that includes Kubeflow Pipelines.
deployKF achieves this isolation by using object prefixes with profile/namespace at the beginning, and assigning a unique IAM role for each profile.
There is also some crazy stuff going on to ensure the isolation of KFP V2 artifacts, but it all boils down to creating the ConfigMap/kfp-launcher in each profile namespace so that the defaultPipelineRoot is set to a different value for each profile.
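As a rough illustration of the per-profile ConfigMap idea, here is a minimal TypeScript sketch that builds one kfp-launcher manifest per profile. The helper name `launcherConfigMap`, the `s3://` scheme, the bucket name, and the exact prefix layout are assumptions made for the example; they are not taken from deployKF's actual manifests.

```typescript
// Shape of the small subset of a Kubernetes ConfigMap used in this sketch.
interface ConfigMapManifest {
  apiVersion: "v1";
  kind: "ConfigMap";
  metadata: { name: string; namespace: string };
  data: Record<string, string>;
}

// Hypothetical helper: returns a kfp-launcher ConfigMap for one profile namespace.
// The pipeline root layout (profile name first, then "v2/artifacts") is an
// assumption for illustration only.
function launcherConfigMap(profile: string, bucket: string): ConfigMapManifest {
  return {
    apiVersion: "v1",
    kind: "ConfigMap",
    metadata: { name: "kfp-launcher", namespace: profile },
    data: { defaultPipelineRoot: `s3://${bucket}/${profile}/v2/artifacts` },
  };
}
```

Because the profile name is part of the pipeline root, two profiles never share an artifact prefix, which is what lets a per-profile IAM role be scoped to that prefix.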
However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace).
Interestingly, the ?namespace= bypass described in https://github.com/kubeflow/pipelines/issues/8406#issuecomment-1299781389 does not work in deployKF because of a few factors:
?namespace= parameter to account for the case of a cached result being in a different namespace (and this happens to have the side effect of always forcing the ?namespace= parameter to be set)
The ml-pipeline-ui pod from the kubeflow namespace actually has a bug that prevents it from accessing the minio service (because in deployKF minio lives in a different namespace to kubeflow)
@zijianjoy @james-jwu we really need to fix the ?namespace= parameter bypass described in https://github.com/kubeflow/pipelines/issues/8406#issuecomment-1299781389.
The bypass is that artifact auth is ignored when no namespace parameter is set. This is because, when no namespace parameter is set, the request is served by the ml-pipeline-ui pod from the kubeflow namespace, rather than being proxied to ml-pipeline-artifact in the profile namespaces (to which istio will control access based on the user-id header, with the AuthorizationPolicy).
I think the best option is to have ml-pipeline-ui (the KFP frontend pod) reject artifact requests that don't specify ?namespace=.
To do this, we would need to update this code to reject when no namespace parameter is found:
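The code referenced above is not reproduced here. Purely as a hypothetical sketch of the proposed behavior (this is not the actual KFP frontend code), the check could look something like the following. The `ArtifactRequest` and `Decision` types, the function name `routeArtifactRequest`, and the in-cluster URL form are all assumptions made for illustration.

```typescript
// Minimal request shape so the sketch runs without a web framework.
interface ArtifactRequest {
  query: Record<string, string | undefined>;
}

type Decision =
  | { status: 400; reason: string }
  | { status: 200; proxyTo: string };

// Kubernetes namespace names are DNS labels: lowercase alphanumerics and "-",
// at most 63 characters, starting and ending with an alphanumeric.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Hypothetical routing decision: never fall back to the direct path when the
// namespace parameter is missing or malformed.
function routeArtifactRequest(req: ArtifactRequest): Decision {
  const namespace = (req.query["namespace"] ?? "").trim();
  if (namespace === "") {
    return { status: 400, reason: "namespace query parameter is required" };
  }
  if (namespace.length > 63 || !DNS_LABEL.test(namespace)) {
    return { status: 400, reason: "namespace is not a valid DNS label" };
  }
  // Assumed URL form for the per-namespace artifact proxy service.
  return { status: 200, proxyTo: `http://ml-pipeline-artifact.${namespace}` };
}
```

Validating the namespace as a DNS label also keeps a crafted value from changing the host that the request is proxied to. As noted later in the thread, a change like this could break existing callers that omit the parameter, so it would need careful testing.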
Can you create a PR?
@thesuperzapper "However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace)."
https://github.com/kubeflow/pipelines/pull/7725 also fixes the /pipelines minio access. Users should not access that path anyway. The PR is outdated, though, and might still grant too many permissions in order to make the minio UI more user friendly, but that is easy to fix.
@subasathees The namespaced pipeline definitions should be in 1.8 including the UI part. They are partially in 1.7.
All of this must be upstream. Having partial workarounds in downstream distributions is not a solution.
@juliusvonkohout @thesuperzapper , Thanks for your detailed information, this will help.
@juliusvonkohout I am pretty focused on deployKF right now, so don't have much time.
The change to reject artifact requests without ?namespace should be relatively straightforward, but it could break stuff so we need to test carefully.
@thesuperzapper we can also get rid of the per namespace artifact proxy and visualization server when doing this change. This would allow us to have zero overhead user namespaces. We just enforce ?namespace and use the already implemented direct way of ml-pipeline-ui to fetch artifacts from minio. Removing ?namespace from your query just uses that direct path by the way.
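If the direct path were used with an enforced ?namespace, the server would need some way to tie the requested object to that namespace. One possible approach, assuming artifacts are stored under a prefix that starts with the namespace (as described for deployKF earlier in the thread), is a key check along these lines. The function name `keyBelongsToNamespace` and the key layout are hypothetical, and the sketch does not cover verifying that the caller is allowed to use that namespace.

```typescript
// Hypothetical check: does an object key sit under the prefix of the requested namespace?
// Assumes keys look like "<namespace>/<rest of path>", which is an assumption
// for this sketch and not a documented KFP layout.
function keyBelongsToNamespace(key: string, namespace: string): boolean {
  if (namespace === "" || namespace.includes("/")) {
    return false;
  }
  const segments = key.split("/");
  // Reject empty, "." and ".." segments so the prefix comparison cannot be sidestepped.
  if (segments.some((s) => s === "" || s === "." || s === "..")) {
    return false;
  }
  // The key must have the namespace as its first segment and something after it.
  return segments.length > 1 && segments[0] === namespace;
}
```

A request for a key that fails this check would be rejected before the object store is contacted.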
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Definitely not stale
/hold
Not stale
@zijianjoy @rimolive can you freeze the lifecycle of the Issue? It is still relevant.
/lifecycle frozen
Feature Area
/area frontend /area backend /area sdk
What feature would you like to see?
Authenticated and authorized users should be isolated by namespace and should not have access to other users' artifacts unless authorized. The solution should cover the frontend, backend, object storage, and SDK.
What is the use case or pain point?
The current implementation allows users to access other users' artifacts. This is a big security risk and limits enterprise adoption.
Is there a workaround currently?
Distributions are implementing their own workarounds, or enterprise customers need to deploy separate clusters for different users, which is inefficient.
This is a Roadmap Item for Kubeflow 1.7 requested by the 1.7 Release Team.
@zijianjoy @juliusvonkohout @StefanoFioravanzo @jbottum @annajung @kimwnasptd
Love this idea? Give it a 👍.