Closed IronPan closed 3 years ago
Ideally this should be implemented in a way that gets Kubeflow Pipelines closer to supporting multi-user, e.g. launching workflows in an arbitrary namespace.
What's the priority of this?
How does this align with the broader plans in Kubeflow to support multiple users?
This is not yet being prioritized, although I think it deserves a high priority.
In addition to admin/user isolation, here is a list of items needed to achieve full multi-user support for KFP:
Some references for implementing multi-user support on-prem https://docs.google.com/document/d/1JbYndTaUwRyr4wrU13TN5fzMpnLCUtpehPB4tQnMuSM/edit#heading=h.xq5kl0qs27mm https://github.com/kubeflow/kubeflow/issues/3096
@jessiezcc Any update on this work? Do you think this is something that will get done in Q3 and thus be part of 0.7?
This work is not currently scheduled for Q3.
Some customers have expressed interest in having ACLs for the API, e.g. locking down the API for deleting resources so that only admins can use it.
/cc @krishnadurai
/cc @songole
Hi @IronPan. We (Arrikto) have been exploring this problem for the past month and we generally agree with your overview of the steps required to have multi-user functionality in pipelines.
I'm assigning this to myself. We have made good progress and we should have initial support for multi-user pipelines in v0.7.
/assign @yanniszark
@yanniszark just curious, is there a design or plan for what this functionality might look like in 0.7 that you could share? We have been eagerly awaiting multi-user support in KFP and would love to review and give any feedback (assuming you'd want some).
Would it work to simply change the Client object so that the namespace can optionally be provided at instantiation?
I'm not sure what happens with the generated APIs, but it looks like it would work (assuming the cluster config is OK for the namespace). Sadly, I don't understand enough of this.
Being given the option to choose the namespace where the pipeline should run would be a good start. There would be at least some separation, and it would be easier to manage resources and cost for multiple teams. Ideally, the Client would be instantiated with the namespace that is chosen in the UI.
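A minimal sketch of the idea in this comment, assuming a client that stores a default namespace at instantiation and lets individual calls override it. `NamespacedClient` and `run_pipeline` are hypothetical names used for illustration only; this is not the KFP SDK and it does not talk to a cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NamespacedClient:
    """Hypothetical stand-in for a pipelines client; not the KFP SDK."""

    # Assumed default: the shared namespace the thread says pipelines run in.
    namespace: str = "kubeflow"
    submitted: List[Dict[str, str]] = field(default_factory=list)

    def run_pipeline(self, pipeline_name: str,
                     namespace: Optional[str] = None) -> Dict[str, str]:
        # A per-call namespace wins; otherwise use the one chosen at instantiation.
        target = namespace or self.namespace
        request = {"pipeline": pipeline_name, "namespace": target}
        self.submitted.append(request)  # record instead of calling a real API
        return request


# The namespace could come from the UI selection, as suggested above.
team_client = NamespacedClient(namespace="team-a")
default_run = team_client.run_pipeline("train-model")
override_run = team_client.run_pipeline("train-model", namespace="team-b")
```

Whether the real generated API clients would honor such a namespace end to end is exactly the open question raised in the comment; the sketch only shows the client-side shape.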
@yanniszark Is there any work in progress on this that could be shared?
We will have some design doc ready in the following weeks and reviews and feedback are very much welcome then. Thanks
Hi @danielnorberg! Thanks for your interest in multi-user pipelines. We have actually made a lot of progress and presented a demo at the Kubeflow Community Meeting. Our design has been reviewed and validated by many end-users and we are working with the Pipelines team to iron out all the details. A design doc will soon follow.
Slides: https://docs.google.com/presentation/d/1fj0YM4LdToYY8cWSFUViTn_1t63Twm70QwG0cV0CX1Q A video recording will soon be available.
@yanniszark Waiting for this feature. Currently I am not able to run pipelines from a Jupyter notebook, because the pipeline exists in the kubeflow namespace and not in the notebook's. I need to copy everything from the notebook's PVC to the pipeline's PVC for the pipeline to mount and use it. Also, of the ways of creating components suggested in the reference, only lightweight components work for on-premise users, as the others require the staging_gcs_path parameter.
UI side required change here: https://github.com/kubeflow/pipelines/issues/2397
Hi all,
We are in the process of iterating upon a design doc for multi-user pipelines, a much requested feature.
After our (Arrikto) initial demo of a PoC for multi-user pipelines at the community meeting back in October, we were asked by the Pipelines team and the rest of the Kubeflow community to describe our design and implementation. You can find it here:
Multi-User design doc with demo/slides: https://docs.google.com/document/d/18X6vKCddRARwGR8MfGHE1RkkIIDzLdZvSlx38ZAjW4U/edit?usp=sharing
We will also present this design at the next Pipelines community meeting, which should be on Wednesday, 11th of December 2019, at 10AM PT. Meeting notes are here: https://docs.google.com/document/d/1cHAdK1FoGEbuQ-Rl6adBDL5W2YpDiUbnMLIwmoXBoAU
Community contributions in reviewing the two current docs will help us a lot in the merging process to end up with the final design document.
Looking forward to your comments
@IronPan @Bobgy @gaoning777 can you link to your design doc?
Hi all, we reviewed the multi-user design during the Kubeflow Pipelines community meeting on 11/27/2019.
Here is the Multi-user design doc from the Kubeflow Pipeline team: https://docs.google.com/document/d/12ikhUKAb3KhbO9AR6JUk_UX_D9pf7nFWfHAyLDv2BB0/edit?usp=sharing
/assign @chensun
@gaoning777: GitHub didn't allow me to assign the following users: chensun.
Note that only kubeflow members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
/unassign @gaoning777
/assign @chensun
Question: Can you please clarify the usage of KFAM? As per the Arrikto reference, SubjectAccessReview is planned for usage. I am not clear about the interaction between KFAM and this. Thanks
@nrchakradhar KFAM is the API service for Kubeflow multi-user support. The KFP team's multi-user design uses it as the source of truth for user authorization rules.
However, we have two different designs now. The Arrikto reference is a different design for multi-user support. Hopefully we can merge both designs, but that hasn't happened yet.
Hey @yanniszark, thanks for your efforts, I'm very much looking forward to this feature.
Currently we are using Kale to deploy pipelines, where the attached workspace PV is being used for data passing between components. However, we failed to mount the data using existing volumes:
This step is in Pending state with this message: Unschedulable: persistentvolumeclaim "workspace-notebook" not found
I assume this is because the pipeline runs in the Kubeflow namespace while my PV workspace-notebook is a user namespace property, thus it cannot be found. Enabling multi-user for pipelines will definitely help us a lot in this case. ☺️
@Felihong you are correct, the Pipeline cannot find the PVC because it's not in the same namespace. Indeed, multi-user Pipelines would solve your issue.
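To make the failure mode concrete, here is a toy model, not Kubernetes code, of why the claim is "not found": PersistentVolumeClaims are namespaced, so a pod can only reference a claim that lives in its own namespace. The namespace name `user-namespace` and the volume name `pv-001` are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

# Toy registry keyed by (namespace, claim name), mirroring how claims are
# scoped to a namespace. All entries are illustrative placeholders.
claims: Dict[Tuple[str, str], str] = {
    ("user-namespace", "workspace-notebook"): "pv-001",
}


def resolve_claim(pod_namespace: str, claim_name: str) -> Optional[str]:
    """Look up a claim only within the namespace of the pod that asks for it."""
    return claims.get((pod_namespace, claim_name))


# A pipeline step running in the shared kubeflow namespace cannot see the claim.
in_kubeflow = resolve_claim("kubeflow", "workspace-notebook")
# The same step running in the user's namespace resolves it.
in_user_ns = resolve_claim("user-namespace", "workspace-notebook")
```

Running pipeline steps in the user's namespace, as multi-user support proposes, turns the first lookup into the second.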
@Bobgy KFAM is not expected to be an abstraction on top of Kubeflow to use for authorization. We are moving away from that practice in Kubeflow, as can be seen from the transition of the Jupyter Web App to use SubjectAccessReview. cc @jlewi
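For readers unfamiliar with SubjectAccessReview, the following sketch builds the request body that a service could send to the Kubernetes API server to ask whether a user may perform an action in a namespace. The `authorization.k8s.io/v1` kind and its `spec.user` / `spec.resourceAttributes` fields are standard Kubernetes; the user, namespace, API group, and resource values are placeholders chosen for illustration and are not taken from either design doc.

```python
import json
from typing import Any, Dict


def build_subject_access_review(user: str, namespace: str, verb: str,
                                group: str, resource: str) -> Dict[str, Any]:
    """Build a SubjectAccessReview body; the API server fills in status.allowed."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": group,
                "resource": resource,
            },
        },
    }


# Placeholder values: the group and resource names are assumptions.
review = build_subject_access_review(
    user="alice@example.com",
    namespace="team-a",
    verb="create",
    group="pipelines.example.org",
    resource="runs",
)
review_json = json.dumps(review, indent=2)
```

The point of this approach, as described in the comment above, is that authorization is decided by the cluster's own RBAC rules rather than by a separate service acting as the source of truth.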
At Arrikto, we have been exploring and designing this feature since October and we are very excited to see the big user interest around this issue.
I have added an agenda item for the Pipelines community meeting on February 5th, to discuss and do a status update on the current design. We have also updated the design doc with a user journey section, as requested by @jessiezcc. https://docs.google.com/document/d/18X6vKCddRARwGR8MfGHE1RkkIIDzLdZvSlx38ZAjW4U/edit#heading=h.3ckxvbum5d4f
@yanniszark Continuing on what @Felihong said about notebooks and PVCs: since KFServing lets me segregate models into namespaces, I have the model stored in a PVC which the inference server can load in the same namespace. If I then have a pipeline in the kubeflow namespace, I am prevented from loading that pretrained model for retraining (e.g. retrain(old_model, new_data) => new_model). Correct? And multi-user pipelines would presumably solve that as well, where a "user" corresponds to the KFServing namespaces. (The alternative being to use object storage instead of a PVC.)
Any update on this work in Kubeflow 1.0?
@Mddct I'm working on preparing changes in KF 1.0 to be ready for KFP multi user support. Work will be tracked in https://github.com/kubeflow/pipelines/issues/3241
The changes to support multi-user are pretty substantial (e.g. turning on Istio in the kubeflow namespace). So these should probably be targeted to a minor release (e.g. 1.1) and not be slated for a patch release.
I think we are targeting 1.1 for Q2 so June.
@jlewi While I understand the formal release can be part of the 1.1 rollout, based on discussion with @gaoning777 it looked like we would have an early version of the code in master soon, which was projected to be March.
@gaoning777 are we still on track vis a vis that?
Hi @animeshsingh, @gaoning777 has decided to take on a new adventure; @chensun and I are actively working on multi-user support now.
We are mostly on track. The main functionality for backend and UI will likely be on target for the end of March, but tensorboard, visualization... might take longer.
The instructions for Phase 1 of the multi-user work can be found at https://drive.google.com/file/d/1aqiBrYzTJQ9dUrjOjB2OWfTBD6MKrbt6/view
This is still at an early stage and the API might be subject to incompatible changes. But please feel free to give it a try; feedback is appreciated.
Removed from KFP 1.0 project, because this will be released separately in Kubeflow.
@Bobgy What are the remaining kustomize changes needed to make multiuser KFP available on master?
@jlewi Please take a look at https://github.com/kubeflow/pipelines/issues/3241 I have a fork at https://github.com/Bobgy/manifests/tree/kfp-multi-user-master.
@Bobgy @jlewi We are looking to have this feature before taking Kubeflow into production. I think it was labeled as high risk for 1.1, and there were not many users asking for it. Let me know if we need to present our case so it can make it into the release.
The implementation for us is quite large, and we'll have about 100-200 users just for the initial implementation. Without this feature it will not be possible to have multiple teams sharing a production cluster. We are using Istio + Dex to remain cloud agnostic.
@maganaluis Thanks for bringing it up here!
I've finished other work items in the KF 1.1 integration list in https://github.com/kubeflow/pipelines/projects/5. Currently WIP on multi user mode for gcp + iap manifest. Getting it ready for GCP isn't very risky now.
However, the Istio + Dex manifest is maintained by Arrikto. @yanniszark @jbottum do you have any plan you can share for supporting Istio + Dex with KFP multi-user mode in addition to MiniKF?
I'd recommend presenting your case at the Kubeflow Pipelines community meeting so that different groups take notice.
We have deployed Kubeflow on Istio. Currently we can set up multiple user namespaces, but the pipelines are shared, and data in the user namespace and the kubeflow namespace is isolated. I manually mount the two PVs to the same path. Ideally, the user namespace should also work for pipeline separation, and then data sharing would be supported by default.
Can you create separate issues for them? Separating pipelines seems to be a common request; we can consider adding it.
Which data do you mean? Data in minio? They are shared though.
Another voice in favor of this, we were hoping that with multi-tenancy enabled it would be possible for a user's pipeline runs to occur in their namespace: for us this is vital for auditing and billing purposes, and is more important than being able to segregate the pipelines themselves. Is this coming down the line? Is it planned for a particular release? Is there any beta etc. available?
@jackwhelpton you can take a look at
Multi-user mode early access has been released with an instructions doc, KFP multi-user instructions for GCP: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/ This is shared with the kubeflow-discuss@ Google group.
These are the user instructions we shared for early access to multi-user mode. Letting users' pipeline runs occur in their own namespace is already supported.
The current plan is to release with KF 1.1; most code changes are already merged in the kubeflow repo. So some of you can try it soon if interested.
@Bobgy, thank you for the reply. I just opened a new issue. The data sharing is not closely related to Kubeflow itself. I am using the Kale extension to automate compiling and running Kubeflow pipelines. The data of the notebook server can't be passed to the Kubeflow pipeline directly, because the notebook server is in the user profile namespace while the pipeline run is in the kubeflow namespace. I solved this problem by connecting two PVCs in the two namespaces manually. I think that if pipelines were separated per user namespace, the E2E multi-user isolation would be complete and the data would be shared naturally, because the notebook and the pipeline would both be in the user namespace.
Cross posting for clarification https://github.com/kubeflow/pipelines/issues/4197#issuecomment-656458724:
EDIT: described features below will be released with Kubeflow 1.1. You can use these instructions for preview on GCP. It's NOT RELEASED YET. Installation for Kubeflow 1.1 rc on GCP: https://github.com/kubeflow/gcp-blueprints/tree/v1.1-branch KFP Multi User instructions: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/edit?usp=sharing
Pipeline runs are already designed to run in user namespaces. The only resource in the KFP core system that is not namespace separated (as of today) is the static pipeline YAML files you upload to the server. They will remain public to anyone in the cluster. Users can launch any pipeline in their own namespaces.
For details about which resources and which services support namespace separation, please read this early access user instruction: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/.
A quick list of things for which we don't support multi-user separation in the upcoming KF 1.1 release:
If your organization would prefer pipeline resources separated by namespace, please upvote here. We can consider adding the support if there is enough user interest.
EDIT: enough reactions collected, the issue is tracked in https://github.com/kubeflow/pipelines/issues/4197 with priority
@Bobgy It should be a feature which is enabled: if users want to "promote" their pipeline resource to be public, it's allowed. Otherwise it stays in their namespace by default.
Yes, I agree if we decide to implement, we'll make it configurable.
Will upvote. Thanks!
[April/6/2020] Latest design is in https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit?ts=5e4d8fbb#heading=h.5s8rbufek1ax
Areas we are working on:
Release
Areas related to integration with Kubeflow
=============== original description
Some users have expressed interest in isolation between the cluster admin and the cluster user: the cluster admin deploys Kubeflow Pipelines as part of Kubeflow in the cluster; the cluster user can use Kubeflow Pipelines functionality without being able to access the control plane.
Here are the steps to support this functionality.