kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.6k stars 1.62k forks source link

[Multi user] How do we release KFP multi user in Kubeflow? #3645

Closed Bobgy closed 4 years ago

Bobgy commented 4 years ago

Part of https://github.com/kubeflow/pipelines/issues/1223

/cc @jlewi @chensun @IronPan

Options:

Note, it's only for gcp-iap deployment initially.

Bobgy commented 4 years ago

@jlewi I think we want a way to release it soon (maybe as experimental first).

So we can also go with release as an experimental fork from kubeflow 1.0.2 and wait for either 1.0.3 or 1.1.

What do you think?

jlewi commented 4 years ago

@Bobgy Why do we need a fork? If its on master can't people just deploy from master?

jlewi commented 4 years ago

@Bobgy is all the relevant code/fixes on master?

What is the path forward for turning on ISTIO in the kubeflow namespace? kubeflow/kfctl#296

Does the KFP manifest(https://github.com/kubeflow/manifests/tree/master/pipeline) need to be updated with multiuser changes?

Bobgy commented 4 years ago

@jlewi I verified using namespace resource works (I guess I misconfigured sth when I tried it before), so we can just use kfctl 1.0.2.

All relevant code/fixes are on KFP master and my forked manifest branch: https://github.com/Bobgy/manifests/pull/3/files.

(it can be deployed normally following https://www.kubeflow.org/docs/gke/deploy/deploy-cli/ and use CONFIG_URI="https://raw.githubusercontent.com/Bobgy/manifests/kfp-multi-user/kfdef/kfctl_gcp_iap.yaml" instead

Bobgy commented 4 years ago

I prefer that, in this release, only KFP is kind of experimental using the multi user mode, but all other components are stable. Therefore, I think KF 1.0.2 might be a good candidate as a base.

What do you think?

in the mean time, I can try to merge forked changes to Kubeflow master

jlewi commented 4 years ago

@Bobgy my suggestion would be to get things working on master. I don't know that we want to big changes onto release branches. Either way though getting it checked in working on master is a precursor so lets do that first.

Bobgy commented 4 years ago

I think we'd want to release a relatively stable version soon for community to try out KFP multi user mode in a forked branch.

In the mean time, I can try to start sending PRs to master.

jlewi commented 4 years ago

Would back-porting this onto 1.0 branch be consistent with semantic versioning? If we backport this onto 1.0 do we risk blocking our ability to release minor patch fixes to 1.0?

Bobgy commented 4 years ago

I see, that makes sense to me.

Shall we

jlewi commented 4 years ago

My suggestion would be to get it working on master. Once we have it working on master we can either add a git tag or possibly create a new release branch. I'd probably recommend we bump the minor version; e.g 1.1.0-alpha1.

Bobgy commented 4 years ago

@jlewi Sounds good.

After rethinking over it, I agree there's not much value added if the early access fork is based on 1.0.2, because KFP itself is also experimental.

I'll try to merge to kubeflow master first

jlewi commented 4 years ago

@Bobgy +1 It would be good to get the multiuser support integrated into the GCP blueprint for blueprint users. That should be fairly straightforward. Blueprints should be easier to upgrade so it will be easier to roll out changes to customers using the blueprints.

I think the only changes would be

  1. Add pipelines to the stacks kustomization file

    • Which is related to kubeflow/manifests#1048 but not necessarily blocked on it.
  2. Replace the namespaces package in the blueprint (this [file])(https://github.com/kubeflow/gcp-blueprints/blob/master/kubeflow/instance/kustomize/namespaces/namespaces.yaml) with the one you added to kubeflow/manifests so the blueprint is in sync

Per: kubeflow/gcp-blueprints#5 I'm setting up autodeployments for gcp blueprints. That should make it easy for us to verify and ultimately add tests to verify it is all working.

Bobgy commented 4 years ago

@jlewi Thanks for the suggestion! Because I have limited capacity working on these, I will get these changes merged to kubeflow/manifest master first before trying out the gcp blue print.

yanniszark commented 4 years ago

@Bobgy thanks a lot for confirming. It seems that the user-identity parsing code is already present in a few places and probably more in the future.

While we (and other users) can test multi-user pipelines on their on-prem installations, it's very difficult to dig through every part of the code and make sure the headers are configurable. The best time to incorporate these options is when the code is first written.

Is there a plan to make the headers configurable like the rest of Kubeflow's web apps? (Jupyter Web App, CentralDashboard, etc). That would allow non-GCP users of Kubeflow to use multi-user pipelines.

@jlewi we had similar issues with web apps being GCP-only in v0.6 and we contributed changes throughout the code after the fact. After v0.6, we had agreed to use the established headers, so we wouldn't have to do it again. How can we make sure that new code is written in a way that it doesn't have an artificial dependence on GCP?

jlewi commented 4 years ago

we had agreed to use the established headers Do you mean we agreed to use environment variables to make the headers programmable? I recall discussing 2 options

  1. Using ISTIO to transform the headers to some standard values
  2. Making the headers programmatic for each application.

I thought we mostly went with the second option?

@yanniszark I think the best way would be to come with ways to make it easy for application developers to follow best practices. Ideally, that would mean finding some way to handle it centrally so people don't have to think about it or creating reusable libraries.

Tests would also help. If application developers write tests for the feature (e.g. multiuser pipelines) and we have CI setup for different platforms then that test would fail on other platforms and provide appropriate signal.

Finally documenting it and promoting it is also helpful.

Bobgy commented 4 years ago

Multi user is a very complex feature, I don't feel there's any problem if we prototype with just one platform first. It was good enough we agreed on the high-level design in the beginning and guaranteed it should work with all platforms. Now, it's pretty trivial to port the feature.

There are only been two places we are using it: https://github.com/kubeflow/pipelines/search?q=x-goog-authenticated-user-email&unscoped_q=x-goog-authenticated-user-email

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

yanniszark commented 4 years ago

Do you mean we agreed to use environment variables to make the headers programmable?

@jlewi Yes, exactly, that was bad phrasing on my part. Since in v0.6 we agreed to make them configurable (through envvars, cli args or config files), I would expect new applications integrating with Kubeflow's multi-tenancy to follow that approach.

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

Absolutely! Here are some examples:

@Bobgy I agree that expectations of Web Apps in terms of auth should be clear. To help with making the concepts more clear and streamline implementations of future components, I have started a design doc that gathers all the current knowledge that is spread through the code in one place: https://docs.google.com/document/d/11Xi-I2OqJvUuy_Zg0NskMF9GmworRDJhlu68poDgK5c/edit#bookmark=id.d3wqgeictsp4 I expect we will also discuss it further in Tuesday's Kubeflow community meeting.

The KUBEFLOW_USERID_HEADER and KUBEFLOW_USERID_PREFIX options would be a great start to get Pipelines working in all platforms.

Bobgy commented 4 years ago

/close As we have reached agreement, further progress will be tracked in https://github.com/kubeflow/pipelines/issues/3693

k8s-ci-robot commented 4 years ago

@Bobgy: Closing this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/3645#issuecomment-627780711): >/close >As we have reached agreement, further progress will be tracked in https://github.com/kubeflow/pipelines/issues/3693 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.