kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0
810 stars 874 forks source link

[Pipelines Multi User] integrate with non GCP platforms #1364

Closed Bobgy closed 3 years ago

Bobgy commented 4 years ago

I've merged the first PR for enabling KFP multi user mode.

Platform owners can include this generic module https://github.com/kubeflow/manifests/tree/master/pipeline/installs/multi-user, instead of pipeline/installs/generic to test KFP multi user mode right now.

Additional change per platform to integrate with KFP multi user mode:

If your platform cannot integrate well with KFP multi user mode before KF 1.1 release, backup options:

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.75
area/pipelines 0.82

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

Bobgy commented 4 years ago

/cc @Jeffwan @animeshsingh @yanniszark Did I miss anyone?

/cc @jlewi Can you help me notify other platform owners?

Bobgy commented 4 years ago

Also /cc @maganaluis might be able to share his attempts deploying it with istio + dex

Bobgy commented 4 years ago

https://github.com/kubeflow/manifests/pull/1365 is also a blocker for other platforms

Bobgy commented 4 years ago

The next PR for securing multi user mode with istio mTLS is https://github.com/kubeflow/manifests/pull/1368. After this PR, KFP multi user mode is complete. I'll use a separate PR to switch GCP stack to it (already tested the manifest locally, everything works well).

Bobgy commented 4 years ago

/assign @Jeffwan @animeshsingh @yanniszark @jbottum FYI, KFP multi user mode verified working properly in Kubeflow 1.1 on GCP, all the work is done.

Recommend going ahead and try integrating it with your platforms, feel free to let me know if there are any blockers.

PatrickXYS commented 4 years ago

AWS test against multi-user KFP here: https://github.com/kubeflow/manifests/pull/1410

Quick Update: AWS adoption on multi-user KFP has been tested and merged.

Thanks a lot for your contributions to multi-user KFP! @Bobgy

PatrickXYS commented 4 years ago

Also /cc @maganaluis might be able to share his attempts deploying it with istio + dex

Also, AWS has tested dex and multi-user KFP and works fine.

animeshsingh commented 4 years ago

We are going to give it a spin on IBM Cloud @shawnzhu @adrian555

shawnzhu commented 4 years ago

I tried a kfdef https://github.com/adrian555/manifests/blob/multi-user/kfdef/kfctl_ibm_dex_multi_user.yaml (Thanks to @adrian555 ) installed on k8s v1.18.6 on IKS, this kfdef ships istio 1.1.6:

> istioctl version --remote
client version: 1.6.5
cluster-local-gateway version: 
citadel version: 1.1.6
galley version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
policy version: 1.1.6
sidecar-injector version: 1.1.6
telemetry version: 1.1.6

And the envoyfilter add-user-filter to simulate multi-user case via anonymous@kubeflow.org. see https://github.com/kubeflow/manifests/blob/baddd858737ea5c7c1f6799221e97ab107213ba7/istio/add-anonymous-user-filter/base/envoy-filter.yaml#L16-L19

Kubeflow itself works (I tried notebook only).

What doesn't work

failed pods in ns kubeflow

> kubectl get pod -n kubeflow | grep -v "Running"
NAME                                                    READY   STATUS              RESTARTS   AGE
cache-deployer-deployment-f879495bf-5zzfj               0/1     CrashLoopBackOff    12         39m
cache-server-56554df76b-hl96h                           0/1     ContainerCreating   0          3h15m
seldon-controller-manager-6d94c47fb-dqjtg               0/1     CrashLoopBackOff    38         3h15m

seldon-controller-manager crash is expected (b/c of k8s v1.18) but the other two are not. See details below:

kubectl logs cache-deployer-deployment-f879495bf-5zzfj -n kubeflow --tail 10
certificatesigningrequest.certificates.k8s.io/cache-server.kubeflow created
+ true
+ kubectl get csr cache-server.kubeflow
NAME                    AGE   SIGNERNAME                     REQUESTOR                                                             CONDITION
cache-server.kubeflow   0s    kubernetes.io/legacy-unknown   system:serviceaccount:kubeflow:kubeflow-pipelines-cache-deployer-sa   Pending
+ '[' 0 -eq 0 ']'
+ break
+ kubectl certificate approve cache-server.kubeflow
Error from server (Forbidden): certificatesigningrequests.certificates.k8s.io "cache-server.kubeflow" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
No resources found

When I visit /pipeline from istio-ingressgateway via port-forward, I got:

url -v http://localhost:31380/pipeline
*   Trying 127.0.0.1:31380...
* Connected to localhost (127.0.0.1) port 31380 (#0)
> GET /pipeline HTTP/1.1
> Host: localhost:31380
> User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
> Accept: */*
> Referer: 
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Sun, 26 Jul 2020 01:12:13 GMT
< server: istio-envoy
< 
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure

However, pipeline UI itself works when I visit from its service ml-pipeline-ui.kubeflow.svc via port-foward.

Problem

If I delete the destinationrule ml-pipeline-ui OR just use DISABLE as its TLS connection mode, everything works very well. I tried some pipelines and it provisions pipeline instances under my profile namespace 🎉 .

This is the destinationrule I mentioned:

https://github.com/kubeflow/manifests/blob/baddd858737ea5c7c1f6799221e97ab107213ba7/pipeline/installs/multi-user/istio-authorization-config.yaml#L60-L68

@Bobgy I'm curious if ISTIO_MUTUAL works for service ml-pipeline-ui? b/c there's no Enovy sidecar for pods from deployment ml-pipeline-ui so is such a destination rule necessary?

Bobgy commented 4 years ago

@shawnzhu ibm platform should enable istio sidecar injection for kubeflow namespace in kfdef refer to https://github.com/kubeflow/manifests/blob/baddd858737ea5c7c1f6799221e97ab107213ba7/namespaces/base/namespaces.yaml#L7

shawnzhu commented 4 years ago

@Bobgy I've tested ☝️ PR where istio-injection: enabled is included (Thanks to @adrian555 ). KFP works from UI but I got error message when using KFP SDK from a notebook in my profile namespace:

client = kfp.Client()
EXPERIMENT_NAME = 'KFServing Experiments'
experiment = client.create_experiment(name=EXPERIMENT_NAME)

I got:

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Tue, 28 Jul 2020 16:08:15 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '0'})
HTTP response body: RBAC: access denied

So I assume the authentication is passed via mTLS since both notebook pod and ml-pipeline.kubeflow.svc pod has sidcars. Does the RBAC access denied means it requires more rules for cluster role kubeflow-edit ? which hasn’t been changed since Oct 2019?

If you want to chat, I started it in this slack message: https://kubeflow.slack.com/archives/CE10KS9M4/p1595953494216900

Updates

after reading through the user journey, I added role and rolebing to the service account default-editor as instructed:

$ kubectl auth can-i --list | grep pipeline
pipelines.pipelines.kubeflow.org                []                  []                     [create get list]
runs.pipelines.kubeflow.org                     []                  []                     [create get list]

but I still get ☝️ error RBAC: access denied

Jeffwan commented 4 years ago

AWS issues #1430 @Bobgy I didn't follow all the implementation details. Any reason to inject kubeflow namespace and add DestinationRules? I tried to remove destinationRules and UI page works fine. However, I think some multi-user feature might be blocked or restrict due to this.

If there's a checklist to see if multi-user works under expectation under different platforms? Like you should see virtualizationserver and ui-artifact under user's namespace, etc? Or any sanity check?

Jeffwan commented 4 years ago

Another issue I notice is ml-pipeline-ui-artifact doesn't use right header. It by default use GCP header, however, it's a little bit hard to override because configs come from kubeflow-config which is in kubeflow namespace

animeshsingh commented 4 years ago

And it seems it's causing other unintended side-effects with KFServing, look at the last comments there. https://github.com/kubeflow/kfserving/issues/568#issuecomment-664735067

@Jeffwan did you test KFServing at your end within this context?

Jeffwan commented 4 years ago

And it seems it's causing other unintended side-effects with KFServing, look at the last comments there. kubeflow/kfserving#568 (comment)

@Jeffwan did you test KFServing at your end within this context?

We have not tested KFServing with multi-user pipeline yet. Enable injection in kubeflow namespace is kind of breaking change, is the KFServing issue can be resolved under this setting?

Jeffwan commented 4 years ago

So I assume the authentication is passed via mTLS since both notebook pod and ml-pipeline.kubeflow.svc pod has sidcars. Does the RBAC access denied means it requires more rules for cluster role kubeflow-edit ? which hasn’t been changed since Oct 2019?

If you want to chat, I started it in this slack message: https://kubeflow.slack.com/archives/CE10KS9M4/p1595953494216900

@shawnzhu em. In this case, for users using KFP client outside the cluster, does mTLS still work?

shawnzhu commented 4 years ago

In this case, for users using KFP client outside the company, does mTLS still work?

The mTLS will happen between istio-ingressgateway.istio-system.svc and ml-pipeline.kubeflow.svc. it depends on the istio-ingressgateway config that whether secure connection from KFP client. it should be just TLS w/o verifying client (e.g., HTTPS).

Bobgy commented 4 years ago

Answered some questions in https://github.com/kubeflow/manifests/issues/1430#issuecomment-665597274. For context of enabling injection in kubeflow namespace: https://github.com/kubeflow/kubeflow/issues/3866. Things I did to make sure this is not a breaking change: I sent PRs like https://github.com/kubeflow/manifests/pull/712 that added the istio injection false pod annotation to every pod in kubeflow namespace. Also, per https://github.com/kubeflow/kubeflow/issues/3866#issuecomment-600399037

So there shouldn't be dynamic services (e.g. TFJobs) running in the kubeflow namespace.

Therefore, enabling istio sidecar injection shouldn't break other services that do not create dynamic workloads in kubeflow namespace.

The current configurations assume kubeflow namespace has already enabled istio sidecar injection, if you didn't enable it. DestinationRule will break connections from clients with istio sidecar or istio gateway -- that's why you are seeing the error messages.

Per https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit#heading=h.b3vxor3gcdvs, not supporting in-cluster direct access to KFP ui/api server is a known limitation of current phase of KFP multi user support.

Workaround was mentioned in the issue

KFP api server will be protected by istio authz, only requests with user identity that can pass the public endpoint auth have a chance getting accepted (the mechanism is similar to notebook servers). Therefore, KFP sdk client needs to be able to authenticate for other platforms' auth service. I know dex is one thing we need to support, what are other platforms using? (Issue for dex is kubeflow/kubeflow#4912, there has been some progress.)

We did this, because if we allow other in-cluster requests, these requests can fake any user header bypassing all authentication checks.

@yanniszark has some design that provides an alternative for in-cluster authentication. Probably he can introduce that, but I'm not sure when that feature can be merged to KFP.

Bobgy commented 4 years ago

@shawnzhu

after reading through the user journey, I added role and rolebing to the service account default-editor as instructed:

$ kubectl auth can-i --list | grep pipeline pipelines.pipelines.kubeflow.org [] [] [create get list] runs.pipelines.kubeflow.org [] [] [create get list] but I still get ☝️ error RBAC: access denied

I think you have read design doc for features KFP will implement in the next phase. also /cc @yanniszark who will work on that.

In current phase, note that RBAC: access denied error message comes from istio RBAC rejection, not Kubernetes RBAC. So you should be able to set up permissions following https://istio.io/latest/zh/docs/reference/config/security/istio.rbac.v1alpha1/, using ServiceRole and ServiceRoleBinding. This will allow istio RBAC access, but KFP api server will still reject these requests because they don't have a KUBEFLOW_USERID_HEADER.

So with the current phase, the only way to access KFP API server securely, is to connect through the public endpoint and make sure KUBEFLOW_USERID_HEADER exists in the request through istio gateway

Bobgy commented 4 years ago

Can you remind me where kfserving is deployed? I think Jeremy has moved kfserving to its own namespace in GCP platform. What is the case for others?

Jeffwan commented 4 years ago

Per https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit#heading=h.b3vxor3gcdvs, not supporting in-cluster direct access to KFP ui/api server is a known limitation of current phase of KFP multi user support.

em. Good to know this is not supported now.. We check design docs here and may misunderstand scope at that time, The graph actually shows cli/UI/sdk traffic have to go through the istio ingress gateway. Thanks for clarification.

We did this, because if we allow other in-cluster requests, these requests can fake any user header bypassing all authentication checks.

Technically, once user get authenticated, from pipeline's perspective, it's should be safe to use user identity, kubeflow-userid or header can be faked. I looked at the design and see @yanniszark has some proposal to use serviceAccountToken I think that should be nice to have. Otherwise, it's kind of weird traffic go to external and come inside again like notebook (in mesh) -> public endpoint (go to gateway) -> pipeline ( in mesh).

shawnzhu commented 4 years ago

I proposed a new feature to KFP SDK by guessing user namespace in https://github.com/kubeflow/pipelines/pull/4293 so that it can reduce the amount of code changes for all existing KFP samples.

animeshsingh commented 4 years ago

KF notebook sdk issue when using hosts like `istio-ingressgateway.istio-system.svc.cluster.local https://github.com/kubeflow/pipelines/issues/4297 - let's use this issue as master to compile related issues/PRs @shawnzhu

animeshsingh commented 4 years ago

Related PR - Pass userid-header param to kfam https://github.com/kubeflow/manifests/pull/1445 thanks @yhwang

Jeffwan commented 4 years ago

@Bobgy AWS platform doesn't have any issues with multi-user pipeline. We should be good to go. I will send some PRs to pipeline SDK for easy authentication later for AWS. This is not blockers for us in 1.1

animeshsingh commented 4 years ago

@Bobgy IBM Cloud should be good to go as well.

Docs will be updated later. Additionally, SDK is another area where we may enable an easier authentication with IBM Cloud Identity and Acces Management later.

cc @shawnzhu @adrian555 @jlewi

Bobgy commented 4 years ago

Awesome! Thanks for your hard work integrating pipelines multi user support! Let's close this issue if no further issues.

Bobgy commented 4 years ago

/reopen

Keep this open as information for integrating with remaining platforms in https://www.kubeflow.org/docs/started/getting-started/#configuration-quick-reference.

@jlewi do you know who are the stakeholders for these platforms?

k8s-ci-robot commented 4 years ago

@Bobgy: Reopened this issue.

In response to [this](https://github.com/kubeflow/manifests/issues/1364#issuecomment-668415871): >/reopen > >Keep this open as information for integrating with remaining platforms in https://www.kubeflow.org/docs/started/getting-started/#configuration-quick-reference. > >* [ ] kfctl_k8s_istio >* [ ] kfctl_istio_dex >* [ ] openshift > >@jlewi do you know who are the stakeholders for these platforms? Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
Bobgy commented 4 years ago

@krishnadurai and @swiftdiaries have integrated KFP multi-user mode in k8s_istio and istio_dex installations: https://github.com/kubeflow/manifests/pull/1484 and https://github.com/kubeflow/manifests/pull/1494.

Openshift is the only one left.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

PatrickXYS commented 3 years ago

Seems like OpenShift is still left behind

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue has been closed due to inactivity.

nanduri1998 commented 3 years ago

What is expected timeline for OpenShift?