Closed Bobgy closed 3 years ago
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
kind/feature | 0.75 |
area/pipelines | 0.82 |
Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.
/cc @Jeffwan @animeshsingh @yanniszark Did I miss anyone?
/cc @jlewi Can you help me notify other platform owners?
Also /cc @maganaluis might be able to share his attempts deploying it with istio + dex
https://github.com/kubeflow/manifests/pull/1365 is also a blocker for other platforms
The next PR for securing multi user mode with istio mTLS is https://github.com/kubeflow/manifests/pull/1368. After this PR, KFP multi user mode is complete. I'll use a separate PR to switch GCP stack to it (already tested the manifest locally, everything works well).
/assign @Jeffwan @animeshsingh @yanniszark @jbottum FYI, KFP multi user mode verified working properly in Kubeflow 1.1 on GCP, all the work is done.
Recommend going ahead and try integrating it with your platforms, feel free to let me know if there are any blockers.
AWS test against multi-user KFP here: https://github.com/kubeflow/manifests/pull/1410
Quick Update: AWS adoption on multi-user KFP has been tested and merged.
Thanks a lot for your contributions to multi-user KFP! @Bobgy
Also /cc @maganaluis might be able to share his attempts deploying it with istio + dex
Also, AWS has tested dex and multi-user KFP and works fine.
We are going to give it a spin on IBM Cloud @shawnzhu @adrian555
I tried a kfdef https://github.com/adrian555/manifests/blob/multi-user/kfdef/kfctl_ibm_dex_multi_user.yaml (Thanks to @adrian555 ) installed on k8s v1.18.6 on IKS, this kfdef ships istio 1.1.6:
> istioctl version --remote
client version: 1.6.5
cluster-local-gateway version:
citadel version: 1.1.6
galley version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
pilot version: 1.1.6
policy version: 1.1.6
sidecar-injector version: 1.1.6
telemetry version: 1.1.6
And the envoyfilter add-user-filter
to simulate multi-user case via anonymous@kubeflow.org
. see https://github.com/kubeflow/manifests/blob/baddd858737ea5c7c1f6799221e97ab107213ba7/istio/add-anonymous-user-filter/base/envoy-filter.yaml#L16-L19
Kubeflow itself works (I tried notebook only).
failed pods in ns kubeflow
> kubectl get pod -n kubeflow | grep -v "Running"
NAME READY STATUS RESTARTS AGE
cache-deployer-deployment-f879495bf-5zzfj 0/1 CrashLoopBackOff 12 39m
cache-server-56554df76b-hl96h 0/1 ContainerCreating 0 3h15m
seldon-controller-manager-6d94c47fb-dqjtg 0/1 CrashLoopBackOff 38 3h15m
seldon-controller-manager
crash is expected (b/c of k8s v1.18) but the other two are not. See details below:
kubectl logs cache-deployer-deployment-f879495bf-5zzfj -n kubeflow --tail 10
certificatesigningrequest.certificates.k8s.io/cache-server.kubeflow created
+ true
+ kubectl get csr cache-server.kubeflow
NAME AGE SIGNERNAME REQUESTOR CONDITION
cache-server.kubeflow 0s kubernetes.io/legacy-unknown system:serviceaccount:kubeflow:kubeflow-pipelines-cache-deployer-sa Pending
+ '[' 0 -eq 0 ']'
+ break
+ kubectl certificate approve cache-server.kubeflow
Error from server (Forbidden): certificatesigningrequests.certificates.k8s.io "cache-server.kubeflow" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
No resources found
When I visit /pipeline
from istio-ingressgateway via port-forward, I got:
url -v http://localhost:31380/pipeline
* Trying 127.0.0.1:31380...
* Connected to localhost (127.0.0.1) port 31380 (#0)
> GET /pipeline HTTP/1.1
> Host: localhost:31380
> User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
> Accept: */*
> Referer:
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Sun, 26 Jul 2020 01:12:13 GMT
< server: istio-envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure
However, pipeline UI itself works when I visit from its service ml-pipeline-ui.kubeflow.svc
via port-foward.
If I delete the destinationrule
ml-pipeline-ui
OR just use DISABLE
as its TLS connection mode, everything works very well. I tried some pipelines and it provisions pipeline instances under my profile namespace 🎉 .
This is the destinationrule I mentioned:
@Bobgy I'm curious if ISTIO_MUTUAL
works for service ml-pipeline-ui
? b/c there's no Enovy sidecar for pods from deployment ml-pipeline-ui
so is such a destination rule necessary?
@shawnzhu ibm platform should enable istio sidecar injection for kubeflow namespace in kfdef refer to https://github.com/kubeflow/manifests/blob/baddd858737ea5c7c1f6799221e97ab107213ba7/namespaces/base/namespaces.yaml#L7
@Bobgy I've tested ☝️ PR where istio-injection: enabled
is included (Thanks to @adrian555 ). KFP works from UI but I got error message when using KFP SDK from a notebook in my profile namespace:
client = kfp.Client()
EXPERIMENT_NAME = 'KFServing Experiments'
experiment = client.create_experiment(name=EXPERIMENT_NAME)
I got:
ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Tue, 28 Jul 2020 16:08:15 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '0'})
HTTP response body: RBAC: access denied
So I assume the authentication is passed via mTLS since both notebook pod and ml-pipeline.kubeflow.svc
pod has sidcars.
Does the RBAC access denied means it requires more rules for cluster role kubeflow-edit
? which hasn’t been changed since Oct 2019?
If you want to chat, I started it in this slack message: https://kubeflow.slack.com/archives/CE10KS9M4/p1595953494216900
after reading through the user journey, I added role and rolebing to the service account default-editor
as instructed:
$ kubectl auth can-i --list | grep pipeline
pipelines.pipelines.kubeflow.org [] [] [create get list]
runs.pipelines.kubeflow.org [] [] [create get list]
but I still get ☝️ error RBAC: access denied
AWS issues #1430 @Bobgy I didn't follow all the implementation details. Any reason to inject kubeflow namespace and add DestinationRules
? I tried to remove destinationRules and UI page works fine. However, I think some multi-user feature might be blocked or restrict due to this.
If there's a checklist to see if multi-user works under expectation under different platforms? Like you should see virtualizationserver
and ui-artifact
under user's namespace, etc? Or any sanity check?
Another issue I notice is ml-pipeline-ui-artifact
doesn't use right header. It by default use GCP header, however, it's a little bit hard to override because configs come from kubeflow-config
which is in kubeflow
namespace
And it seems it's causing other unintended side-effects with KFServing, look at the last comments there. https://github.com/kubeflow/kfserving/issues/568#issuecomment-664735067
@Jeffwan did you test KFServing at your end within this context?
And it seems it's causing other unintended side-effects with KFServing, look at the last comments there. kubeflow/kfserving#568 (comment)
@Jeffwan did you test KFServing at your end within this context?
We have not tested KFServing with multi-user pipeline yet. Enable injection in kubeflow namespace is kind of breaking change, is the KFServing issue can be resolved under this setting?
So I assume the authentication is passed via mTLS since both notebook pod and
ml-pipeline.kubeflow.svc
pod has sidcars. Does the RBAC access denied means it requires more rules for cluster rolekubeflow-edit
? which hasn’t been changed since Oct 2019?If you want to chat, I started it in this slack message: https://kubeflow.slack.com/archives/CE10KS9M4/p1595953494216900
@shawnzhu em. In this case, for users using KFP client outside the cluster, does mTLS
still work?
In this case, for users using KFP client outside the company, does mTLS still work?
The mTLS will happen between istio-ingressgateway.istio-system.svc
and ml-pipeline.kubeflow.svc
. it depends on the istio-ingressgateway
config that whether secure connection from KFP client. it should be just TLS w/o verifying client (e.g., HTTPS).
Answered some questions in https://github.com/kubeflow/manifests/issues/1430#issuecomment-665597274. For context of enabling injection in kubeflow namespace: https://github.com/kubeflow/kubeflow/issues/3866. Things I did to make sure this is not a breaking change: I sent PRs like https://github.com/kubeflow/manifests/pull/712 that added the istio injection false pod annotation to every pod in kubeflow namespace. Also, per https://github.com/kubeflow/kubeflow/issues/3866#issuecomment-600399037
So there shouldn't be dynamic services (e.g. TFJobs) running in the kubeflow namespace.
Therefore, enabling istio sidecar injection shouldn't break other services that do not create dynamic workloads in kubeflow namespace.
The current configurations assume kubeflow namespace has already enabled istio sidecar injection, if you didn't enable it. DestinationRule will break connections from clients with istio sidecar or istio gateway -- that's why you are seeing the error messages.
Per https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit#heading=h.b3vxor3gcdvs, not supporting in-cluster direct access to KFP ui/api server is a known limitation of current phase of KFP multi user support.
Workaround was mentioned in the issue
KFP api server will be protected by istio authz, only requests with user identity that can pass the public endpoint auth have a chance getting accepted (the mechanism is similar to notebook servers). Therefore, KFP sdk client needs to be able to authenticate for other platforms' auth service. I know dex is one thing we need to support, what are other platforms using? (Issue for dex is kubeflow/kubeflow#4912, there has been some progress.)
We did this, because if we allow other in-cluster requests, these requests can fake any user header bypassing all authentication checks.
@yanniszark has some design that provides an alternative for in-cluster authentication. Probably he can introduce that, but I'm not sure when that feature can be merged to KFP.
@shawnzhu
after reading through the user journey, I added role and rolebing to the service account default-editor as instructed:
$ kubectl auth can-i --list | grep pipeline pipelines.pipelines.kubeflow.org [] [] [create get list] runs.pipelines.kubeflow.org [] [] [create get list] but I still get ☝️ error RBAC: access denied
I think you have read design doc for features KFP will implement in the next phase. also /cc @yanniszark who will work on that.
In current phase, note that RBAC: access denied
error message comes from istio RBAC rejection, not Kubernetes RBAC.
So you should be able to set up permissions following https://istio.io/latest/zh/docs/reference/config/security/istio.rbac.v1alpha1/, using ServiceRole
and ServiceRoleBinding
. This will allow istio RBAC access, but KFP api server will still reject these requests because they don't have a KUBEFLOW_USERID_HEADER.
So with the current phase, the only way to access KFP API server securely, is to connect through the public endpoint and make sure KUBEFLOW_USERID_HEADER exists in the request through istio gateway
Can you remind me where kfserving is deployed? I think Jeremy has moved kfserving to its own namespace in GCP platform. What is the case for others?
Per https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit#heading=h.b3vxor3gcdvs, not supporting in-cluster direct access to KFP ui/api server is a known limitation of current phase of KFP multi user support.
em. Good to know this is not supported now.. We check design docs here and may misunderstand scope at that time, The graph actually shows cli/UI/sdk traffic have to go through the istio ingress gateway. Thanks for clarification.
We did this, because if we allow other in-cluster requests, these requests can fake any user header bypassing all authentication checks.
Technically, once user get authenticated, from pipeline's perspective, it's should be safe to use user identity, kubeflow-userid
or header
can be faked. I looked at the design and see @yanniszark has some proposal to use serviceAccountToken
I think that should be nice to have. Otherwise, it's kind of weird traffic go to external and come inside again like notebook (in mesh) -> public endpoint (go to gateway) -> pipeline ( in mesh).
I proposed a new feature to KFP SDK by guessing user namespace in https://github.com/kubeflow/pipelines/pull/4293 so that it can reduce the amount of code changes for all existing KFP samples.
KF notebook sdk issue when using hosts like `istio-ingressgateway.istio-system.svc.cluster.local https://github.com/kubeflow/pipelines/issues/4297 - let's use this issue as master to compile related issues/PRs @shawnzhu
Related PR - Pass userid-header param to kfam https://github.com/kubeflow/manifests/pull/1445 thanks @yhwang
@Bobgy AWS platform doesn't have any issues with multi-user pipeline. We should be good to go. I will send some PRs to pipeline SDK for easy authentication later for AWS. This is not blockers for us in 1.1
@Bobgy IBM Cloud should be good to go as well.
Docs will be updated later. Additionally, SDK is another area where we may enable an easier authentication with IBM Cloud Identity and Acces Management later.
cc @shawnzhu @adrian555 @jlewi
Awesome! Thanks for your hard work integrating pipelines multi user support! Let's close this issue if no further issues.
/reopen
Keep this open as information for integrating with remaining platforms in https://www.kubeflow.org/docs/started/getting-started/#configuration-quick-reference.
@jlewi do you know who are the stakeholders for these platforms?
@Bobgy: Reopened this issue.
@krishnadurai and @swiftdiaries have integrated KFP multi-user mode in k8s_istio and istio_dex installations: https://github.com/kubeflow/manifests/pull/1484 and https://github.com/kubeflow/manifests/pull/1494.
Openshift is the only one left.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.
Seems like OpenShift is still left behind
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.
This issue has been closed due to inactivity.
What is expected timeline for OpenShift?
I've merged the first PR for enabling KFP multi user mode.
Platform owners can include this generic module https://github.com/kubeflow/manifests/tree/master/pipeline/installs/multi-user, instead of
pipeline/installs/generic
to test KFP multi user mode right now.Additional change per platform to integrate with KFP multi user mode:
If your platform cannot integrate well with KFP multi user mode before KF 1.1 release, backup options: