kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[Multi User] failed to call `kfp.Client().create_run_from_pipeline_func` in in-cluster Jupyter notebook #4440

Closed yhwang closed 3 years ago

yhwang commented 4 years ago

What steps did you take:

In a multi-user enabled env, I created a notebook server in the user's namespace, launched a notebook, and tried to call the Python SDK from there. When I execute the code below:

pipeline = kfp.Client().create_run_from_pipeline_func(mnist_pipeline, arguments={}, namespace='mynamespace')

What happened:

The API call was rejected with the following errors:

~/.local/lib/python3.6/site-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    236 
    237         if not 200 <= r.status <= 299:
--> 238             raise ApiException(http_resp=r)
    239 
    240         return r

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Tue, 01 Sep 2020 00:58:39 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '8'})
HTTP response body: RBAC: access denied

What did you expect to happen:

A pipeline run should be created and executed

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

I installed KFP on IKS with multi-user support.
KFP version: v1.1.0
KFP SDK version: v1.0.0

Anything else you would like to add:

[Miscellaneous information that will assist in solving the issue.]

/kind bug

yhwang commented 4 years ago

Since the notebook server uses serviceaccount: default-editor in my user's namespace, I could fix the RBAC issue by adding a ServiceRoleBinding that allows the serviceaccount to access ml-pipeline-service (a rough sketch is at the end of this comment). However, the request is still rejected by the ml-pipeline-api-server:

~/.local/lib/python3.6/site-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    236 
    237         if not 200 <= r.status <= 299:
--> 238             raise ApiException(http_resp=r)
    239 
    240         return r

ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 01 Sep 2020 01:07:38 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","message":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Request header error: there is no user identity header.","error_details":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header."}]}
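
For reference, a rough sketch of the kind of ServiceRole/ServiceRoleBinding mentioned above (the object names are made up, and mynamespace/default-editor are placeholders for the user's namespace and the notebook's service account):

apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRole
metadata:
  name: ml-pipeline-access
  namespace: kubeflow
spec:
  rules:
  # allow calls to the ml-pipeline service on any method
  - services: ["ml-pipeline.kubeflow.svc.cluster.local"]
    methods: ["*"]
---
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-ml-pipeline-notebook
  namespace: kubeflow
spec:
  subjects:
  # the notebook server's service account in the user's namespace
  - properties:
      source.principal: "cluster.local/ns/mynamespace/sa/default-editor"
  roleRef:
    kind: ServiceRole
    name: ml-pipeline-access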
Bobgy commented 4 years ago

This is by design. In the current phase, the KFP API server needs a trusted source for user identity, and letting the user specify the userid header is not trustworthy.

We introduced the design with @yanniszark in https://kccnceu20.sched.com/event/Zeok/enabling-multi-user-machine-learning-workflows-for-kubeflow-pipelines-yannis-zarkadas-arrikto-yuan-gong-google

You may check the slides there and our public design doc. There is an idea to use a service account token to authenticate. @yanniszark is committed to implementing that.

Bobgy commented 4 years ago

The currently suggested workaround is to always authenticate from the public endpoint using user credentials.

Ark-kun commented 4 years ago

@Bobgy What would be the steps needed to make programmatic calls to client methods? I think in the past kfp.Client() worked in in-cluster notebook scenarios...

yhwang commented 4 years ago

@Bobgy thanks for the link. It's really helpful. In the slides, the idea is to use RBAC to store the access policies. I guess it should be k8s RBAC (because of SubjectAccessReview). How about using istio RBAC instead? Because the goal is to protect the pipeline server API endpoints and actually istio can do that by setting up proper istio RBAC for those endpoints. KFAM just needs to maintain correct istio RBAC objects. It ties to istio though.

Bobgy commented 4 years ago

@yhwang to be specific, the goal is to protect pipeline resources based on user identity.

It's also possible to use istio RBAC to define rules that use HTTP request paths and parameters.

I think the problems are:

I don't have full context whether the option was initially considered or not. @IronPan @gaoning777 @yanniszark do you have context on this?

yhwang commented 4 years ago

I just want to explore possible approaches. K8s RBAC and Istio RBAC are designed to handle different levels of authorization: one is for K8s resources, the other is for application-level endpoints. I was just trying to figure out which one is better for our purpose.

Another point is that if KFAM is doing the validation, ml-pipeline-api-server has to call its API to validate incoming requests; then all requests going to ml-pipeline-api-server will be examined. The validation can't be done before ml-pipeline-api-server, otherwise some paths to ml-pipeline-api-server may not be inspected.

As for troubleshooting the RBAC, the envoy sidecar allows you to turn on the debug level for RBAC, so you can see detailed information about how it performs the RBAC validation as well as the requests' metadata, e.g. request.auth.principal, source.principal, etc.
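
A rough example of turning that on, assuming the default Envoy admin port 15000 (the pod name is a placeholder; target the sidecar that enforces the policy, e.g. the ml-pipeline pod's):

# bump the sidecar's rbac logger to debug via the Envoy admin endpoint
kubectl -n kubeflow exec <ml-pipeline-pod> -c istio-proxy -- \
  curl -s -X POST "http://localhost:15000/logging?rbac=debug"
# then watch the sidecar logs for the RBAC decisions and request metadata
kubectl -n kubeflow logs <ml-pipeline-pod> -c istio-proxy -f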

Bobgy commented 4 years ago

Agreed. To clarify, I think Istio is relatively harder to use because the abstractions of Istio RBAC require a lot of knowledge from users to use properly, while K8s RBAC is very clear: just resource, namespace, verb, and user; there aren't any redundant concepts.

meowcakes commented 4 years ago

The currently suggested workaround is to always authenticate from the public endpoint using user credentials.

How does this work for non-GCP clusters? I saw this issue where it was stated that auth is only possible using GCP IAP, and that someone using AWS should use the kfp client from within the cluster.

yhwang commented 4 years ago

@meowcakes

that someone using AWS should use the kfp client from within the cluster.

It doesn't work now, at least in my case (a multi-user enabled env, KFP v1.1.0). The reason is that there are two user identity checks in ml-pipeline:

The first one could be solved by adding some Istio RBAC configs. However, the second one is related to the design and is under implementation, based on @Bobgy's comment above.

Bobgy commented 4 years ago

@meowcakes the issue you are looking at is now outdated.

/assign @PatrickXYS for how to connect on AWS

PatrickXYS commented 4 years ago

@yhwang For KFP 1.1, in-cluster communication from a notebook to Kubeflow Pipelines is not supported in this phase.

In order to use kfp as before, you need to pass a cookie to KFP for communication as a workaround.

Ref: https://www.kubeflow.org/docs/aws/pipeline/#authenticate-kubeflow-pipeline-using-sdk-inside-cluster
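
A rough sketch of that cookie workaround (the host, cookie value, and namespace are placeholders; see the linked doc for the exact steps):

import kfp

# session cookie copied from a browser session that already authenticated
# through the public endpoint (placeholder value)
authservice_session = "authservice_session=<cookie-value>"

client = kfp.Client(
    host="http://ml-pipeline.kubeflow.svc.cluster.local:8888",  # adjust to your setup
    cookies=authservice_session,
)
client.list_experiments(namespace="<your-namespace>")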

jonasdebeukelaer commented 4 years ago

The currently suggested workaround is to always authenticate from the public endpoint using user credentials.

@Bobgy what does this mean exactly? How would I authenticate to the public endpoint?

Bobgy commented 4 years ago

@jonasdebeukelaer Here's documentation for GCP https://www.kubeflow.org/docs/gke/pipelines/authentication-sdk/#connecting-to-kubeflow-pipelines-in-a-full-kubeflow-deployment

yhwang commented 4 years ago

@Bobgy I thought about this issue again, and I think this is where an Istio envoy filter could be used without any application change. I added an envoy filter to add the kubeflow-userid header for HTTP traffic going to ml-pipeline.kubeflow.svc.cluster.local:8888, and then it works. So you actually don't need to do the authentication trick for the in-cluster use case; it's weird to me to perform authentication in the in-cluster scenario. The kubeflow-userid I inject in the HTTP header is the namespace owner's userid, which I think totally makes sense.

In conclusion, I added two config objects to make it work:

If these two configs could be created along with the notebook server, it would be perfect!

Bobgy commented 4 years ago

@yhwang interesting, can you share an example of these configs?

yhwang commented 4 years ago

@Bobgy sure!

The envoy filter above only injects the kubeflow-userid HTTP header for traffic going to the ml-pipeline service.

yhwang commented 4 years ago

@Bobgy

I studied the envoy filter more and here is a better version:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: mynamespace
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: user@example.com
  workloadSelector:
    labels:
      notebook-name: mynotebook

It directly uses the custom request header feature that http_connection_manager provides. Because the header name and value are fixed, there is no need to use a Lua filter.

arshashi commented 3 years ago

@yhwang Thanks for the suggestion. I tried adding the ServiceRoleBinding, after which, as you mentioned, I get the error below. After that I tried adding the envoy filter, but the error remains the same. Providing the envoy filter yaml for reference. Namespace: brainyapps, header value: brainyapps@example.com, notebook name: mynotebook.

Should the envoy filter below be added in our namespace (brainyapps) or in the kubeflow namespace? For your information: the default and system namespaces are under a restricted pod security policy.

Reason: Conflict HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 08 Sep 2020 12:33:51 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'transfer-encoding': 'chunked'}) HTTP response body: {"error":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","message":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Request header error: there is no user identity header.","error_details":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header."}]}

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: brainyapps
spec:
  configPatches:

yhwang commented 3 years ago

@arshashi The envoyfilter should be added to the user's namespace. You may check your notebook server pod and see whether it has the label notebook-name: mynotebook; that makes sure the envoyfilter applies to the notebook server in the user's namespace. Also check the user's namespace, e.g.

kubectl get ns brainyapps -o yaml

and make sure the owner is brainyapps@example.com, for example:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    owner: brainyapps@example.com
    ......
    .....

In your case, no kubeflow-userid is injected. My guess is that the notebook-name: mynotebook label is wrong.

Edit: another possibility is that in your Kubeflow deployment, the identity header name is not kubeflow-userid. You may double-check your Kubeflow config.

arshashi commented 3 years ago

@yhwang My notebook server name was incorrect; now it works for me with the above two changes. Thanks a lot for your time and suggestions, I was stuck on this issue for a long time.

Bobgy commented 3 years ago

Thanks @yhwang! The solution sounds secure and reasonable!

I guess the only concern I have is that other users sharing the namespace can act with the namespace owner's permissions.

So if we can make up a service account that represents this namespace and only grant it permission to access the same namespace, that would be ideal, and I believe that is totally possible, at least for GCP.

/cc @yanniszark @IronPan what do you think about this approach to grant cluster workloads access to the KFP API?

yhwang commented 3 years ago

@Bobgy

So if we can make up a service account that represents this namespace and only grant permission to access the same namespace

In the RBAC config I posted above, the notebook's principal would be cluster.local/ns/mynamespace/sa/default-editor. So ml-pipeline-apiserver could use that and only allow that serviceaccount to access specific resources, for example pipelines created by mynamespace's owner.

Bobgy commented 3 years ago

Exactly, that was one of the auth options in the design too.

PRs are welcome on the approach of reading the Istio request header for authentication.

yhwang commented 3 years ago

@Bobgy can you elaborate more about this:

PRs are welcome on the approach of reading the Istio request header for authentication.

For the Istio request header, it could be handled by Istio RBAC. I guess you mean adding the corresponding Istio RBAC in this case, or were you talking about another implementation?

yhwang commented 3 years ago

@Bobgy I have a change here to create the Istio configs along with the notebook server. It also removes the Istio configs when the notebook server is deleted. Do you think it aligns with your plan, and should I open a PR for that?

yanniszark commented 3 years ago

Hi all! I will try to answer some of the questions that came up in this issue:

I guess it should be k8s RBAC (because of SubjectAccessReview). How about using istio RBAC instead? Because the goal is to protect the pipeline server API endpoints and actually istio can do that by setting up proper istio RBAC for those endpoints. KFAM just needs to maintain correct istio RBAC objects. It ties to istio though.

@IronPan @gaoning777 @yanniszark do you have context on this?

Istio RBAC is deprecated so I'm going to talk about Istio AuthorizationPolicy. Istio AuthorizationPolicy is a useful tool, but besides the obvious disadvantages (tied to Istio, harder to use, etc.), it doesn't have the flexibility that the Pipelines model requires right now. Consider that in the current code:

On the contrary, we use Kubernetes RBAC as an authorization database and perform whatever complex logic we want in the API-Server. And as an authorization database, Kubernetes RBAC makes much more sense. Does this answer your question @yhwang? Please tell me if something is not clear! cc @Bobgy

Please also take a look at the Kubeflow-wide guideline for authorization, which prescribes RBAC and SubjectAccessReview: https://github.com/kubeflow/community/blob/master/guidelines/auth.md
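
For illustration, a SubjectAccessReview along the lines that guideline prescribes might look like this (the group/resource/verb values here are assumptions for the example, not necessarily what the Pipelines API server actually sends):

apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # identity taken from the trusted user identity header or token
  user: user@example.com
  resourceAttributes:
    group: pipelines.kubeflow.org
    resource: runs
    verb: create
    namespace: mynamespace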

I added an envoy filter to add kubeflow-userid header for those HTTP traffics going to ml-pipeline.kubeflow.svc.cluster.local:8888. Envoy filter to inject the kubeflow-userid header from notebook to ml-pipeline service. In the example below, the notebook server's name is mynotebook and userid for namespace: mynamespace is user@example.com

Thanks @yhwang! The solution sounds secure and reasonable! I guess the only concern I have, is that other users sharing the namespace can act as namespace owner's permissions.

I want to make it clear that the config I see outlined here is NOT secure. The sidecar can impersonate ANY identity. The correct way to enable programmatic access is to:

So if we can make up a service account that represents this namespace and only grant permission to access the same namespace, that will be ideal, but I believe that is totally possible, at least for GCP.

@Bobgy but we don't need to have a notion of a ServiceAccount that "represents" a namespace. All ServiceAccount identities will be able to prove themselves to the Pipelines API Server with the design outlined above.

Bobgy commented 3 years ago

@yanniszark these are great points, I'll add some of my latest ideas in a few days

yhwang commented 3 years ago

@yanniszark thanks for the information. Here are my thoughts:

Personally, I don't think using Istio config is bad, especially since you already use mTLS from Istio. The key point of using Istio is to control/define application-level access/permissions. For me, using K8s RBAC to achieve application-level security is kind of an abuse of it; its main purpose is to control K8s resource-level permissions. For example, an Istio RBAC/authorization config is needed to allow a notebook server to access ml-pipeline-service, which is application level. Creating a notebook server by a user involves two levels of permissions:

The envoyfilter I provided above is just to work with the current ml-pipeline limitation; once ml-pipeline improves, this approach should also be modified. The audience-bound solution could also be done with an Istio JWTRule, in which case we don't need to reinvent the wheel.

These are my two cents.

lukemarsden commented 3 years ago

Is there a solution for making kfp.Client() work out of the box in an in-cluster pod in a KF cluster with no auth? e.g. a cluster installed with https://raw.githubusercontent.com/kubeflow/manifests/v1.1-branch/kfdef/kfctl_k8s_istio.v1.1.0.yaml

It currently fails with:

  File "/usr/local/lib/python3.8/site-packages/kfp_server_api/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Wed, 23 Sep 2020 14:51:35 GMT', 'server': 'istio-envoy', 'connection': 'close', 'x-envoy-decorator-operation': 'ml-pipeline.kubeflow.svc.cluster.local:8888/*'})
HTTP response body: RBAC: access denied

This is with KFP SDK 1.0.1.

yhwang commented 3 years ago

@lukemarsden You can check my comments above for a workaround: a ServiceRoleBinding and an EnvoyFilter.

And @yanniszark mentioned this:

I want to make it clear that the config I see outlined here is NOT secure. The sidecar can impersonate ANY identity.

It uses a fixed userid to access ml-pipeline-service, and I suggest the value should be the userid of whoever created the notebook server. If you are comfortable with this, then you can try the workaround.

yanniszark commented 3 years ago

@yhwang thanks for sharing your thoughts! I have some remarks on some of them:

The key point of using istio is to control/define application level access/permission. For me, using k8s RBAC to achieve application level security is kind of abuse of it. It's main purpose is to control the k8s resource level permission.

As mentioned above, Istio AuthorizationPolicy just doesn't have the flexibility to deal with the complexity required for authorization. So for Pipelines, we needed an authorization database. We could make our own from scratch or try to put something together from existing solutions. Then we would need to provide tooling (clients, CLI, UI) and docs around it. Instead, we opt to use Kubernetes RBAC as the authz database and we get:

The audience-bound solution could also be done by istio JWTRule.

Can you elaborate? Currently, ServiceAccountTokens can only be validated with the TokenReview API call, and AFAIK Istio JWTRule doesn't support that.
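
For context, validating a ServiceAccountToken with TokenReview looks roughly like this (the token and audience values are placeholders/assumptions):

apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  # projected service account token presented by the client (placeholder)
  token: <service-account-token>
  audiences:
  - pipelines.kubeflow.org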

@lukemarsden if you are ok with working without auth features, you can also disable Istio RBAC completely (ClusterRBACConfig object). I wouldn't recommend it for any multitenant environment though.

lukemarsden commented 3 years ago

Thanks @yanniszark and @yhwang for the guidance.

Applying this config:

apiVersion: rbac.istio.io/v1alpha1
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: "OFF"

changes the error to:

kfp_server_api.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Fri, 25 Sep 2020 12:41:59 GMT', 'x-envoy-upstream-service-time': '1', 'server': 'istio-envoy', 'x-envoy-decorator-operation': 'ml-pipeline.kubeflow.svc.cluster.local:8888/*', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","message":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","code":3,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","error_details":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []"}]}

So I suspect disabling Istio RBAC alone is not sufficient. I will try experimenting with the ServiceRoleBinding and EnvoyFilter @yhwang suggested; perhaps only the EnvoyFilter is required, to pin the namespace owner to anonymous in the case where Istio RBAC is disabled.

Update: I realized this error probably relates to the client.create_run_from_pipeline_func call needing a namespace argument, not the HTTP request missing a header. Testing that now.

Update 2: Adding the namespace arg changed the error to:

HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Fri, 25 Sep 2020 13:54:25 GMT', 'x-envoy-upstream-service-time': '1', 'server': 'istio-envoy', 'x-envoy-decorator-operation': 'ml-pipeline.kubeflow.svc.cluster.local:8888/*', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","message":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Request header error: there is no user identity header.","error_details":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header."}]}

So I guess I do need that user identity header, too! I'm trying this:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: istio-system
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: anonymous@kubeflow.org

But it's not working yet.

yhwang commented 3 years ago

@lukemarsden I guess your envoyfilter should be added to the anonymous namespace where your notebook is, and don't forget to specify your notebook server by using the workloadSelector, for example:

  workloadSelector:
    labels:
      notebook-name: mynotebook

Bobgy commented 3 years ago

@lukemarsden if you are ok with working without auth features, you can also disable Istio RBAC completely (ClusterRBACConfig object). I wouldn't recommend it for any multitenant environment though.

@yanniszark note, the KFP API server does not expose configuration today to disable its authz, so it's not enough to just disable Istio RBAC completely.

lukemarsden commented 3 years ago

The following config is (finally) working for me. Note that my use case isn't notebooks but rather an in-cluster kfp client, so switching the listenerType from GATEWAY to SIDECAR_INBOUND was necessary so that the header gets added on in-cluster traffic as well.

# Create anonymous namespace (profile) in Kubeflow without having to click
# a button on a web page
curl -XPOST http://localhost:31380/api/workgroup/create

# disable Istio RBAC to workaround
# https://github.com/kubeflow/pipelines/issues/4440#issuecomment-697920377
kubectl apply -f - <<EOF
apiVersion: rbac.istio.io/v1alpha1
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: "OFF"
EOF

# tell kfp that user=anonymous@kubeflow.org even for in-cluster clients
# like pachyderm (listenerType=SIDECAR_INBOUND, not GATEWAY)
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-user-everywhere
  namespace: istio-system
spec:
  filters:
    - filterConfig:
        inlineCode: |
          function envoy_on_request(request_handle)
              request_handle:headers():replace("kubeflow-userid","anonymous@kubeflow.org")
          end
      filterName: envoy.lua
      filterType: HTTP
      insertPosition:
        index: FIRST
      listenerMatch:
        listenerType: SIDECAR_INBOUND
EOF

# Stop header being added multiple times
kubectl delete envoyfilter -n istio-system add-user-filter
mosyang commented 3 years ago

@lukemarsden Many thanks for your comments! This works for me. In our case, we don't want any auth or RBAC with the kfctl_k8s_istio.v1.1.0.yaml deployment; all team members can freely run kfp.Client() to share the DGX resources.

Bobgy commented 3 years ago

Coming a little late, but let me explain my current thoughts.

First, we don't want to ask users to configure Istio for RBAC access; Istio configs are really brittle and require a lot of knowledge to use and debug. Therefore, my ideal setup is like:

  1. KFP api server accepts all traffic.
  2. KFP api server reads the header X-Forwarded-Client-Cert injected by the istio sidecar: https://stackoverflow.com/a/58099997. X-Forwarded-Client-Cert should contain auth information like spiffe://cluster.local/ns/<namespace>/sa/<service account>.
  3.1 If a request comes from the istio gateway (probably we can configure this), then the KFP api server interprets it as coming from a user, reading the special header like kubeflow-userid.
  3.2 If a request comes from other istio mTLS-enabled sources, we know which service account initiated the request.
  3.3 If a request doesn't have X-Forwarded-Client-Cert, it's not authed by istio; we may develop other ways for auth, like providing a service account token with the pipelines audience.
  4. When the API server knows the requester identity, it can use SubjectAccessReview to test whether the corresponding user/service account can access a certain resource representing KFP, using Kubernetes RBAC.

With this setup, access to all KFP resources is backed by Kubernetes RBAC. If users have notebook servers in-cluster with an istio sidecar (mTLS enabled), they only need to grant K8s RBAC permissions to those servers' service accounts.
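
For example, a RoleBinding of roughly this shape could grant a notebook server's service account access to KFP resources in its own namespace (the ClusterRole name kubeflow-pipelines-edit is an assumption for illustration, not something the current release ships):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-notebook-kfp-edit
  namespace: mynamespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # assumed role name
  name: kubeflow-pipelines-edit
subjects:
# the notebook server's service account
- kind: ServiceAccount
  name: default-editor
  namespace: mynamespace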

And we should provide an option to disable all of the authz checks, so if it's not useful for an org, they can just disable it.

What are your thoughts? @yanniszark @yhwang

yanniszark commented 3 years ago

@Bobgy yes, what you describe is pretty much how I planned to use Istio mTLS for authentication (via the XFCC header). As for SubjectAccessReview, we plan on delivering it after Kubecon is over. We have also refactored the authentication code a bit in order to support multiple auth methods (we call them authenticators). We'll be pushing that upstream as well, after SubjectAccessReview. Does this sound good?

yhwang commented 3 years ago

@Bobgy sounds good to me. Let me add some cases here:

@Bobgy and @yanniszark Please let me know anything I can help with. I'd love to help fill in the gaps.

Bobgy commented 3 years ago

@yanniszark I see. I hadn't seen any explanation of how you planned to actually implement this; I'm glad it's the same.

Bobgy commented 3 years ago

@Bobgy sounds good to me. Let me add some cases here:

  • user A creates a notebook server in namespace A. User A adds user B as a collaborator in his namespace A. When user B runs code on the notebook server that resides in namespace A, the XFCC header should be the notebook service account in namespace A (i.e. cluster.local/ns/A/sa/default-editor). In this case, the KFP api server can only verify the SubjectAccessReview against the service account of the notebook server of namespace A, not user B. Is this correct?

Yes, that's right, because user A can also use this notebook server. It should have its own identity.

  • Is the RBAC/authorization of istio still enabled in general? If yes, we still need to add corresponding istio RBAC/authorization in order to allow the user to call the KFP SDK from the notebook server, in addition to K8s RBAC. Or will istio RBAC/authorization be turned off?

Istio authz is still enabled, but the KFP manifest will include an istio authz rule to allow any traffic to KFP API server.

@Bobgy and @yanniszark Please let me know anything I can help with. I'd love to help fill in the gaps.

Thank you for offering to help! @yanniszark, are you talking about contributing this implementation after KubeCon in November?

I believe @yanniszark has implemented some of this in the MiniKF fork of KFP. You can ask him if you can help in any way. When he creates the final PRs, some help with review and fixes (if needed) would be welcome.

yhwang commented 3 years ago

@Bobgy Thanks!

Yes, that's right, because user A can also use this notebook server. It should have its own identity.

I just wonder if the KFP API server needs to know whether the request is from User A or User B. For example, User B can access runs/experiments in namespaces A and B, but User A can only access namespace A. By using cluster.local/ns/A/sa/default-editor from the XFCC header, the KFP API server won't be able to know which user sent the request, only which notebook server. Then we need to provide a mechanism to allow users to legitimately specify the kubeflow-userid header in the KFP SDK. Or is this out of scope?

Bobgy commented 3 years ago

I just wonder if KFP API server needs to know the request is from User A or User B?

With the above design, the KFP API server doesn't need to know. It uses the service account as the requester identity. So we won't need an SDK method to add the kubeflow userid.

yhwang commented 3 years ago

With the above design, the KFP API server doesn't need to know. It uses the service account as the requester identity. So we won't need an SDK method to add the kubeflow userid.

So the following scenario would fail, and we should document it as a limitation.

User settings:

Scenario: When User B runs the code kfp.Client.create_run_from_pipeline_func() and specifies namespace=B on the notebook server that User A created in namespace A, I guess User B would expect he can access the resources in his own namespace. But based on the design, he can't, because the XFCC is cluster.local/ns/A/sa/default-editor and the KFP API server only allows this identity to access resources in namespace A.

Bobgy commented 3 years ago

That's right, and I'd rather consider that expected behavior. The scenario is based on the assumption that users A/B didn't authenticate as themselves when using the notebook server; therefore, they should only be able to access what the notebook server's service account can access.

User B will still have the choice to use the KFP SDK to connect to the cluster's public endpoint and use his user credentials for authentication. In that case, User B will have access to namespace B, but the notebook server is shared between users A and B, so user B needs to be aware that his credentials may be used by anyone else having access to namespace A (which should be avoided).

Bobgy commented 3 years ago

With the above design, the KFP API server doesn't need to know. It uses the service account as the requester identity. So we won't need an SDK method to add the kubeflow userid.

So the following scenario would fail, and we should document it as a limitation.

User settings:

  • User A owns namespace A
  • User B owns namespace B
  • User A invites User B as a collaborator, so User B can also access namespace A
  • User B can access runs/experiments in namespaces A and B, but User A can only access namespace A.

Scenario: When User B runs the code kfp.Client.create_run_from_pipeline_func() and specifies namespace=B on the notebook server that User A created in namespace A, I guess User B would expect he can access the resources in his own namespace. But based on the design, he can't, because the XFCC is cluster.local/ns/A/sa/default-editor and the KFP API server only allows this identity to access resources in namespace A.

Another scenario (I guess this is more of what you want to support) is that we have the same users A and B and namespaces A and B, and namespace A is shared with User B. User B uses a notebook server in namespace B and tries to kfp.Client.create_run_from_pipeline_func() for namespace=A. The notebook server in namespace B (despite User B being the owner and namespace A being shared with User B) won't have access to pipelines in namespace A.

To fix the permission issue, user A should also invite namespace B's default-editor service account as a collaborator to namespace A. So conceptually, in this security model, a service account in the cluster has its own identity, different from the user's, and access is managed by the identity sending the request. Arguably, Kubernetes has the same auth model: if you run a notebook in the cluster, you cannot use your own account's permissions; the notebook has its service account's permissions.

jonasdebeukelaer commented 3 years ago

hey, having an issue applying @yhwang's workaround. I'm still getting the error as if the envoy filter is not being applied:

409 ... HTTP response body: {"error":"Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header." ...

however, as far as I can tell, I applied the envoy filter correctly (jonas@satalia.com is the owner of the default-profile namespace):

> kubectl -n default-profile get envoyfilters.networking.istio.io add-header -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"EnvoyFilter","metadata":{"annotations":{},"name":"add-header","namespace":"default-profile"},"spec":{"configPatches":[{"applyTo":"VIRTUAL_HOST","match":{"context":"SIDECAR_OUTBOUND","routeConfiguration":{"vhost":{"name":"ml-pipeline.kubeflow.svc.cluster.local:8888","route":{"name":"default"}}}},"patch":{"operation":"MERGE","value":{"request_headers_to_add":[{"append":true,"header":{"key":"kubeflow-userid","value":"jonas@satalia.com"}}]}}}],"workloadSelector":{"labels":{"notebook-name":"test-notebook"}}}}
  creationTimestamp: "2020-10-21T15:30:18Z"
  generation: 1
  name: add-header
  namespace: default-profile
  resourceVersion: "33518018"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/default-profile/envoyfilters/add-header
  uid: 72d8ce19-9246-431b-b61c-74b3da7b4732
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: jonas@satalia.com
  workloadSelector:
    labels:
      notebook-name: test-notebook

What should be picking this up and using it, and where? How should I debug?

@yhwang you mentioned something about my kubeflow config:

oh another possibility is that in your kubeflow, the identity header name is not kubeflow-userid. You may double check your kubeflow config.

where do I check this?

Thanks!

Bobgy commented 3 years ago

@jonasdebeukelaer are you on GCP? The header should be 'x-goog-authenticated-user-email'

jonasdebeukelaer commented 3 years ago

thanks @Bobgy! Got it working now.

Also note the header value has to be formatted like accounts.google.com:<email>