kubeflow / manifests

Getting Istio RBAC issues when submitting a pipeline from a Notebook #2747

Closed kimwnasptd closed 3 months ago

kimwnasptd commented 5 months ago

Version

1.9

Describe your issue

I tried to create a pipeline from the KFP SDK, from a Notebook, but the Python code that creates the experiment/run fails with the following errors

From the notebook cell

ERROR:root:Failed to get healthz info attempt 1 of 5.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/kfp/client/client.py", line 424, in get_kfp_healthz
    return self._healthz_api.get_healthz()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
kfp_server_api.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Wed, 05 Jun 2024 17:28:20 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '11'})
HTTP response body: RBAC: access denied

And from the logs of the KFP API Server Istio sidecar

[2024-06-05T17:28:20.788Z] "GET /apis/v2beta1/healthz HTTP/1.1" 403 - rbac_access_denied_matched_policy[none] - "-" 0 19 0 - "-" "OpenAPI-Generator/2.0.3/python" "9a50db59-9e33-420d-bb16-efb29c5e504f" "ml-pipeline.kubeflow.svc.cluster.local:8888" "-" inbound|8888|| - 10.244.1.33:8888 10.244.2.23:47694 outbound_.8888_._.ml-pipeline.kubeflow.svc.cluster.local -

Steps to reproduce the issue

  1. Deploy the latest v1.9-branch
  2. Use a KinD cluster
  3. Create the PodDefault for KFP from https://www.kubeflow.org/docs/components/pipelines/v1/sdk/connect-api/#full-kubeflow-subfrom-inside-clustersub (a sketch of this PodDefault follows these steps)
  4. Create an ipynb with Data Passing pipeline https://github.com/kubeflow/pipelines/blob/master/samples/tutorials/Data%20passing%20in%20python%20components/Data%20passing%20in%20python%20components%20-%20Files.py
  5. Submit the pipeline
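
For reference, the PodDefault from step 3 looks roughly like this (an abridged sketch based on the linked docs page; the namespace is an example):

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: kubeflow-user-example-com
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  env:
  # the KFP SDK reads the projected ServiceAccount token from this path
  - name: KF_PIPELINES_SA_TOKEN_PATH
    value: /var/run/secrets/kubeflow/pipelines/token
  volumes:
  - name: volume-kf-pipeline-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 7200
          # audience the ServiceAccount token is issued for
          audience: pipelines.kubeflow.org
  volumeMounts:
  - mountPath: /var/run/secrets/kubeflow/pipelines
    name: volume-kf-pipeline-token
    readOnly: true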


kimwnasptd commented 5 months ago

@juliusvonkohout @kromanow94 after playing around a bit and catching up on the oauth2-proxy work, some quick testing suggests the root cause must be the RequestAuthentication for the K8s issuer. https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/common/oidc-client/oauth2-proxy/components/istio-m2m/requestauthentication.yaml#L13-L14

Specifically, this RequestAuthentication creates a kubeflow-userid header from the ServiceAccount token that the KFP SDK adds to the request as an Authorization: Bearer <token> header.
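
For context, the relevant part of that RequestAuthentication looks roughly like this (a sketch; the resource name is illustrative, the issuer and header mapping are per the linked manifest):

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: m2m-token-issuer
  namespace: istio-system
spec:
  jwtRules:
  - issuer: https://kubernetes.default.svc.cluster.local
    # the two linked lines: copy the JWT's sub claim into the
    # kubeflow-userid header on every authenticated request
    outputClaimToHeaders:
    - header: kubeflow-userid
      claim: sub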

But adding this header means the request no longer matches the current AuthorizationPolicy, which forbids requests that do NOT come from the IngressGateway from setting the kubeflow-userid header: https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml#L37
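
The rule in question (quoted from the manifest; it also appears in the diff further down this thread):

# this rule only matches requests that do NOT carry a kubeflow-userid header,
# so once the RequestAuthentication above injects that header, in-cluster
# requests match no rule at all and Istio answers 403 "RBAC: access denied"
- when:
  - key: request.headers[kubeflow-userid]
    notValues: ['*']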

kimwnasptd commented 5 months ago

In this case, removing the above lines from the RequestAuthentication, i.e. the ones that set this header for K8s issuer tokens, resolves the issue.

This comes back a bit to the discussion about using headers vs tokens everywhere. But to move us forward, I suggest that for the K8s issuer RequestAuthentication objects we simply don't set these headers for now.
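
A minimal sketch of that interim change: keep the jwtRule for the K8s issuer, but drop the header mapping.

spec:
  jwtRules:
  - issuer: https://kubernetes.default.svc.cluster.local
    # no outputClaimToHeaders entry: K8s-issued tokens are still validated,
    # but no kubeflow-userid header is derived from them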

thesuperzapper commented 5 months ago

I haven't been following the oauth2-proxy stuff, but this is important because this issue will not just affect Notebook Pods.

Users have always assumed that Pods which directly access the ml-pipeline-ui.kubeflow.svc.cluster.local service can pass their Authorization header (with a Pod ServiceAccount token) and that KFP will give them the RBAC permissions associated with that ServiceAccount.

That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.

See the "Full Kubeflow (from inside cluster)" part of this page for examples which need to work:

kimwnasptd commented 5 months ago

I've put quite a bit more thought into this after seeing the above. I'm preparing a small spec to go over the details. We'll indeed need to be careful about both the expectations around JWTs and what changes to make, so that we have a uniform plan.

kimwnasptd commented 5 months ago

So what I propose for unblocking this is to:

  1. update the AuthorizationPolicy in KFP https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml#L36-L38
  2. Change it to always require a JWT for talking to KFP:

     - from:
       - source:
           requestPrincipals: ["*"]

And for the above to work, we need to ensure that:

  1. Istio is configured with RequestAuthentication to recognise id_tokens (ServiceAccount tokens) where the issuer is the K8s cluster
  2. The above RequestAuthentication must set the kubeflow-userid header from the JWT

If the header is not set via the RequestAuthentication, then anyone can craft a request with a valid K8s ServiceAccount token while also setting this header themselves. Because the backend evaluates the header first, they could then impersonate any user.

juliusvonkohout commented 5 months ago

@thesuperzapper yes, it is a critical feature that we have to fix for 1.9. @kimwnasptd I can only review this next week. Hopefully by then @kromanow94 is back from vacation as well.

I really like the JWT specification proposal, and we should also add a test for that use case to our GitHub Actions.

kromanow94 commented 5 months ago

Hey All,

I'm finally back from vacation and can chime in.

@kimwnasptd, to comment on the https://github.com/kubeflow/manifests/issues/2747#issuecomment-2150633393:

(...) the root cause must have been the RequestAuthentication for the K8s issuer.

I wouldn't call this the root cause. By removing the configuration from the RequestAuthentication that puts the sub claim in kubeflow-userid, we're only mitigating the issue in the KFP backend; the other components that require the kubeflow-userid header will stop working, as they will no longer have the header available for auth decisions.

Rather, I'd call the root cause what you've described about the AuthorizationPolicy/ml-pipeline, which blocks traffic that has the kubeflow-userid header set.

We should also take into account whether we want to force an auth check with oauth2-proxy directly for ml-pipeline or not. The difference is that the first option verifies every request to ml-pipeline with oauth2-proxy through a CUSTOM AuthorizationPolicy, while the second relies only on the JWT checks already done elsewhere in the mesh.

I'm a fan of the first option because it is more secure, streamlined, and promotes decoupling.

Here is a diff that you can apply locally on your v1.9-branch branch to make it work with the preferred scenario:

diff --git a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
index a9a45e5e..cbef023f 100644
--- a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
+++ b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
@@ -32,10 +32,24 @@ spec:
         - cluster.local/ns/kubeflow/sa/ml-pipeline-scheduledworkflow
         - cluster.local/ns/kubeflow/sa/ml-pipeline-viewer-crd-service-account
         - cluster.local/ns/kubeflow/sa/kubeflow-pipelines-cache
-  # For user workloads, which cannot user http headers for authentication
-  - when:
-    - key: request.headers[kubeflow-userid]
-      notValues: ['*']
+  - from:
+    - source:
+        requestPrincipals: ["*"]
+---
+apiVersion: security.istio.io/v1
+kind: AuthorizationPolicy
+metadata:
+  name: ml-pipeline-oauth2-proxy
+  namespace: kubeflow
+spec:
+  action: CUSTOM
+  provider:
+    name: oauth2-proxy
+  selector:
+    matchLabels:
+      app: ml-pipeline
+  rules:
+  - {}
 ---
 apiVersion: security.istio.io/v1beta1
 kind: AuthorizationPolicy
diff --git a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
index fd56fa31..8e88dff7 100644
--- a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
+++ b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
@@ -16,4 +16,4 @@ configMapGenerator:
   literals:
   - ALLOW_SELF_SIGNED_ISSUER=true
   - ENABLE_M2M_TOKENS=true
-  - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local
+  - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local,https://kubernetes.default.svc.cluster.local=pipelines.kubeflow.org

Now, assuming you also have the PodDefaults configured for accessing ml-pipeline, running a script like the following from a KF Notebook successfully creates a KF Pipeline Run:

#!/usr/bin/env python3

from kfp import dsl
import kfp

client = kfp.Client()
experiment_name = "my-experiment"
experiment_namespace = "kubeflow-user-example-com"

@dsl.component
def add(a: float, b: float) -> float:
    """Calculates sum of two arguments"""
    return a + b

@dsl.pipeline(
    name="Addition pipeline",
    description="An example pipeline that performs addition calculations.",
)
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)

try:
    print("getting experiment...")
    experiment = client.get_experiment(
        experiment_name=experiment_name, namespace=experiment_namespace
    )
    print("got experiment!")
except Exception:
    print("creating experiment...")
    experiment = client.create_experiment(
        name=experiment_name, namespace=experiment_namespace
    )
    print("created experiment!")

client.create_run_from_pipeline_func(
    add_pipeline,
    arguments={"a": 7.0, "b": 8.0},
    experiment_id=experiment.experiment_id,
    enable_caching=False,
)

@kimwnasptd, I saw your proposal about unifying JWT handling. I like that you've already proposed usage of rules.from.source.requestPrincipals. I think this is the right approach from the Istio perspective, as it forces usage of RequestAuthentication, which will also enable usage of the group and sub claims in all the AuthorizationPolicies.
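
For illustration, with such a RequestAuthentication in place, an AuthorizationPolicy rule could match on claims directly, roughly like this (the group name is a hypothetical example):

- from:
  - source:
      requestPrincipals: ["*"]
  when:
  # request.auth.claims is populated by Istio from the validated JWT
  - key: request.auth.claims[groups]
    values: ["kubeflow-admins"]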

I have a few thoughts and I'll put them here in the coming days, but in general, I'd suggest relying on JWTs from the Authorization header as the single source of truth for authentication. This way we have a uniform way of doing authz, both from the out-of-cluster and the in-cluster perspective, and we don't drop the security behind istio-ingressgateway. RBAC can still be defined with RoleBindings, but the source of truth for auth should always be a JWT and never a header.


So, from my perspective, I think it would be best to define a goal where, in the end, we use only JWTs from Authorization headers and drop the kubeflow-userid and kubeflow-groups headers completely. This will not be possible for 1.9 because of how close we are to the release, but maybe for 1.10. I'd definitely be available to help with this initiative, both for the implementation of the manifests and for the Go and Python code.

Reference on the implementation of direct JWT verification: https://medium.com/@cheickzida/golang-implementing-jwt-token-authentication-bba9bfd84d60

kromanow94 commented 5 months ago

Hey, I continued and updated some thoughts in the proposal: Uniform way for handling JWTs: https://github.com/kubeflow/manifests/pull/2748#issuecomment-2172374596

In summary, I had some thoughts and decided to drop my insistence that the Kubeflow components keep and tighten their own JWT handling. While this is still something I'd love to see, I understand that we have to balance functionality against development effort and offload whatever is possible to external components, so just relying on Istio to verify and authorize requests based on the JWTs is fine.

With the above in mind, I agree that it should be enough to change the AuthorizationPolicy/ml-pipeline rules to only allow user traffic if any trusted principal is available. My proposed changes to add the pipelines.kubeflow.org audience to oauth2-proxy and the AuthorizationPolicy/ml-pipeline-oauth2-proxy are not something we should add as part of the resolution for this issue.

kromanow94 commented 5 months ago

@thesuperzapper

That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.

The ServiceAccount token is a K8s-issued JWT, and dex also provides a JWT after login. Also, oauth2-proxy is configured to accept JWTs from both dex and kubernetes.default.svc.cluster.local.

Please see:

From the surrounding architecture perspective, we're fully capable of using JWTs directly in KF components. But we would have to switch from the TokenReview API to some generic JWT validation code/routine to fully use them.
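
For reference, the TokenReview flow used today boils down to posting an object like this to the Kubernetes API and letting the cluster validate the token (a sketch; the audience value is an example):

apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  # the JWT taken from the incoming Authorization header
  token: <token>
  # audiences the token must have been issued for
  audiences:
  - pipelines.kubeflow.org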

kromanow94 commented 5 months ago

I put my comments in the proposal: Uniform way for handling JWTs. I like where this is going, but we also have to cover the current release.

I've made a PR where I change the AuthorizationPolicy/ml-pipeline to allow access from any trusted principal, and I will add a test with a gh-workflow that creates a KF Notebook and starts a KF Pipeline Run from that Notebook. This is still in progress.

authorize requests to ml-pipeline endpoint if contains any trusted principals (#2753)

juliusvonkohout commented 4 months ago

@kromanow94 we also need a negative test to check that it does not work without the proper token but just the faked userid header. Let's do that in a follow-up PR.
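
A sketch of what such a negative test step could look like in a workflow (all resource names here are hypothetical; the request must come from a Pod inside the mesh):

- name: forged kubeflow-userid header is rejected without a JWT
  run: |
    # curl from an in-mesh Pod; a forged header alone must yield 403
    STATUS=$(kubectl -n kubeflow-user-example-com exec test-notebook-0 -- \
      curl -s -o /dev/null -w '%{http_code}' \
      -H 'kubeflow-userid: user@example.com' \
      http://ml-pipeline.kubeflow.svc.cluster.local:8888/apis/v2beta1/healthz)
    test "$STATUS" = "403"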

Since we have to get RC.2 ready and Kimonas approved it in https://github.com/kubeflow/manifests/issues/2747#issuecomment-2151541954, I merged your PR https://github.com/kubeflow/manifests/pull/2753, but we have to do a full assessment next week with more security tests.

And we have to bring some of those changes to kubeflow/pipelines or it will break again with the next synchronization.

kromanow94 commented 4 months ago

@juliusvonkohout didn't you mean @kromanow94 ? 😅

Yes, negative test also makes sense. I can take care of this but only next week.

If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that but maybe you can describe what should be a part of these changes?

juliusvonkohout commented 4 months ago

@juliusvonkohout didn't you mean @kromanow94 ? 😅

Yes, negative test also makes sense. I can take care of this but only next week.

If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that but maybe you can describe what should be a part of these changes?

Yes, the right Romanov :-D Especially the change of the KFP AuthorizationPolicy to requestPrincipals in their multi-user manifests is important.

juliusvonkohout commented 4 months ago

https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/installs/multi-user/istio-authorization-config.yaml is the important file

juliusvonkohout commented 4 months ago

@nagar-ajay it works in https://github.com/kubeflow/manifests/actions/runs/9741004851/job/26879289585?pr=2781 and is in 1.9rc.2, but we will leave this issue open until it is fixed upstream.

juliusvonkohout commented 3 months ago

It is fixed in manifests and upstreaming is tracked in https://github.com/kubeflow/manifests/issues/2804