Closed kimwnasptd closed 3 months ago
@juliusvonkohout @kromanow94 after playing around a bit and catching up with oauth2-proxy, some quick testing suggests the root cause must have been the RequestAuthentication for the K8s issuer.
https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/common/oidc-client/oauth2-proxy/components/istio-m2m/requestauthentication.yaml#L13-L14
Specifically, this RequestAuthentication will create a kubeflow-userid header from the ServiceAccount token that the KFP SDK adds to the request as an Authorization: Bearer <token> header.
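To make the mechanism concrete, the linked lines amount to roughly the following (a sketch: the resource name and namespace are illustrative, only the issuer and the claim-to-header mapping come from this discussion; verify against the linked file):

```yaml
# Sketch of the K8s-issuer RequestAuthentication under discussion.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: m2m-token-issuer        # illustrative name
  namespace: istio-system       # illustrative namespace
spec:
  jwtRules:
  - issuer: https://kubernetes.default.svc.cluster.local
    # These are the lines in question: they copy the verified token's
    # sub claim into the kubeflow-userid header on the request.
    outputClaimToHeaders:
    - header: kubeflow-userid
      claim: sub
```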
But by adding this header, the request will no longer match the current AuthorizationPolicy, which blocks requests that do NOT come from the IngressGateway but set the kubeflow-userid header:
https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml#L37
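For reference, the blocking rule amounts to roughly this (reconstructed sketch; see the linked line for the authoritative version): it only admits user-workload traffic that carries no kubeflow-userid header at all.

```yaml
# Rule in AuthorizationPolicy/ml-pipeline: allow requests only when
# the kubeflow-userid header is absent.
- when:
  - key: request.headers[kubeflow-userid]
    notValues: ['*']
```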
In this case, removing the above lines from the RequestAuthentication, which set this header for K8s issuer tokens, resolves the issue.
This comes back a bit to the discussion about using headers vs tokens everywhere. But to move us forward, I suggest that for the K8s issuer RequestAuthentication objects we simply don't set these headers for now.
I haven't been following the oauth2-proxy stuff, but this is important because this issue will not just affect Notebook Pods.
Users have always assumed that Pods that directly access the ml-pipeline-ui.kubeflow.svc.cluster.local service can pass their Authorization header (with a Pod ServiceAccount Token) and KFP will give them the associated RBAC of that ServiceAccount.
That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.
See the "Full Kubeflow (from inside cluster)" part of this page for examples which need to work:
I've put quite a bit more thought into this after seeing the above. I'm preparing a small spec to go over the details. We'll indeed need to be careful about both the expectations around JWTs and what changes to make, so we have a uniform plan.
So what I propose for unblocking this is to change the rules of the AuthorizationPolicy in KFP
https://github.com/kubeflow/manifests/blob/96ce068e16b2a707464471bddc0d2a58e403d1fc/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml#L36-L38
from the current when-condition to:

- from:
  - source:
      requestPrincipals: ["*"]

And for the above to work we need to ensure that:
- the RequestAuthentication recognises id_tokens (ServiceAccount tokens) where the issuer is the K8s cluster
- the RequestAuthentication must set the kubeflow-userid header from the JWT
- if the header is not set via the RequestAuthentication, then anyone can forge a request with a K8s ServiceAccount token while also setting this header themselves. Then, because the backend evaluates the header first, they can impersonate any user.
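The forgery scenario just described can be sketched as follows (a hypothetical illustration; the endpoint and user are made up, and `forged_headers` is not a real function in any Kubeflow codebase):

```python
# Hypothetical illustration of the impersonation risk: a valid
# ServiceAccount token passes RequestAuthentication, while the forged
# kubeflow-userid header is what the KFP backend evaluates first.
def forged_headers(sa_token: str, victim: str) -> dict:
    """Headers a malicious in-cluster workload could attach to a request."""
    return {
        "Authorization": f"Bearer {sa_token}",  # real, valid K8s-issued JWT
        "kubeflow-userid": victim,              # forged identity header
    }

# e.g. send these headers to http://ml-pipeline.kubeflow:8888 and the
# backend would authorize the request as the victim user.
```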
@thesuperzapper yes, it is a critical feature that we have to fix for 1.9. @kimwnasptd I can only review this next week. Hopefully by then @kromanow94 is back from vacation as well.
I really like the JWT specification proposal and we should also create an additional test for that use case in our GitHub Actions.
Hey All,
I'm finally back from vacation and can chime in.
@kimwnasptd, to comment on the https://github.com/kubeflow/manifests/issues/2747#issuecomment-2150633393:
(...) the root cause must have been the RequestAuthentication for the K8s issuer.
I wouldn't call this the root cause. By removing the configuration from the RequestAuthentication that puts the sub claim in kubeflow-userid, we're only mitigating the issue in the KFP backend; the other components that require the kubeflow-userid header will stop working, as they will no longer have the header available for auth decisions.
Rather, I'd call the root cause what you've described about the AuthorizationPolicy/ml-pipeline, which is blocking traffic with the kubeflow-userid header configured.
We should also take into account whether we want to force the auth check with oauth2-proxy directly for ml-pipeline or not. The difference is:

- if we enable that through an AuthorizationPolicy with CUSTOM action, oauth2-proxy as the provider and app=ml-pipeline as the selector, oauth2-proxy will stand guard over the KFP API and will only allow traffic that is accepted by the auth middleware. oauth2-proxy then has to be configured with an extra JWT issuer, as an issuer=audience pair like so: https://kubernetes.default.svc.cluster.local=pipelines.kubeflow.org
- if we don't configure an AuthorizationPolicy with CUSTOM action and oauth2-proxy as the provider for ml-pipeline, we omit oauth2-proxy and rely directly on Istio and RequestAuthentication. I guess this is not very bad, but from my perspective oauth2-proxy (or another configured auth middleware) should be the tool that decides whether a request is accepted into the service mesh. This also opens some flexibility for the future, where we can loosen the coupling with Istio, allow other service meshes to work with Kubeflow, and maybe improve the boundaries of each tool/dependency.

I'm a fan of the first option because it is more secure, streamlined and promotes decoupling.
I present you this diff that you can apply locally on your v1.9-branch branch to make it work with the preferred scenario:
diff --git a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
index a9a45e5e..cbef023f 100644
--- a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
+++ b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
@@ -32,10 +32,24 @@ spec:
         - cluster.local/ns/kubeflow/sa/ml-pipeline-scheduledworkflow
         - cluster.local/ns/kubeflow/sa/ml-pipeline-viewer-crd-service-account
         - cluster.local/ns/kubeflow/sa/kubeflow-pipelines-cache
-  # For user workloads, which cannot user http headers for authentication
-  - when:
-    - key: request.headers[kubeflow-userid]
-      notValues: ['*']
+  - from:
+    - source:
+        requestPrincipals: ["*"]
+---
+apiVersion: security.istio.io/v1
+kind: AuthorizationPolicy
+metadata:
+  name: ml-pipeline-oauth2-proxy
+  namespace: kubeflow
+spec:
+  action: CUSTOM
+  provider:
+    name: oauth2-proxy
+  selector:
+    matchLabels:
+      app: ml-pipeline
+  rules:
+  - {}
 ---
 apiVersion: security.istio.io/v1beta1
 kind: AuthorizationPolicy
diff --git a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
index fd56fa31..8e88dff7 100644
--- a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
+++ b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
@@ -16,4 +16,4 @@ configMapGenerator:
     literals:
     - ALLOW_SELF_SIGNED_ISSUER=true
     - ENABLE_M2M_TOKENS=true
-    - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local
+    - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local,https://kubernetes.default.svc.cluster.local=pipelines.kubeflow.org
Now, assuming you also have the PodDefaults configured for accessing ml-pipeline, running the following script from a KF Notebook successfully creates a KF Pipeline Run:
#!/usr/bin/env python3
from kfp import dsl
import kfp

client = kfp.Client()

experiment_name = "my-experiment"
experiment_namespace = "kubeflow-user-example-com"


@dsl.component
def add(a: float, b: float) -> float:
    """Calculates sum of two arguments"""
    return a + b


@dsl.pipeline(
    name="Addition pipeline",
    description="An example pipeline that performs addition calculations.",
)
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)


try:
    print("getting experiment...")
    experiment = client.get_experiment(
        experiment_name=experiment_name, namespace=experiment_namespace
    )
    print("got experiment!")
except Exception:
    print("creating experiment...")
    experiment = client.create_experiment(
        name=experiment_name, namespace=experiment_namespace
    )
    print("created experiment!")

client.create_run_from_pipeline_func(
    add_pipeline,
    arguments={"a": 7.0, "b": 8.0},
    experiment_id=experiment.experiment_id,
    enable_caching=False,
)
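For context on why the plain kfp.Client() in the script above authenticates at all: when a PodDefault projects a ServiceAccount token into the Notebook Pod, the SDK reads it and attaches it as a Bearer header. A rough sketch of that mechanism (hedged: the path and env-var name below are to the best of my knowledge what the SDK uses; `bearer_header` itself is an illustrative helper, not SDK API):

```python
import os

# Path where the PodDefault typically projects the ServiceAccount token;
# the KFP SDK can be pointed elsewhere via KF_PIPELINES_SA_TOKEN_PATH.
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubeflow/pipelines/token"

def bearer_header(token_path=None):
    """Read the projected token and build the Authorization header."""
    path = token_path or os.environ.get(
        "KF_PIPELINES_SA_TOKEN_PATH", DEFAULT_TOKEN_PATH
    )
    with open(path) as f:
        return {"Authorization": f"Bearer {f.read().strip()}"}
```

This is the Authorization: Bearer <token> header that the RequestAuthentication discussed above then verifies against the K8s issuer.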
@kimwnasptd, I saw your proposal about unifying JWT handling. I like that you've already proposed usage of rules.from.source.requestPrincipals. I think this is the right approach from the Istio perspective, as it would force usage of RequestAuthentication, which will also enable usage of the group and sub claims in all the AuthorizationPolicies.
I have a few thoughts and I'll put them here in the coming days, but in general, I'd suggest to:

- drop the istio-ingressgateway-service-account principal and just ensure that any trusted principal is available through rules.from.source.requestPrincipals,
- drop the pipelines.kubeflow.org audience and rely on the standard Kubernetes audience, so we don't have to use PodDefaults to enable secure API access from a KF Notebook,
- add AuthorizationPolicy objects with CUSTOM action and oauth2-proxy as the provider for every component that's exposed to users (api-server, jupyter-web-app, central-dashboard, etc...), not just for the istio-ingressgateway, to bring security closer to the components,
- reconsider how the kubeflow-userid and kubeflow-groups headers can be injected; relying on the JWTs directly would also bring maturity and more possibilities/flexibility in the future,
- move beyond relying only on TokenReview. If we were to rely on a more general approach where the KF components can verify the JWTs by themselves, this would be much more secure. We can use a combination of both (TokenReview if the token was issued by K8s, or github.com/golang-jwt/jwt/v5 otherwise; both should end up with a SubjectAccessReview).

This way we have a uniform way of doing authz, both from the out-of-cluster and in-cluster perspective, and we don't drop the security behind the istio-ingressgateway. RBAC can still be defined with RoleBindings, but the source of truth for auth should always be a JWT and never a header.
So, from my perspective, I think it would be best to define a goal where, in the end, we use only JWTs from Authorization headers and drop the kubeflow-userid and kubeflow-groups headers completely. This will not be possible for 1.9 because of how close we are to the release, but maybe for 1.10. I'd definitely be available to help with this initiative, both for the implementation of manifests and the golang and python code.
Reference on the implementation of direct JWT verification: https://medium.com/@cheickzida/golang-implementing-jwt-token-authentication-bba9bfd84d60
Hey, I continued and updated some thoughts in the proposal: Uniform way for handling JWTs: https://github.com/kubeflow/manifests/pull/2748#issuecomment-2172374596
In summary, I had some thoughts and decided to drop the emphasis on reworking and tightening the JWT implementation in the Kubeflow components themselves. While this is still something I'd love to see, I understand that we have to balance functionality against development effort and offload whatever is possible to external components, so just relying on Istio to verify and authorize requests based on the JWTs is fine.
With the above in mind, I agree that it should be enough to change the AuthorizationPolicy/ml-pipeline rules to only allow user traffic if any trusted principal is available. My proposed changes to add the pipelines.kubeflow.org audience to oauth2-proxy and the AuthorizationPolicy/ml-pipeline-oauth2-proxy are not something we should add as part of the resolution for this issue.
@thesuperzapper
That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.
The ServiceAccount Token is a K8s-issued JWT, and dex also provides a JWT after login. Also, oauth2-proxy is configured to accept JWTs from dex and kubernetes.default.svc.cluster.local.
Please see:
From the surrounding architecture perspective, we're fully capable of using JWTs directly in KF Components. But we would have to switch from the TokenReview API to some generic JWT validation code/routine to fully use it.
I put my comments in the proposal: Uniform way for handling JWTs. I like where this is going, but we also have to cover the current release.
I've made a PR where I change the AuthorizationPolicy/ml-pipeline to allow access from any trusted principal, and I will add a test with a GitHub workflow that creates a KF Notebook and starts a KF Pipeline Run from that Notebook. This is still in progress.
authorize requests to ml-pipeline endpoint if contains any trusted principals (#2753)
@kromanow94 we also need a negative test to check that it does not work without the proper token, but with just a faked userid header. Let's do that in a follow-up PR.
Since we have to get RC.2 ready and Kimonas approved it in https://github.com/kubeflow/manifests/issues/2747#issuecomment-2151541954, I merged your PR https://github.com/kubeflow/manifests/pull/2753, but we have to do a full assessment next week with more security tests.
And we have to bring some of those changes to kubeflow/pipelines or it will break again with the next synchronization.
@juliusvonkohout didn't you mean @kromanow94 ? 😅
Yes, negative test also makes sense. I can take care of this but only next week.
If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that, but maybe you can describe what should be part of these changes?
Yes, the right Romanov :-D Especially the change of the KFP AuthorizationPolicy to requestPrincipals in their multi-user manifests is important.
@nagar-ajay it works in https://github.com/kubeflow/manifests/actions/runs/9741004851/job/26879289585?pr=2781 and is in 1.9rc.2, but we will leave this issue open until it is fixed upstream.
It is fixed in manifests and upstreaming is tracked in https://github.com/kubeflow/manifests/issues/2804
Validation Checklist
Version
1.9
Describe your issue
I tried to create a pipeline from the KFP SDK, from a Notebook, but the Python code that creates the experiment/run fails with the following errors:
From the notebook cell
And from the logs of the KFP API Server istio sidecar
Steps to reproduce the issue
v1.9-branch
ipynb
with the Data Passing pipeline https://github.com/kubeflow/pipelines/blob/master/samples/tutorials/Data%20passing%20in%20python%20components/Data%20passing%20in%20python%20components%20-%20Files.py
Put here any screenshots or videos (optional)
No response