kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0

Enable and document for Kubeflow 1.10 Kserve secure inferencing from inside and outside the cluster with tokens #2811

Open juliusvonkohout opened 1 month ago

juliusvonkohout commented 1 month ago

Version

master

Describe your issue

From @kromanow94

I did some investigation and found that this happens because the VirtualServices created by KServe are configured by default to use cluster-local-gateway. istio-ingressgateway is configured with the AuthorizationPolicy istio-ingressgateway-oauth2-proxy, which forces traffic to go through oauth2-proxy. There is no such AuthorizationPolicy for cluster-local-gateway.

So, I see two options:

  1. Configure Istio auth for the current setup with cluster-local-gateway
    1. Create a cluster-local-gateway-oauth2-proxy AuthorizationPolicy to enforce authentication with oauth2-proxy:
      apiVersion: security.istio.io/v1
      kind: AuthorizationPolicy
      metadata:
        name: cluster-local-gateway-oauth2-proxy
        namespace: istio-system
      spec:
        action: CUSTOM
        provider:
          name: oauth2-proxy
        rules:
        - {}
        selector:
          matchLabels:
            app: cluster-local-gateway
    2. Depending on your setup, if the model is deployed in a Kubeflow-managed namespace (a KF Profile namespace, for example kubeflow-user-example-com), you also have to allow access to the sklearn-iris predictor:
      apiVersion: security.istio.io/v1beta1
      kind: AuthorizationPolicy
      metadata:
        name: sklearn-iris-predictor-allow
        namespace: kubeflow-user-example-com
      spec:
        selector:
          matchLabels:
            serving.knative.dev/service: sklearn-iris-predictor
        action: ALLOW
        rules:
        - {}
    3. Testing with curl:
      $ curl -XPOST -v "http://sklearn-iris.kubeflow-user-example-com.svc.cluster.local/v1/models/sklearn-iris:predict" -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" -d '{"instances": [[6.8,  2.8,  4.8,  1.4], [6.0,  3.4,  4.5,  1.6]]}' -H "Content-Type: application/json"
      Note: Unnecessary use of -X or --request, POST is already inferred.
      * Host sklearn-iris.kubeflow-user-example-com.svc.cluster.local:80 was resolved.
      * IPv6: (none)
      * IPv4: 172.20.1.23
      *   Trying 172.20.1.23:80...
      * Connected to sklearn-iris.kubeflow-user-example-com.svc.cluster.local (172.20.1.23) port 80
      > POST /v1/models/sklearn-iris:predict HTTP/1.1
      > Host: sklearn-iris.kubeflow-user-example-com.svc.cluster.local
      > User-Agent: curl/8.7.1
      > Accept: */*
      > Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ikh3ZUQ2enNYYnRZNUFZQk8xX1ZKc3ZCZGwwRmR3dTdwRURiQXpDN3c5MncifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiLCJodHRwczovL2t1YmVybmV0ZXMuZGVmYXVsdC5zdmMiLCJodHRwczovL2t1YmVybmV0ZXMuZGVmYXVsdCJdLCJleHAiOjIwMjg1NDA4NzAsImlhdCI6MTcxMzE4MDg3MCwiaXNzIjoiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiLCJrdWJlcm5ldGVzLmlvIjp7Im5hbWVzcGFjZSI6Imt1YmVmbG93LXVzZXItZXhhbXBsZS1jb20iLCJwb2QiOnsibmFtZSI6ImN1cmwiLCJ1aWQiOiI3ZGI2ZjliNC0zZTliLTQ3ZDUtOWI4ZC0yZjhiMWVkNTVhZjkifSwic2VydmljZWFjY291bnQiOnsibmFtZSI6ImRlZmF1bHQtZWRpdG9yIiwidWlkIjoiODZhZmM3OGYtMTIzYS00MDMwLWI5YjQtZDllYWQ5YmE2NTc4In19LCJuYmYiOjE3MTMxODA4NzAsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlZmxvdy11c2VyLWV4YW1wbGUtY29tOmRlZmF1bHQtZWRpdG9yIn0.iY9WY7vqFQvxv3mzFYlnKQ3arG631movAfkIM1eWH_UdsQuWUupIz7wak81pOM23gBPpYxMT5HR1ZgVHYWG07Neh4e1ySUzhmPNNfydSIs-jUP1P8BjEPq3BdSQ9j_1pGggMDXFM4msFnEdAjlmpl23yDKOoJCj0RDV3fZIiA-mf7wLyiv_E38ah1ygZXYjrTCdzstCH02aZ7VCLc1dPETttE7nlF3YoaurwHJzZF6WHXmQlVdU2yMg0RT8uRDBUDI6WTq_guxjuEBEJrj166pXbp1MBvslMBUYXPV3StQ-AXnvQUyCBDoa5NOlJKOht3UOhGeS_-1A50ctjsl8xKw
      > Content-Type: application/json
      > Content-Length: 65
      > 
      * upload completely sent off: 65 bytes
      < HTTP/1.1 200 OK
      < content-length: 21
      < content-type: application/json
      < date: Mon, 15 Apr 2024 12:47:28 GMT
      < server: envoy
      < x-envoy-upstream-service-time: 9
      < 
      * Connection #0 to host sklearn-iris.kubeflow-user-example-com.svc.cluster.local left intact
      {"predictions":[1,1]}
  2. Change the KServe config to use istio-ingressgateway instead of cluster-local-gateway. This touches KServe, which I don't have much experience with. I tried changing the inferenceservice-config ConfigMap to set "localGatewayService": "istio-ingressgateway.istio-system.svc.cluster.local" and "localGateway": "kubeflow/kubeflow-gateway", but that didn't work, probably because something else is missing...
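The curl test from step 3 of option 1 can also be reproduced from Python with only the standard library. This is a minimal sketch, assuming the caller runs in a pod whose service account token is mounted at the default path used in the curl example, and that the model is `sklearn-iris` in `kubeflow-user-example-com`; `build_predict_request` and `predict` are hypothetical helper names, not part of KServe or Kubeflow.

```python
import json
import urllib.request
from pathlib import Path

# Default mount path of the pod's service account token (same path as in the curl example).
TOKEN_PATH = Path("/run/secrets/kubernetes.io/serviceaccount/token")


def build_predict_request(
    instances: list,
    token: str,
    model: str = "sklearn-iris",
    namespace: str = "kubeflow-user-example-com",
) -> urllib.request.Request:
    """Build a KServe v1 predict request authenticated with a bearer token (hypothetical helper)."""
    url = (
        f"http://{model}.{namespace}.svc.cluster.local"
        f"/v1/models/{model}:predict"
    )
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The token is checked by oauth2-proxy via the CUSTOM AuthorizationPolicy.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def predict(instances: list) -> dict:
    """Send the request from inside the cluster and return the decoded JSON response."""
    token = TOKEN_PATH.read_text().strip()
    req = build_predict_request(instances, token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

With both AuthorizationPolicies from option 1 in place, `predict([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]])` sends the same request as the curl command above, which returned `{"predictions":[1,1]}` in that log. Splitting request construction from sending keeps the URL and header logic checkable without a cluster.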

@juliusvonkohout do you think we should add this AuthorizationPolicy for cluster-local-gateway to the manifests?

Steps to reproduce the issue

See above

juliusvonkohout commented 1 month ago

@kromanow94 yes, we should tackle this for 1.10.

kromanow94 commented 1 month ago

@juliusvonkohout, makes sense. From a conceptual point of view, what do we expect and what are the assumptions?

I guess it's an easy decision for out-of-cluster access: the Authorization header should contain a valid JWT.

What about in-cluster access? Do we expect that access to a given KServe endpoint should be allowed without an Authorization header when called from the same namespace? From another namespace, we probably want to require a valid JWT either way, so it's similar to out-of-cluster access, except that the Istio ingress gateway would be cluster-local.
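Whether a caller comes from the same namespace can be read from its token: a Kubernetes service account JWT carries a `sub` claim of the form `system:serviceaccount:<namespace>:<name>` (the token in the curl log above decodes to `system:serviceaccount:kubeflow-user-example-com:default-editor`). A minimal sketch for inspection only: it decodes the payload without verifying the signature, on the assumption that verification is left to oauth2-proxy and Istio. `token_claims` and `caller_namespace` are hypothetical helper names.

```python
import base64
import json
from typing import Optional


def token_claims(token: str) -> dict:
    """Decode the payload (second segment) of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def caller_namespace(token: str) -> Optional[str]:
    """Return the namespace of a service account token, or None for other subjects."""
    sub = token_claims(token).get("sub", "")
    parts = sub.split(":")
    # Expected shape: system:serviceaccount:<namespace>:<name>
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        return parts[2]
    return None
```

Because nothing here checks the signature, this must not be used to make access decisions; it only helps reason about which namespace a given token would represent under a same-namespace rule.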

juliusvonkohout commented 1 month ago

"Do we expect that access to given kserve endpoint should be enabled without Authorization header when called from the same namespace?" I am not sure. Actually, I like to enforce security everywhere. @kimwnasptd what do you think?

kromanow94 commented 1 month ago

"Actually, I like to enforce security everywhere."

That would also simplify and unify the approach from an implementation perspective.

padrian2s commented 1 month ago

I bumped into approval_prompt with v1.9, but yes, the docs are missing.

curl -XPOST -v "http://sklearn-iris.kubeflow-user-example-com.svc.cluster.local/v1/models/sklearn-iris:predict" -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" -d '{"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}' -H "Content-Type: application/json"
Note: Unnecessary use of -X or --request, POST is already inferred.

juliusvonkohout commented 1 month ago

"@juliusvonkohout do you think we should add this AuthorizationPolicy for cluster-local-gateway to the manifests?" Yes, we have to move on this. This is too complex for many users to understand, and we need to provide it out of the box.

juliusvonkohout commented 1 month ago

So @kromanow94 please create a PR if you have time.

juliusvonkohout commented 3 weeks ago

There seems to be a lot of wrong or outdated content in https://github.com/kserve/kserve/pull/3260, and we should probably fix it there as well.

jaffe-fly commented 3 weeks ago

In my test on a Kubeflow 1.9 cluster, https://github.com/kserve/kserve/tree/master/docs/samples/istio-dex cannot run: _auth_session["authservice_session"] is None. My _auth_session is:

{'endpoint_url': 'http://xxxxxxx:18080/', 'redirect_url': 'http://xxxxxxx:18080/dex/auth/local/login?back=&state=sl6amu5zckyntv35z65solp4t', 'dex_login_url': 'http://xxxxxxx:18080/dex/auth/local/login?back=&state=sl6amu5zckyntv35z65solp4t', 'is_secured': True, 'session_cookie': 'oauth2_proxy_kubeflow_csrf=-GOrJQeg-v5_Dc_3HD12Td80HAuE9ld2eT3Y7dT2lyw1b2wUKEZaYJVjhqaBF7UW_elSQ8i08WsjXTGO2kys6VTPJWcHQ2uI_qMFqFp7Wn2_SIpjihvsfa-JX8VC-8Qag25f0jkbi-LMSu0IlRjVYoQZ2VDkriRa3bCq8j3zVOboKjcxVGujyrggNU1X4r-PyYVOujmdv7oxhw1wcS2HbYiQ84GZxPbb0S9FeFI8u2gkSvT5U_HQrpfl2MkoBg==|1723645690|pZzKszJaoId5Vb5vQN0-a99zY29WdQ2TfHtHdsIga-U=', 'authservice_session': None}

Will this issue solve the problem?

juliusvonkohout commented 3 weeks ago

Hello, as said before, the KServe documentation is wrong there. Check the first post here for how to get it done with oauth2-proxy.

padrian2s commented 3 weeks ago

KServe is right, but only for internal K8s JWT tokens, not for oauth2-proxy tokens that are generated outside the cluster.

juliusvonkohout commented 3 weeks ago

No, oauth2-proxy is explicitly there to accept service account tokens. Whether the request comes from inside or outside the cluster does not matter.
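To illustrate the point that the same service account tokens work from outside the cluster, a token can be minted with `kubectl create token` (available since Kubernetes 1.24) and sent in the `Authorization: Bearer` header, as in the in-cluster curl example. A minimal sketch, assuming `kubectl` is on the PATH with access to the cluster and using the `default-editor` service account seen in the curl log; `token_command`, `fetch_token` and `auth_header` are hypothetical helpers. Whether a custom `--audience` is needed depends on how oauth2-proxy is configured (an assumption, so it is left optional), and how the request reaches istio-ingressgateway (hostname, port-forward, load balancer) depends on your setup and is not shown.

```python
import subprocess
from typing import List, Optional


def token_command(
    service_account: str = "default-editor",
    namespace: str = "kubeflow-user-example-com",
    audience: Optional[str] = None,
    duration: str = "1h",
) -> List[str]:
    """Build the kubectl argv that mints a short-lived service account token."""
    cmd = ["kubectl", "create", "token", service_account,
           "--namespace", namespace, "--duration", duration]
    if audience is not None:
        # Only needed if the token must carry a specific, non-default audience.
        cmd += ["--audience", audience]
    return cmd


def fetch_token(**kwargs) -> str:
    """Run kubectl and return the token it prints on stdout (raises on failure)."""
    result = subprocess.run(
        token_command(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def auth_header(token: str) -> dict:
    """Header to send through istio-ingressgateway, where oauth2-proxy checks it."""
    return {"Authorization": f"Bearer {token}"}
```

A short `--duration` keeps tokens handed to external clients short-lived, instead of reusing a long-lived pod token like the one in the log (its `exp` claim is years after its `iat`).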