kserve / website

User documentation for KServe.
https://kserve.github.io/website/
Apache License 2.0

Write an AWS Cognito Guide #40

Open seizadi opened 3 years ago

seizadi commented 3 years ago

/kind feature

Describe the solution you'd like

There is no clear write-up on how to configure AWS Cognito for KFServing. The current Kubeflow e2e guide does not cover KFServing configuration.

There is a good write-up for GCloud IAP. That guide could be used as a model for an AWS Cognito guide. This would go a long way toward sparing people the struggle of setting up KFServing on AWS.

Recent issues on this:
https://github.com/kubeflow/kfserving/issues/1154
https://github.com/kubeflow/website/issues/2378

karlschriek commented 3 years ago

@seizadi do you have some initial pointers on how to go about it? If I can get it working for us, I would be happy to contribute to the documentation afterwards.

karlschriek commented 3 years ago

If I understood it correctly, all that is needed is to roll out an InferenceService with istio sidecar injection to allow for authorization (https://github.com/kubeflow/kfserving/blob/master/docs/samples/gcp-iap/sklearn-iap-with-authz.yaml) and then start up an additional VirtualService (https://github.com/kubeflow/kfserving/blob/master/docs/samples/gcp-iap/virtual-service.yaml).

We then post requests to the new VirtualService with these cookies:

cookies = {'AWSELBAuthSessionCookie-0': 'xxxxxx',
           'AWSELBAuthSessionCookie-1': 'xxxxxx'
          }

Seems fairly straightforward. Am I missing anything (aside from the fact that this needs KF 1.1+)?

seizadi commented 3 years ago

You actually need to disable the Istio sidecar, and AuthZ does not work. Depending on your requirements, you might not be able to use this in production.

Here are some detailed notes I wrote for myself while debugging this problem: https://github.com/seizadi/kubeflow-aws-cognito#debug-model

yuzisun commented 3 years ago

/cc @PatrickXYS

PatrickXYS commented 3 years ago

@seizadi Thanks, I'll help on this and will collaborate with @yuzisun to make sure we provide well-written documentation for KFServing on AWS Cognito.

PatrickXYS commented 3 years ago

/assign

karlschriek commented 3 years ago

@seizadi I've gone through your notes and am trying to replicate your solution.

I am rolling out the following:

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: karl-schriek
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: kfserving-iap
  namespace: karl-schriek
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /kfserving/karl-schriek/sklearn-iris
    route:
    - destination:
        host: cluster-local-gateway.istio-system.svc.cluster.local
      headers:
        request:
          set:
            Host: sklearn-iris-predictor-default.karl-schriek.svc.cluster.local
      weight: 100
    rewrite:
        uri: /v1/models/sklearn-iris
    timeout: 300s
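
As a rough illustration of what the `match.uri.prefix` and `rewrite.uri` pair in the VirtualService above is expected to do, the sketch below mimics the prefix rewrite in plain Python. The helper `rewrite_path` is hypothetical; it only models the behaviour that the matched prefix is replaced by the rewrite value, and it is not Istio code.

```python
from typing import Optional

# Values copied from the VirtualService above.
PREFIX = "/kfserving/karl-schriek/sklearn-iris"
REWRITE = "/v1/models/sklearn-iris"


def rewrite_path(path: str, prefix: str = PREFIX, rewrite: str = REWRITE) -> Optional[str]:
    """Return the path the backend would see, or None if the route does not match."""
    if not path.startswith(prefix):
        return None
    # Only the matched prefix is swapped; the remainder (e.g. ":predict") is kept.
    return rewrite + path[len(prefix):]


internal_path = rewrite_path("/kfserving/karl-schriek/sklearn-iris:predict")
print(internal_path)
```

Under that assumption, a request to `.../kfserving/karl-schriek/sklearn-iris:predict` should reach the predictor as `/v1/models/sklearn-iris:predict`.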

And am then sending a REST request with:

import json
import requests
import auth

namespace = "karl-schriek"
domain = "dev-kubeflow-120.learn-mlops.com"

# programmatically fetches cookies
cookies = auth.get_auth_cookie(
    host=f"https://www.{domain}",
    from_cache=False,
    return_as_dict=True,
    username="*******",
    password="*******"
)

def infer(inference_url, images_array, cookies):
    data = json.dumps({"instances": images_array})
    headers = {
        "Content-Type": "application/json",
    }
    response = requests.post(
        inference_url,
        data=data,
        headers=headers,
        cookies=cookies,
    )
    response.raise_for_status()
    predictions = response.json()["predictions"]

    results_array = predictions

    return results_array

url = f'https://www.{domain}/kfserving/{namespace}/sklearn-iris:predict'

payload = {
    "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
    ]
}

# infer() wraps its argument in {"instances": ...}, so pass the bare list
infer(url, payload["instances"], cookies)

When I do this I get a 403 error.

Exception has occurred: HTTPError
403 Client Error: Forbidden for url: https://www.dev-kubeflow-120.learn-mlops.com/kfserving/karl-schriek/sklearn-iris:predict

The cookies are correct. Is there something I am missing or that I've configured incorrectly?

seizadi commented 3 years ago

I assume you tested the URL I posted in the README to validate that your URL and cookies are set correctly?

url = 'https://kubeflow.platform.example.com/pipeline/apis/v1beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter='

cookies = {'AWSELBAuthSessionCookie-0': 'xxxxxx',
           'AWSELBAuthSessionCookie-1': 'xxxxxx'
          }

r = requests.get(url=url, cookies=cookies)
result = r.json()

Assuming your requests are making their way through the ALB, you can trace where along the packet path they are being rejected. You can reference this guide for the packet path: https://github.com/kubeflow/kfserving/blob/master/docs/KFSERVING_DEBUG_GUIDE.md#debug-kfserving-request-flow. Assuming your rewrite rule is set up properly, the requests should be arriving at the KFServing Pod in the karl-schriek namespace.

Generally you can use the logs. I did use tcpdump when I was debugging, but I don't think you will need it.
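
One lightweight way to narrow down which hop rejects a request, before reaching for tcpdump, is to look at the `Server` response header. The sketch below is only a heuristic: it assumes that the ALB answers with a `Server` value starting with `awselb` and that Istio's Envoy proxies answer with a value containing `envoy`. Both values are assumptions to verify against your own responses, and `guess_rejecting_hop` is a hypothetical helper, not part of any library.

```python
from typing import Mapping


def guess_rejecting_hop(status_code: int, headers: Mapping[str, str]) -> str:
    """Guess which hop produced an error response from its Server header."""
    if status_code < 400:
        return "not rejected"
    # Header names are case-insensitive, so normalise before looking up.
    server = {k.lower(): v for k, v in headers.items()}.get("server", "").lower()
    if server.startswith("awselb"):
        return "AWS ALB (request never reached Istio)"
    if "envoy" in server:
        return "Istio/Envoy (gateway or sidecar)"
    return "unknown hop (check the pod logs)"


# Example with hand-written headers; with requests you could pass
# response.status_code and response.headers instead.
print(guess_rejecting_hop(403, {"Server": "istio-envoy"}))
```

If the guess points at Istio, the gateway and sidecar logs are the next place to look; if it points at the ALB, the cookies or the Cognito session are more likely suspects.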

karlschriek commented 3 years ago

Yes I tested against that URL, the cookies are working. I'll follow your suggestion on tracing the packet path tomorrow. Thanks for the suggestion.

karlschriek commented 3 years ago

Did you roll out KFServing with KF or did you make a standalone rollout?

karlschriek commented 3 years ago

I wonder if the debug guide https://github.com/kubeflow/kfserving/blob/master/docs/KFSERVING_DEBUG_GUIDE.md#debug-kfserving-request-flow refers to an older version of KF/KFServing

I am on KF 1.2, KFServing 0.4.1

It suggests that running kubectl get gateway knative-ingress-gateway -n knative-serving -oyaml should yield a gateway that looks like this:

kind: Gateway
metadata:
  labels:
    networking.knative.dev/ingress-provider: istio
    serving.knative.dev/release: v0.12.1
  name: knative-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

For me this gateway doesn't exist at all (neither when installing KFServing bundled with KF, nor with a standalone KFServing 0.4.1).

The only gateway I have is kubeflow/kubeflow-gateway, which I notice does not allow HTTPS traffic. It only has:

  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP

UPDATE: @seizadi, comparing what you have:

NAME                                   GATEWAYS                                                                          HOSTS                                                                                                                                                                                                      AGE
notebook-seizadi-kubecon-tutorial      [kubeflow/kubeflow-gateway]                                                       [*]                                                                                                                                                                                                        16d
notebook-seizadi-kubeflow-end-to-end   [kubeflow/kubeflow-gateway]                                                       [*]                                                                                                                                                                                                        6d16h
sklearn-iris                           [knative-ingress-gateway.knative-serving]                                         [sklearn-iris.seizadi.platform.sexample.com]                                                                                                                                                                2d19h
sklearn-iris-predictor-default         [knative-serving/cluster-local-gateway knative-serving/knative-ingress-gateway]   [sklearn-iris-predictor-default.seizadi sklearn-iris-predictor-default.seizadi.platform.sexample.com sklearn-iris-predictor-default.seizadi.svc sklearn-iris-predictor-default.seizadi.svc.cluster.local]   2d19h
sklearn-iris-predictor-default-mesh    [mesh]                                                                            [sklearn-iris-predictor-default.seizadi sklearn-iris-predictor-default.seizadi.svc sklearn-iris-predictor-default.seizadi.svc.cluster.local] 

With what I have:


NAME                                     GATEWAYS                                                            HOSTS                                                                                                                                                                                                                                      AGE
kfserving-iap                            [kubeflow/kubeflow-gateway]                                         [*]                                                                                                                                                                                                                                        117m
sklearn-iris                             [kubeflow-gateway.kubeflow knative-serving/cluster-local-gateway]   [sklearn-iris.karl-schriek.dev-kubeflow-120.learn-mlops.com sklearn-iris.karl-schriek.svc.cluster.local]                                                                                                                                   116m
sklearn-iris-predictor-default-ingress   [knative-serving/cluster-local-gateway kubeflow/kubeflow-gateway]   [sklearn-iris-predictor-default.karl-schriek sklearn-iris-predictor-default.karl-schriek.dev-kubeflow-120.learn-mlops.com sklearn-iris-predictor-default.karl-schriek.svc sklearn-iris-predictor-default.karl-schriek.svc.cluster.local]   116m
sklearn-iris-predictor-default-mesh      [mesh]                                                              [sklearn-iris-predictor-default.karl-schriek sklearn-iris-predictor-default.karl-schriek.svc sklearn-iris-predictor-default.karl-schriek.svc.cluster.local]                                                                                116m

There are differences in the gateways being used.
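
To make the difference concrete, the sketch below compares the gateway references from the two listings above. The lists are transcribed by hand from the GATEWAYS column, and `normalise` is a hypothetical helper that assumes a reference written as `name.namespace` points at the same gateway as `namespace/name`.

```python
def normalise(gateway: str) -> str:
    """Convert 'name.namespace' into 'namespace/name'; leave other forms untouched."""
    if "/" in gateway or "." not in gateway:
        return gateway
    name, namespace = gateway.split(".", 1)
    return f"{namespace}/{name}"


# Gateways column copied from the two listings above.
seizadi = [
    "kubeflow/kubeflow-gateway",
    "knative-ingress-gateway.knative-serving",
    "knative-serving/cluster-local-gateway",
    "knative-serving/knative-ingress-gateway",
    "mesh",
]
karl = [
    "kubeflow/kubeflow-gateway",
    "kubeflow-gateway.kubeflow",
    "knative-serving/cluster-local-gateway",
    "mesh",
]

only_seizadi = sorted({normalise(g) for g in seizadi} - {normalise(g) for g in karl})
only_karl = sorted({normalise(g) for g in karl} - {normalise(g) for g in seizadi})
print("only in seizadi's cluster:", only_seizadi)
print("only in karl's cluster:", only_karl)
```

Under that assumption, the only reference that appears in one listing but not the other is `knative-serving/knative-ingress-gateway`.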

@yuzisun, @PatrickXYS do you have any ideas? Did something change in KF 1.2 or KFServing 0.4.1 that leads to this difference? I am not entirely certain how to proceed with debugging this.

UPDATE UPDATE:

Looking at the manifests in KF 1.2, when rolling out KFServing with KF 1.2, the following configmap patch is applied:

apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kubeflow
data:
  ingress: |-
    {
        "ingressGateway" : "kubeflow-gateway.kubeflow",
        "ingressService" : "istio-ingressgateway.istio-system.svc.cluster.local"
    }

For the standalone rollout this is:

  ingress: |-
    {
        "ingressGateway" : "knative-ingress-gateway.knative-serving",
        "ingressService" : "istio-ingressgateway.istio-system.svc.cluster.local"
    }

but that gateway does not exist.
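
A quick way to check this kind of mismatch is to parse the `ingress` value and look the gateway up in the list of gateways that actually exist. The sketch below is illustrative only: `existing_gateways` is a hand-written set standing in for the output of `kubectl get gateway --all-namespaces`, `configured_gateway_exists` is a hypothetical helper, and the `<name>.<namespace>` reading of `ingressGateway` is an assumption based on the values quoted above.

```python
import json

# The "ingress" value from the standalone inferenceservice-config quoted above.
ingress_json = """
{
    "ingressGateway" : "knative-ingress-gateway.knative-serving",
    "ingressService" : "istio-ingressgateway.istio-system.svc.cluster.local"
}
"""

# (namespace, name) pairs; illustrative stand-in for the cluster's real gateways.
existing_gateways = {
    ("knative-serving", "cluster-local-gateway"),
    ("kubeflow", "kubeflow-gateway"),
}


def configured_gateway_exists(ingress_config: str, gateways: set) -> bool:
    """Check whether the gateway named in the ingress config is present in the cluster."""
    ref = json.loads(ingress_config)["ingressGateway"]
    # Assumption: the reference is written as "<name>.<namespace>".
    name, namespace = ref.split(".", 1)
    return (namespace, name) in gateways


print(configured_gateway_exists(ingress_json, existing_gateways))
```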

seizadi commented 3 years ago

I removed my deploy directory from 1.0.2 and created a new one with 1.1.x. Everything there, including Istio and KFServing, was created by the Kubeflow deployment. In the past I have found it difficult to make Kubeflow work with other components that are pre-installed on the cluster, so I have been letting Kubeflow manage the whole deployment.

Regarding your question about the flow diagram: the Istio Ingress Gateway is part of the kubeflow namespace and is attached to the AWS ALB. You should also have the Istio local gateway running in knative-serving:

❯ k get Gateway --all-namespaces
NAMESPACE         NAME                    AGE
knative-serving   cluster-local-gateway   30d
kubeflow          kubeflow-gateway        30d

karlschriek commented 3 years ago

Yep, those are the gateways I have running. I don't actually think that the Gateway is the problem, but I need to understand where the rollouts differ. It could be a more fundamental change in KF 1.2 or it could be a configuration problem on my end. I will keep you posted. It would be great if we could create an end-to-end guide here that is guaranteed to work.

karlschriek commented 3 years ago

The issue is with Istio 1.3.1, as it is currently not possible to perform Cookie-based auth. This also means that for AWS deployments, KFServing is currently restricted to version 0.3.0.

Once this issue has been resolved I will see about contributing to a how-to guide again! https://github.com/kubeflow/manifests/issues/1695

kserve-oss-bot commented 3 years ago

@seizadi: The label(s) kind/feature cannot be applied, because the repository doesn't have them.
