Opened by seizadi, 3 years ago
@seizadi do you have some initial pointers here on how to go about it? If I can manage to get it working for us, I would be happy to contribute to the documentation afterwards.
If I understood it correctly, all that is needed is to roll out an InferenceService with istio sidecar injection to allow for authorization (https://github.com/kubeflow/kfserving/blob/master/docs/samples/gcp-iap/sklearn-iap-with-authz.yaml) and then start up an additional VirtualService (https://github.com/kubeflow/kfserving/blob/master/docs/samples/gcp-iap/virtual-service.yaml).
We then post requests to the new VirtualService with these cookies:
```python
cookies = {
    'AWSELBAuthSessionCookie-0': 'xxxxxx',
    'AWSELBAuthSessionCookie-1': 'xxxxxx',
}
```
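Something roughly like this, I imagine (a rough sketch only: the host name and the `/kfserving/<namespace>/<model>` prefix below are placeholders for whatever the VirtualService ends up exposing; the cookies dict is the one above):

```python
import requests

# Placeholder host and route prefix; substitute your own domain and namespace/model.
url = "https://kubeflow.platform.example.com/kfserving/my-namespace/sklearn-iris:predict"
payload = {"instances": [[6.8, 2.8, 4.8, 1.4]]}

# The ALB session cookies carry the Cognito authentication.
response = requests.post(url, json=payload, cookies=cookies)
response.raise_for_status()
print(response.json()["predictions"])
```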
Seems fairly straightforward. Am I missing anything (aside from the fact that this needs KF 1.1+)?
You actually need to disable the Istio sidecar, so AuthZ does not work; depending on your requirements, you might not be able to use this in production.
Here are some detailed notes I wrote to myself when I was debugging this problem: https://github.com/seizadi/kubeflow-aws-cognito#debug-model
/cc @PatrickXYS
@seizadi Thanks, I'll help on this and will collaborate with @yuzisun to make sure we provide well-written documentation for KFServing on AWS Cognito.
/assign
@seizadi I've gone through your notes and am trying to replicate your solution.
I am rolling out the following:
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
name: "sklearn-iris"
namespace: karl-schriek
annotations:
sidecar.istio.io/inject: "false"
spec:
default:
predictor:
sklearn:
storageUri: "gs://kfserving-samples/models/sklearn/iris"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: kfserving-iap
namespace: karl-schriek
spec:
gateways:
- kubeflow/kubeflow-gateway
hosts:
- '*'
http:
- match:
- uri:
prefix: /kfserving/karl-schriek/sklearn-iris
route:
- destination:
host: cluster-local-gateway.istio-system.svc.cluster.local
headers:
request:
set:
Host: sklearn-iris-predictor-default.karl-schriek.svc.cluster.local
weight: 100
rewrite:
uri: /v1/models/sklearn-iris
timeout: 300s
I am then sending a REST request with:
```python
import json

import requests

import auth  # local helper module that programmatically fetches the ALB auth cookies

namespace = "karl-schriek"
domain = "dev-kubeflow-120.learn-mlops.com"

# programmatically fetches cookies
cookies = auth.get_auth_cookie(
    host=f"https://www.{domain}",
    from_cache=False,
    return_as_dict=True,
    username="*******",
    password="*******",
)


def infer(inference_url, images_array, cookies):
    data = json.dumps({"instances": images_array})
    headers = {
        "Content-Type": "application/json",
    }
    response = requests.post(
        inference_url,
        data=data,
        headers=headers,
        cookies=cookies,
    )
    response.raise_for_status()
    predictions = response.json()["predictions"]
    results_array = predictions
    return results_array


url = f"https://www.{domain}/kfserving/{namespace}/sklearn-iris:predict"

payload = {
    "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6],
    ]
}

# infer() wraps the instances in {"instances": ...} itself, so pass only the list.
infer(url, payload["instances"], cookies)
```
When I do this I get a 403 error.
```
Exception has occurred: HTTPError
403 Client Error: Forbidden for url: https://www.dev-kubeflow-120.learn-mlops.com/kfserving/karl-schriek/sklearn-iris:predict
```
The cookies are correct. Is there something I am missing or that I've configured incorrectly?
I assume you tested the URL I had posted in the README to validate that your URL & cookies are set correctly?

```python
import requests

url = 'https://kubeflow.platform.example.com/pipeline/apis/v1beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter='
cookies = {
    'AWSELBAuthSessionCookie-0': 'xxxxxx',
    'AWSELBAuthSessionCookie-1': 'xxxxxx',
}
r = requests.get(url=url, cookies=cookies)
result = r.json()
```
Assuming your requests are making their way through the ALB, you can trace where along the packet path they are getting rejected. You can reference this guide for the packet path: https://github.com/kubeflow/kfserving/blob/master/docs/KFSERVING_DEBUG_GUIDE.md#debug-kfserving-request-flow. Assuming your rewrite rule is set up properly, they should be arriving at the KFServing pod in the karl-schriek namespace.
Generally you can use the logs; I did use tcpdump when I was debugging, but I don't think you will need it.
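For example, to test the gateway-to-predictor hop in isolation you could replay what your kfserving-iap VirtualService forwards, bypassing the ALB entirely (a rough sketch: it assumes you first run `kubectl port-forward -n istio-system svc/cluster-local-gateway 8080:80` and that the internal Host value matches your predictor's VirtualService):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/models/sklearn-iris:predict",
    json={"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]},
    # Internal host name taken from the predictor's VirtualService hosts (an assumption).
    headers={"Host": "sklearn-iris-predictor-default.karl-schriek.svc.cluster.local"},
)
print(resp.status_code, resp.text)
```

If this succeeds while the external request still returns 403, the rejection is happening before the cluster-local gateway.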
Yes I tested against that URL, the cookies are working. I'll follow your suggestion on tracing the packet path tomorrow. Thanks for the suggestion.
Did you roll out KFServing with KF or did you make a standalone rollout?
I wonder if the debug guide https://github.com/kubeflow/kfserving/blob/master/docs/KFSERVING_DEBUG_GUIDE.md#debug-kfserving-request-flow refers to an older version of KF/KFServing
I am on KF 1.2, KFServing 0.4.1
It suggests that running `kubectl get gateway knative-ingress-gateway -n knative-serving -oyaml` should yield a Gateway that looks like this:
```yaml
kind: Gateway
metadata:
  labels:
    networking.knative.dev/ingress-provider: istio
    serving.knative.dev/release: v0.12.1
  name: knative-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
    - hosts:
        - '*'
      port:
        name: http
        number: 80
        protocol: HTTP
    - hosts:
        - '*'
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        mode: SIMPLE
        privateKey: /etc/istio/ingressgateway-certs/tls.key
        serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
```
For me this gateway doesn't exist at all (neither when installing KFServing bundled with KF, nor with a standalone KFServing 0.4.1). The only gateway I have is kubeflow/kubeflow-gateway, which I notice does not allow HTTPS traffic. It only has:
```yaml
servers:
  - hosts:
      - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
```
UPDATE: @seizadi, comparing what you have:
```
NAME                                   GATEWAYS                                                                           HOSTS                                                                                                                                                                                                           AGE
notebook-seizadi-kubecon-tutorial      [kubeflow/kubeflow-gateway]                                                        [*]                                                                                                                                                                                                             16d
notebook-seizadi-kubeflow-end-to-end   [kubeflow/kubeflow-gateway]                                                        [*]                                                                                                                                                                                                             6d16h
sklearn-iris                           [knative-ingress-gateway.knative-serving]                                          [sklearn-iris.seizadi.platform.sexample.com]                                                                                                                                                                    2d19h
sklearn-iris-predictor-default         [knative-serving/cluster-local-gateway knative-serving/knative-ingress-gateway]    [sklearn-iris-predictor-default.seizadi sklearn-iris-predictor-default.seizadi.platform.sexample.com sklearn-iris-predictor-default.seizadi.svc sklearn-iris-predictor-default.seizadi.svc.cluster.local]      2d19h
sklearn-iris-predictor-default-mesh    [mesh]                                                                             [sklearn-iris-predictor-default.seizadi sklearn-iris-predictor-default.seizadi.svc sklearn-iris-predictor-default.seizadi.svc.cluster.local]
```
With what I have:
```
NAME                                     GATEWAYS                                                                  HOSTS                                                                                                                                                                                                                                         AGE
kfserving-iap                            [kubeflow/kubeflow-gateway]                                               [*]                                                                                                                                                                                                                                           117m
sklearn-iris                             [kubeflow-gateway.kubeflow knative-serving/cluster-local-gateway]         [sklearn-iris.karl-schriek.dev-kubeflow-120.learn-mlops.com sklearn-iris.karl-schriek.svc.cluster.local]                                                                                                                                      116m
sklearn-iris-predictor-default-ingress   [knative-serving/cluster-local-gateway kubeflow/kubeflow-gateway]         [sklearn-iris-predictor-default.karl-schriek sklearn-iris-predictor-default.karl-schriek.dev-kubeflow-120.learn-mlops.com sklearn-iris-predictor-default.karl-schriek.svc sklearn-iris-predictor-default.karl-schriek.svc.cluster.local]      116m
sklearn-iris-predictor-default-mesh      [mesh]                                                                    [sklearn-iris-predictor-default.karl-schriek sklearn-iris-predictor-default.karl-schriek.svc sklearn-iris-predictor-default.karl-schriek.svc.cluster.local]                                                                                   116m
```
There are differences in the gateways being used.
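For reference, here is a small sketch (assuming the `kubernetes` Python client and a working kubeconfig) that dumps the gateways and hosts of each VirtualService in a namespace, i.e. the same information as the tables above:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# VirtualServices are Istio custom resources, so they are read via the CustomObjectsApi.
vss = api.list_namespaced_custom_object(
    "networking.istio.io", "v1alpha3", "karl-schriek", "virtualservices"
)
for vs in vss["items"]:
    spec = vs["spec"]
    print(vs["metadata"]["name"], spec.get("gateways", []), spec.get("hosts", []))
```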
@yuzisun, @PatrickXYS do you have any ideas? Did something change in KF 1.2 or KFServing 0.4.1 that leads to this difference? I am not entirely certain how to proceed with debugging this.
SECOND UPDATE:
Looking at the manifests, when rolling out KFServing with KF 1.2 the following ConfigMap patch is applied:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kubeflow
data:
  ingress: |-
    {
      "ingressGateway" : "kubeflow-gateway.kubeflow",
      "ingressService" : "istio-ingressgateway.istio-system.svc.cluster.local"
    }
```
For the standalone rollout this is:
```yaml
ingress: |-
  {
    "ingressGateway" : "knative-ingress-gateway.knative-serving",
    "ingressService" : "istio-ingressgateway.istio-system.svc.cluster.local"
  }
```
but that gateway does not exist.
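As a quick sanity check (a sketch assuming the `kubernetes` Python client), the ingress gateway the KFServing controller is actually configured with can be printed from the live ConfigMap:

```python
import json

from kubernetes import client, config

config.load_kube_config()
cm = client.CoreV1Api().read_namespaced_config_map("inferenceservice-config", "kubeflow")
# The "ingress" key holds the JSON shown in the patches above.
print(json.loads(cm.data["ingress"])["ingressGateway"])
```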
I removed my deploy directory from 1.0.2 and created a new one with 1.1.x. Everything on there, Istio and KFServing, was created by the Kubeflow deploy. I have found it difficult in the past to make Kubeflow work with other pieces that are pre-installed on the cluster, so I have been having Kubeflow manage the whole deployment.
About your question on the flow diagram: the Istio Ingress Gateway is part of the kubeflow namespace and attached to the AWS ALB. You should also have the Istio local gateway running in knative-serving:
```
❯ k get Gateway --all-namespaces
NAMESPACE         NAME                    AGE
knative-serving   cluster-local-gateway   30d
kubeflow          kubeflow-gateway        30d
```
Yep, those are the gateways I have running. I actually don't really think that the Gateway is the problem, but I need to understand where the rollouts differ. It could be a more fundamental change in KF 1.2 or it could be a configuration problem on my end. Will keep you posted, it would be great if we could create an end-to-end guide here that will be guaranteed to work.
The issue is with Istio 1.3.1, as it is currently not possible to perform Cookie-based auth. This also means that for AWS deployments, KFServing is currently restricted to version 0.3.0.
Once this issue has been resolved I will see about contributing to a how-to guide again! https://github.com/kubeflow/manifests/issues/1695
@seizadi: The label(s) `kind/feature` cannot be applied, because the repository doesn't have them.
/kind feature
**Describe the solution you'd like**
There is no clear write-up on how to configure AWS Cognito for KFServing. The current Kubeflow e2e guide does not cover KFServing configuration.
There is a good write-up for GCloud IAP. That guide could be used as a model for AWS Cognito; it would go a long way toward keeping people from struggling with the KFServing setup on AWS.
Recent issues on this: https://github.com/kubeflow/kfserving/issues/1154 https://github.com/kubeflow/website/issues/2378