kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

KFP sdk client authentication error #4569

Closed sudivate closed 2 years ago

sudivate commented 4 years ago

/kind bug

What steps did you take and what happened: I enabled authentication with Azure AD on AKS and installed Kubeflow with kfctl_istio_dex.v1.1.0.yaml, skipping Dex from the manifest because Azure AD is itself an OIDC provider. The load balancer is exposed over HTTPS (TLS 1.3) with a self-signed certificate.

OIDC Auth Service Configuration:

Issue: Uploading a pipeline with the KFP client (client.pipeline_uploads.upload_pipeline()) using the client config below throws an error.

client = kfp.Client(host='https://<LoadBalancer IP Address>/pipeline', existing_token=<token>)

Error HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /pipeline/apis/v1beta1/pipelines/upload?name=local_exp-6714175b-6d59-40d0-9019-5b4ee58dc483 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076)')))

Is there a way to override cert verification?

or

Uploading the pipeline with the KFP client (client.pipeline_uploads.upload_pipeline()) using the client config below redirects to a Google auth error.

client = kfp.Client(host='https://<LoadBalancer IP Address>/pipeline', client_id=<client_id>, other_client_id=<client_id>, other_client_secret=<application_secret>, namespace='kfauth')


Environment:

CC: @Bobgy

Bobgy commented 4 years ago

Can you give an example of how you'd expect to send an API request to your endpoint using, e.g., the requests library? I think we can work from there and see how we can integrate that option with the KFP client.

Right now, there is quite a lot of technical debt in this area around supporting different auth methods. If you are interested, we'd also welcome discussion on how to structure this in a better way.

sudivate commented 4 years ago

When I try to consume the REST API directly using a Bearer access token generated with the client credentials grant type, it still redirects to the authorization endpoint, forcing interactive login.

import requests

url = r'https://host/pipeline/apis/v1beta1/pipelines'
header = {'Authorization': 'Bearer ' + token}
response = requests.get(url, headers=header, verify=False)

The above code redirects to the authorization endpoint

https://login.microsoftonline.com/<tenant_id>/v2.0/authorize?client_id=<client_id>&redirect_uri=https%3A%2F%2F<host>%2Flogin%2Foidc&response_type=code&scope=profile+email+openid&state=<xxxxx>

@yanniszark what type of token can I use for REST API to skip interactive login? If that works we can have optional SSL verification on the client.
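
To confirm that a request was bounced to the interactive login rather than served, it can help to inspect the redirect's Location header instead of following it. The following is a minimal sketch using only the Python standard library; the function name and the sample tenant, client, and host values are illustrative and not part of KFP or the auth service.

```python
from urllib.parse import urlsplit, parse_qs


def describe_oidc_redirect(location):
    """Summarise an OIDC authorization redirect URL.

    Returns None when the URL does not look like an authorization-code
    redirect (no response_type=code in the query string).
    """
    parts = urlsplit(location)
    query = parse_qs(parts.query)
    if query.get("response_type") != ["code"]:
        return None
    return {
        "issuer_host": parts.netloc,
        "client_id": query.get("client_id", [None])[0],
        "redirect_uri": query.get("redirect_uri", [None])[0],
        "scopes": query.get("scope", [""])[0].split(),
    }


# Sample shaped like the redirect above; all identifiers are placeholders.
sample = (
    "https://login.microsoftonline.com/my-tenant/v2.0/authorize"
    "?client_id=my-client"
    "&redirect_uri=https%3A%2F%2Fkubeflow.example.com%2Flogin%2Foidc"
    "&response_type=code&scope=profile+email+openid&state=abc"
)
info = describe_oidc_redirect(sample)
print(info)
```

With the requests library, passing allow_redirects=False to requests.get exposes the 3xx response and its Location header, which could then be fed to a helper like this one.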

sudivate commented 4 years ago

@Bobgy The OIDC auth service supports only the Authorization Code flow, which is mostly used for browser-based interactive login. In order to use the client SDK, we will have to enable the client credentials flow on the auth service. This will allow non-interactive login and enable programmatic access to all APIs.
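
For reference, the client credentials flow mentioned here is a single form-encoded POST to the identity provider's token endpoint, with no browser involved. Below is a minimal sketch of how such a request is shaped for the Azure AD v2.0 token endpoint. The request is built but not sent; the tenant, client, secret, and scope values are placeholders, and whether the resulting token is accepted still depends on the auth service in front of Kubeflow.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_client_credentials_request(tenant_id, client_id, client_secret, scope):
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }
    # The body is form-encoded, as the token endpoint expects.
    return Request(url, data=urlencode(form).encode("ascii"), method="POST")


# Placeholder values only; nothing is sent over the network here.
req = build_client_credentials_request(
    "my-tenant", "my-client", "my-secret", "api://my-client/.default"
)
print(req.get_method(), req.full_url)
```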

Bobgy commented 4 years ago

@sudivate I think you can decide how you want to configure Kubeflow endpoint auth for Azure. This isn't a decision KFP needs to make. Once you have configured the auth as you like, we welcome contributions to make the KFP SDK work with it.

berndverst commented 3 years ago

@sudivate seems like you found a workaround by manually hijacking the browser cookie and passing that into the KFP client.

https://www.kubeflow.org/docs/azure/authentication-oidc/#authenticate-kubeflow-pipelines-using-kubeflow-pipelines-sdkhttpswwwkubefloworgdocspipelinessdksdk-overview
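
The workaround amounts to handing the browser's session cookie to the SDK. A minimal sketch follows, assuming the auth service stores its session in a cookie named authservice_session (check the browser's developer tools for the actual name on a given deployment) and assuming a kfp.Client that accepts a cookies string. The helper name is hypothetical.

```python
def session_cookie_header(session_value, cookie_name="authservice_session"):
    """Format a browser session value as a Cookie header string."""
    value = session_value.strip()
    if not value:
        raise ValueError("session cookie value is empty")
    return f"{cookie_name}={value}"


# The value here is a made-up placeholder, not a real session.
cookies = session_cookie_header("  MTYx-example-value  ")
print(cookies)

# With the KFP SDK installed, the string would be passed along these lines:
#   import kfp
#   client = kfp.Client(host="https://<host>/pipeline", cookies=cookies)
```

The cookie expires with the browser session, so this suits ad hoc use more than unattended automation.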

If you are not looking into any other changes related to this, do you want to close this issue for now?

Junaid-Ahmed94 commented 3 years ago

I am also facing a similar issue on AKS. The details follow.

Error MaxRetryError: HTTPSConnectionPool(host='x.x.x.x', port=443): Max retries exceeded with url: /apis/v1beta1/healthz (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))

KF Details

  1. KF-1.2 with OIDC kfctl_azure_aad.v1.2.0.yaml
  2. Updated pipeline to be multiuser

I can run the test pipeline, but when I try to create a pipeline from the kfp client it gives this error.

berndverst commented 3 years ago

@Junaid-Ahmed94 only Authorization Code Flow is supported in this scenario. https://openid.net/specs/openid-connect-basic-1_0.html#CodeFlow

https://www.kubeflow.org/docs/azure/authentication-oidc/#authenticate-kubeflow-pipelines-using-kubeflow-pipelines-sdkhttpswwwkubefloworgdocspipelinessdksdk-overview

Junaid-Ahmed94 commented 3 years ago

@berndverst I am following the documentation, but I believe the issue is caused by the self-signed certificate. A curl command showed the error much more clearly:

curl -H "X-Auth-Token: <Session_Cookie>" "https://xx.xx.xx.xx/pipeline/"

Setting the -k flag returns the results as desired. I will test with a proper certificate and then update here with the outcome.


philwinder commented 3 years ago

I agree with the others here. This isn't an AKS or cloud issue. The Kubeflow docs instruct you to use cert-manager to create a self-signed certificate. But obviously browsers and curl can't verify its identity, so you just ignore/suppress that.

The issue is that the kfp.Client class doesn't allow you to pass verify=False through to the underlying requests library, so you can't ignore the non-verifiable certificate. Therefore you can't use kfp.Client on clusters that have been set up following the standard KF docs.
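
The two behaviours being contrasted here, verifying against a known CA versus skipping verification, can be sketched with the standard library's ssl module. This only illustrates the underlying TLS settings and is not how kfp.Client is implemented; make_ssl_context is a hypothetical helper.

```python
import ssl


def make_ssl_context(ca_file=None, insecure=False):
    """Return an SSLContext for HTTPS clients.

    ca_file:  optional path to a PEM bundle to trust (e.g. a private root CA).
    insecure: if True, skip certificate and hostname checks entirely,
              which is roughly what curl -k or verify=False do.
    """
    if insecure:
        ctx = ssl.create_default_context()
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # Verifies the server certificate; ca_file=None uses the system trust store.
    return ssl.create_default_context(cafile=ca_file)


strict = make_ssl_context()
relaxed = make_ssl_context(insecure=True)
print(strict.verify_mode, relaxed.verify_mode)
```

Skipping verification removes protection against man-in-the-middle attacks, so trusting the issuing CA explicitly is usually the safer of the two options outside short-lived testing.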

Junaid-Ahmed94 commented 3 years ago

I was able to solve this by using a certificate signed by an authority rather than a self-signed one (you can use Let's Encrypt):

  1. Obtain an authority-signed certificate; Let's Encrypt also works.
  2. Delete the istio-ingressgateway-certs secret and create a new secret with the keys from the new certificate.
  3. Skip the step of creating a certificate for Kubeflow (step 3 in https://www.kubeflow.org/docs/azure/authentication-oidc/#expose-kubeflow-securely-over-https).
  4. If you are still not able to use KFP, then simply provide the root certificate (root.cert) in the KFP call; it is a parameter of kfp.Client.

I used the above approach and it worked for me.
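
Step 4 can be sketched as follows, assuming an SDK version whose kfp.Client constructor accepts an ssl_ca_cert path. The helper name, host, and file contents are placeholders, and the client construction itself is left as a comment so the snippet runs without the SDK installed.

```python
import os
import tempfile


def kfp_client_kwargs(host, ca_bundle_path, existing_token=None, cookies=None):
    """Assemble keyword arguments for kfp.Client with a custom CA bundle."""
    if not os.path.isfile(ca_bundle_path):
        raise FileNotFoundError(f"CA bundle not found: {ca_bundle_path}")
    kwargs = {"host": host, "ssl_ca_cert": ca_bundle_path}
    if existing_token:
        kwargs["existing_token"] = existing_token
    if cookies:
        kwargs["cookies"] = cookies
    return kwargs


# A throwaway file stands in for the real root certificate.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    handle.write(b"placeholder for the PEM-encoded root certificate\n")
    ca_path = handle.name

kwargs = kfp_client_kwargs(
    "https://kubeflow.example.com/pipeline",
    ca_path,
    cookies="authservice_session=<cookie>",
)
print(kwargs)
os.remove(ca_path)

# With the KFP SDK installed: client = kfp.Client(**kwargs)
```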

pablofiumara commented 3 years ago

Hi,

I am trying to make the following code work. My goal is to get a KFP client working with Azure. When I execute it, I get this error:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='104.42.221.31', port=443): Max retries exceeded with url: /pipeline/apis/v1beta1/healthz (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)'),))

I tried following the steps at https://www.kubeflow.org/docs/distributions/azure/authentication-oidc/ . When I execute the code again, the same error happens. When I try to log in to the Kubeflow dashboard, the error "Missing url parameter: code" appears. What am I missing?

import argparse

import kfp
import adal

def get_access_token(tenant, clientId, client_secret):
    authorityHostUrl = "https://login.microsoftonline.com"
    GRAPH_RESOURCE = "00000002-0000-0000-c000-000000000000"

    authority_url = authorityHostUrl + "/" + tenant

    context = adal.AuthenticationContext(authority_url)
    token = context.acquire_token_with_client_credentials(
        GRAPH_RESOURCE, clientId, client_secret
    )  # noqa: E501
    return token["accessToken"]

def main():

    parser = argparse.ArgumentParser("run pipeline")

    parser.add_argument(
        "--kfp_host",
        type=str,
        required=True,
        help="KFP endpoint",
    )

    parser.add_argument("--tenant", type=str, required=True, help="Tenant")

    parser.add_argument(
        "--service_principal", type=str, required=True, help="Service Principal"
    )

    parser.add_argument(
        "--sp_secret", type=str, required=True, help="Service Principal Secret"
    )

    args = parser.parse_args()
    token = get_access_token(
        args.tenant, args.service_principal, args.sp_secret
    )
    client = kfp.Client(host=args.kfp_host, existing_token=token)
    pipelines = client.list_pipelines()
    print(pipelines)

if __name__ == "__main__":
    main()
berndverst commented 3 years ago

@pablofiumara that sounds like the issue others are talking about -- the Kubeflow Client can't verify the self signed certificate on the server. You need to replace these certs like @Junaid-Ahmed94 said.

pablofiumara-azumo commented 3 years ago

@berndverst Thanks. It seems Kubeflow 1.3 on Azure + AAD is in progress (here's a screenshot https://ibb.co/5FmP7bF). Is that correct?

@Junaid-Ahmed94 Can you give us some details about points 1, 2 and 3, please?

Regarding point 1, I am trying this: https://github.com/mspnp/letsencrypt-pip-cert-generation/blob/main/README.md

Did you do something like that? Thanks in advance

sudivate commented 3 years ago

I am not sure whether this is still valid for the current version of the KF manifests. I tried enabling access-token-based authentication for the KFP client. Please feel free to build on this work if it applies to your scenario.

https://github.com/arrikto/oidc-authservice/issues/46

pablofiumara commented 3 years ago

@sudivate Thank you. I will take a look

pablofiumara commented 3 years ago

I would like to add more information while I take a look at that. The only approach that let me use Kubeflow client with AAD was the following: https://github.com/kaizentm/kubemlops/blob/master/docs/Kubeflow-install.md#option-1-install-standalone-kubeflow-pipelines

The problem is that I need a full Kubeflow installation working on Azure (not just Kubeflow Pipelines).

Junaid-Ahmed94 commented 3 years ago

@pablofiumara I did try Let's Encrypt, but to save time and get to production I asked our IT team to provide me with a valid certificate. As I mentioned in my earlier comment, Let's Encrypt should achieve the same result. In the end, we just want an authority-signed certificate to provide TLS for our website/URL.

I took a look at the site you mentioned, and it seems to do what we need. You can also use the Kubeflow-provided kustomize manifests for Let's Encrypt: https://github.com/kubeflow/manifests/tree/v1.2-branch/cert-manager/cert-manager/overlays/letsencrypt

You just have to be sure that you update things in the proper places to actually make use of this. One such place is https://github.com/kubeflow/manifests/blob/v1.2-branch/stacks/azure/application/cert-manager/kustomization.yaml#L11 where you can see it points to the self-signed certificate; you have to update this to point to the letsencrypt manifests. There are a few more places, but hardly 2 or 3, where changes are needed to make use of Let's Encrypt.

Junaid-Ahmed94 commented 3 years ago

But I would suggest moving to Kubeflow 1.3 if possible. You will get 2 major benefits straight away:

  1. Kubeflow 1.2 still uses Istio 1.3, which has issues with multi-tenancy; Bernd Verst had to come up with 2 different manifests to run in multi-tenancy and single-user mode. I recently deployed Kubeflow 1.3 and this is no longer the case there. Correct me if some things are still missing, @berndverst.
  2. Kubeflow 1.3 has some major upgrades that make day-to-day data science tasks easier, such as tracking model development with TensorBoard, and many more (I still have to go through the documentation).
pablofiumara commented 3 years ago

@Junaid-Ahmed94 Thank you. I will try that and keep you all updated

pablofiumara commented 3 years ago

@Junaid-Ahmed94 Your suggestion worked, thank you. A secret named letsencrypt-prod-secret and a clusterissuer named letsencrypt-prod were created. However, SSL does not work yet. I think it's because something different needs to be done compared to point 1 from here: https://www.kubeflow.org/docs/distributions/azure/authentication-oidc/#expose-kubeflow-securely-over-https

Is that correct? If so, how should I reference letsencrypt-prod-secret from the gateway?

Thanks in advance

pablofiumara commented 3 years ago

I changed https://github.com/berndverst/manifests/blob/v1.3-branch/distributions/stacks/azure/kustomization.yaml#L10 and https://github.com/berndverst/manifests/blob/v1.3-branch/common/cert-manager/cert-manager/overlays/letsencrypt/params.env#L1

Junaid-Ahmed94 commented 3 years ago

@pablofiumara you should also change the cluster issuer name in your certificate.yaml file. Have you updated this as well to point to the new cluster issuer?

issuerRef:
  kind: ClusterIssuer
  name: kubeflow-self-signing-issuer
pablofiumara commented 3 years ago

@Junaid-Ahmed94 Thanks for your fast response. Yes, I did. Here's my certificate:

cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: istio-ingressgateway-certs
  namespace: istio-system
spec:
  commonName: istio-ingressgateway.istio-system.svc
  ipAddresses:
  - myIpAddress
  isCA: true
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: letsencrypt-prod-secret
EOF
pablofiumara commented 3 years ago

istio-ingressgateway pod can't start. Here's the log:

Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected

I googled that error but I still can't understand how to solve all this. Deleting the pod didn't help

pablofiumara commented 3 years ago

What should I write below tls? I don't think those two paths at the end are correct if we use Let's Encrypt.


spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
    # Upgrade HTTP to HTTPS
    tls:
      httpsRedirect: true
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
pablofiumara commented 3 years ago

@Junaid-Ahmed94 @sudivate Suppose I have a valid SSL certificate. Then I create a Kubernetes secret for it. How can I attach that to a Kubeflow dashboard on Azure?

Junaid-Ahmed94 commented 3 years ago

@pablofiumara the certificate https://github.com/kubeflow/pipelines/issues/4569#issuecomment-850628491 looks fine. Before applying the certificate https://github.com/kubeflow/pipelines/issues/4569#issuecomment-850633203 did you remove the already existing certificate? And are you still getting the same error https://github.com/kubeflow/pipelines/issues/4569#issuecomment-841340473 ?

pablofiumara commented 3 years ago

@Junaid-Ahmed94 Thanks for your answer. I tried removing the already existing certificate (Kubernetes secret) but it was generated automatically again. I don't know if I am still getting the same error because first I would like to have SSL working.

Right now I am getting "This site can’t be reached" when I try to log in to the Kubeflow dashboard. It seems to be a problem related to AAD. It can't get to https://oneAzureIp/login/oidc?code=oneCode&session_state=oneState

This is my gateway right now. What should I do?

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"Gateway","metadata":{"annotations":{},"name":"kubeflow-gateway","namespace":"kubeflow"},"spec":{"selector":{"istio":"ingressgateway"},"servers":[{"hosts":["*"],"port":{"name":"http","number":80,"protocol":"HTTP"}}]}}
  creationTimestamp: "2021-06-01T15:59:12Z"
  generation: 15
  name: kubeflow-gateway
  namespace: kubeflow
  resourceVersion: "140031"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/kubeflow/gateways/kubeflow-gateway
  uid: 3fead9d1-d722-4f68-b1d0-9dcbd0b670c3
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - kubeflowDashboardIp
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: ingress-cert
      mode: SIMPLE
Junaid-Ahmed94 commented 3 years ago

Where did you get this credentialName? I am assuming you created it yourself following something like this documentation: https://istio.io/latest/docs/tasks/traffic-management/ingress/secure-ingress/#configure-a-tls-ingress-gateway-for-multiple-hosts

Can you try the following and be sure that the steps are executed in order?

  1. Remove all cluster issuers (self-signed and Let's Encrypt).
  2. Remove both secrets, letsencrypt-prod-secret and istio-ingressgateway-certs.
  3. Reapply the manifests; this should create only the Let's Encrypt cluster issuer.
  4. Do not apply the certificate.yaml file. The secret should already have been created in the previous step.
  5. Update the gateway file; https://github.com/kubeflow/pipelines/issues/4569#issuecomment-850633203 should be fine.

    Steps 3 and 4 can be skipped if Let's Encrypt is not used and you have created your certificate yourself. For such a certificate you have to create the secret yourself.

Once you have successfully executed the above, hopefully things should work fine. The error you might then face with KFP is client/server certificate verification, which needs the root certificate. I already mentioned this in my previous comment: https://github.com/kubeflow/pipelines/issues/4569#issuecomment-805787819

pablofiumara commented 3 years ago

@Junaid-Ahmed94 Thanks again.

I got credentialName from here (point 3): https://istio.io/latest/docs/tasks/traffic-management/ingress/secure-ingress/#configure-a-tls-ingress-gateway-for-a-single-host

After doing every step, this error appears after executing kubectl logs istio-ingressgateway-XXXXXXXXX -n istio-system

2021-06-02T13:18:33.129866Z warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 0.0.0.0_8443: Invalid path: /etc/istio/ingressgateway-certs/tls.crt
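
The rejected listener names a file path, which may indicate that the gateway in effect still points at certificate files expected inside the ingress gateway pod rather than at a secret referenced through credentialName. Below is a small sketch for telling the two styles apart in a Gateway manifest that has already been loaded into a Python dict (for example from kubectl get gateway -o json). The function name and the style labels are made up for illustration.

```python
def gateway_tls_styles(gateway):
    """Return (port number, style) for each server in an Istio Gateway dict.

    "secret" -> tls.credentialName (certificate read from a Kubernetes secret)
    "file"   -> tls.serverCertificate / tls.privateKey (files mounted in the pod)
    "none"   -> no certificate configured (for example an HTTP redirect server)
    """
    styles = []
    for server in gateway.get("spec", {}).get("servers", []):
        tls = server.get("tls") or {}
        if "credentialName" in tls:
            style = "secret"
        elif "serverCertificate" in tls or "privateKey" in tls:
            style = "file"
        else:
            style = "none"
        styles.append((server.get("port", {}).get("number"), style))
    return styles


# Gateway spec shaped like the file-path variant posted earlier in the thread.
file_based = {
    "spec": {
        "servers": [
            {"port": {"number": 80}, "tls": {"httpsRedirect": True}},
            {
                "port": {"number": 443},
                "tls": {
                    "mode": "SIMPLE",
                    "privateKey": "/etc/istio/ingressgateway-certs/tls.key",
                    "serverCertificate": "/etc/istio/ingressgateway-certs/tls.crt",
                },
            },
        ]
    }
}
print(gateway_tls_styles(file_based))
```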

pablofiumara commented 3 years ago

@DavidSpek Do you know about this?

davidspek commented 3 years ago

@pablofiumara Just to be clear, is this about the Istio setup and how to handle certificates for Kubeflow?

pablofiumara commented 3 years ago

@DavidSpek Thanks for your answer. Yes, that's right. The goal is to have SSL (https, using Let's Encrypt certificate) working for Kubeflow dashboard on Azure. Kubeflow dashboard Ip address ends with dns.westus.cloudapp.azure.com

davidspek commented 3 years ago

For ArgoFlow we are using the Istio Operator to install Istio, and I've re-implemented authentication to improve security. Part of that is done by replacing the OIDC Authservice with Oauth2-Proxy which is actively maintained by a large community and thus should be easier to setup with any number of providers (see here), which kind of makes Dex irrelevant (except for LDAP, which can also be done with Keycloak, I've integrated both).

Specifically for Azure we've only just started looking at it today, so I don't have the integrations for loadbalancers and Azure DNS implemented yet, but you can take a look at ArgoFlow-Azure if you're interested or want to help out.

The Istio spec we use for this auth setup is the following:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istio
spec:
  profile: default
  tag: 1.10.0 # istio/operator
  hub: docker.io/istio
  meshConfig:
    accessLogFile: /dev/stdout
    enablePrometheusMerge: true
    extensionProviders: 
    - name: "oauth2-proxy"
      envoyExtAuthzHttp:
        service: "oauth2-proxy.auth.svc.cluster.local"
        port: "4180" # The default port used by oauth2-proxy.
        #includeHeadersInCheck: ["authorization", "cookie"]  # headers sent to the oauth2-proxy in the check request.
        includeHeadersInCheck: # headers sent to the oauth2-proxy in the check request.
            # https://github.com/oauth2-proxy/oauth2-proxy/issues/350#issuecomment-576949334
            - "cookie"
            - "x-forwarded-access-token"
            - "x-forwarded-user"
            - "x-forwarded-email"
            - "authorization"
            - "x-forwarded-proto"
            - "proxy-authorization"
            - "user-agent"
            - "x-forwarded-host"
            - "from"
            - "x-forwarded-for"
            - "accept"
        headersToUpstreamOnAllow: ["authorization", "path", "x-auth-request-user", "x-auth-request-email", "x-auth-request-access-token", "x-auth-request-user-groups"] # headers sent to backend application when request is allowed.
        headersToDownstreamOnDeny: ["content-type", "set-cookie"] # headers sent back to the client when request is denied.

The AuthorizationPolicy that makes use of this is the following:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  action: CUSTOM
  selector:
    # Same as the istio-ingressgateway Service selector
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  provider:
    name: "oauth2-proxy"
  rules:
  - to:
    - operation:
        hosts:
        - <<__subdomain_dashboard__>>.<<__domain__>>
        - <<__subdomain_serving__>>.<<__domain__>>

The kubeflow gateway that is used is:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: kubeflow-gateway
  namespace: kubeflow
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - <<__subdomain_dashboard__>>.<<__domain__>>
    port:
      name: http
      number: 80
      protocol: HTTP
    # Upgrade HTTP to HTTPS
    tls:
      httpsRedirect: true
  - hosts:
    - <<__subdomain_dashboard__>>.<<__domain__>>
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: kubeflow-ingressgateway-certs

And finally the certificate for cert-manager that is used is:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-ingressgateway-certs
  namespace: istio-system
spec:
  secretName: kubeflow-ingressgateway-certs
  issuerRef:
    name: gateways-issuer
    kind: ClusterIssuer
  commonName: <<__subdomain_dashboard__>>.<<__domain__>>
  dnsNames:
    - <<__subdomain_dashboard__>>.<<__domain__>>
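
The same shape of Certificate can be generated programmatically when the issuer or host name varies between environments. The following is a minimal sketch that builds the manifest as a Python dict and prints it as JSON, which kubectl apply -f - also accepts. The function name and example domain are illustrative, and letsencrypt-prod is simply the issuer name mentioned earlier in this thread.

```python
import json


def certificate_manifest(name, dns_name, issuer, namespace="istio-system"):
    """Build a cert-manager Certificate whose secret shares the resource name."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Gateway's tls.credentialName must match this secret name.
            "secretName": name,
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
            "commonName": dns_name,
            "dnsNames": [dns_name],
        },
    }


manifest = certificate_manifest(
    "kubeflow-ingressgateway-certs", "kubeflow.example.com", "letsencrypt-prod"
)
print(json.dumps(manifest, indent=2))
```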

There are a few more manifests needed to get a fully working setup, but I'm not sure they are very relevant for this conversation. One important thing to note is that the setup mentioned in https://github.com/kubeflow/pipelines/issues/4569#issuecomment-850633203 should not be used, as that doesn't allow you to define a separate certificate for each Istio gateway. Another thing to be aware of is that you cannot use the same certificate secret for 2 gateways in Istio as this will result in a 404 error.
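
The constraint about not sharing one certificate secret between two gateways can be checked mechanically. This is a sketch under the assumption that the Gateway manifests are available as Python dicts (for example parsed from kubectl get gateway -A -o json); the function name and the second gateway in the sample are hypothetical.

```python
from collections import defaultdict


def shared_credential_names(gateways):
    """Map each credentialName used by more than one Gateway to those gateways."""
    users = defaultdict(set)
    for gw in gateways:
        meta = gw.get("metadata", {})
        ident = f'{meta.get("namespace", "default")}/{meta.get("name")}'
        for server in gw.get("spec", {}).get("servers", []):
            cred = (server.get("tls") or {}).get("credentialName")
            if cred:
                users[cred].add(ident)
    return {cred: sorted(names) for cred, names in users.items() if len(names) > 1}


def _gateway(namespace, name, credential):
    # Tiny helper to keep the sample data readable.
    return {
        "metadata": {"namespace": namespace, "name": name},
        "spec": {"servers": [{"tls": {"mode": "SIMPLE", "credentialName": credential}}]},
    }


sample_gateways = [
    _gateway("kubeflow", "kubeflow-gateway", "kubeflow-ingressgateway-certs"),
    _gateway("istio-system", "serving-gateway", "kubeflow-ingressgateway-certs"),
    _gateway("istio-system", "other-gateway", "other-certs"),
]
conflicts = shared_credential_names(sample_gateways)
print(conflicts)
```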

The placeholder values you see are used by the setup script in the repo, which does a find and replace using a setup.conf file. The idea is that you fork the repo, add your values to the setup.conf file, run the script, commit and push the changes to your fork, and then install everything with Argo CD, which then points to your repo. This makes updates for all components easy (and automated with Renovate), allows version tracking for changes in the manifests, and avoids configuration drift.
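
The find-and-replace idea can be illustrated in a few lines. This is not the repository's actual setup script, only a sketch of the technique; the function name is hypothetical and the values dict stands in for whatever setup.conf provides.

```python
import re

# Matches placeholders of the form <<__name__>> and captures the name.
PLACEHOLDER = re.compile(r"<<__(\w+?)__>>")


def fill_placeholders(text, values):
    """Replace every <<__key__>> in text with values[key]."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value configured for placeholder {key!r}")
        return values[key]

    return PLACEHOLDER.sub(substitute, text)


line = "    - <<__subdomain_dashboard__>>.<<__domain__>>"
filled = fill_placeholders(
    line, {"subdomain_dashboard": "kubeflow", "domain": "example.com"}
)
print(filled)
```

Failing on an unknown placeholder, rather than leaving it in place, makes a missing configuration value visible before the manifests are committed.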

That is probably a lot more information than you needed, but hopefully it helps. If you're still having problems or if you have other questions, you can always ping me or reach out to me on Slack. If anybody is interested in helping out with ArgoFlow-Azure that is always much appreciated. The AWS version should see a first stable release soon, so the work from there most likely just needs some porting to Azure.

pablofiumara commented 3 years ago

Thanks

pablofiumara commented 3 years ago

@eedorenko @sudivate @jotaylo Is this possible on Azure? If so, can you send me some documentation, please? https://github.com/kubeflow/pipelines/issues/4569#issuecomment-853309063

davidspek commented 3 years ago

@pablofiumara The Kubeflow gateway and certificate setup is definitely possible on Azure; that will work anywhere. The Istio installation through the operator will also work, but it will probably need some extra configuration to play nicely with the Azure load balancer. This was also the case for AWS and shouldn't be difficult to implement (a couple of annotations on the gateway and another 2 YAML files). I don't have access to Azure, but if you like I can help you debug this setup over Slack, as it is something I need to get working regardless. The ArgoFlow-Azure setup will potentially start being used by a fairly large entity that is running Kubeflow on Azure, so this would also need to be implemented for them.

pablofiumara commented 3 years ago

@DavidSpek Thank you very much. Can we take a look together about how to set up SSL (using Let's Encrypt) on Azure for Kubeflow dashboard, please? If so, how can I contact you on Slack?

davidspek commented 3 years ago

@pablofiumara Yeah for sure. If you are part of the Kubeflow slack you can find me as either DavidSpek or David van der Spek.

pablofiumara commented 3 years ago

@DavidSpek Thanks. I have just sent you a message on Slack

pwzhong commented 2 years ago

@pablofiumara Have you successfully set up SSL (using Let's Encrypt) on Azure? I am facing the same problem.

pablofiumara commented 2 years ago

@pwzhong Yes. I bought a domain on Azure and then installed this https://github.com/argoflow/argoflow-azure (cc @DavidSpek is the owner of that repository) Instructions https://github.com/kubeflow/kubeflow/issues/5976#issuecomment-861650631

nelson2000 commented 1 year ago

Hello guys, I am facing two challenges. I can't seem to change my OIDC provider from Dex to AAD, although I have tried everything. I also tried to set up TLS using Let's Encrypt, and that didn't work: I couldn't fulfill the solver even though I have a fully functional domain name from AWS Route53 connected to the DNS zone in Azure.