shikanon / kubeflow-manifests

One-click Kubeflow installation files for use in China
GNU General Public License v3.0

authservice-0 not ready causes a 403 error #73

Open TaibiaoGuo opened 2 years ago

TaibiaoGuo commented 2 years ago

Problem summary

Hello. With both the official Kubeflow manifests and the manifests you built, authservice-0 shows the same not-ready problem (kubeflow/manifests/issue), which prevents me from accessing the Kubeflow dashboard.

2021-10-24T03:21:48.287901796+08:00 time="2021-10-23T19:21:48Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: EOF"

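What causes this problem? Could you share the approach and reasoning behind how you configured dex/istio so that all services can be accessed over HTTP?

According to the official manifests documentation, accessing all services over HTTP requires setting environment variables manually; see "Connect to your Kubeflow cluster".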
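
The failing request in the authservice log above is a plain HTTP GET of the OIDC discovery document. Below is a minimal Python sketch for repeating that request by hand from a pod inside the cluster, to see whether it ends in EOF, a timeout, or a valid response. The names `discovery_url` and `probe` are made up for illustration; only the issuer URL is taken from the log.

```python
import json
import urllib.request


def discovery_url(issuer: str) -> str:
    """Build the OIDC discovery URL for an issuer (issuer + well-known path)."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def probe(issuer: str, timeout: float = 5.0) -> str:
    """Fetch the discovery document and summarise the outcome as a string."""
    url = discovery_url(issuer)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            doc = json.load(resp)
        return f"ok: issuer={doc.get('issuer')}"
    except Exception as exc:  # DNS failure, timeout, EOF, bad JSON, ...
        return f"failed: {type(exc).__name__}: {exc}"
```

Calling `print(probe("http://dex.auth.svc.cluster.local:5556/dex"))` from inside the cluster would show which kind of failure occurs; the error type helps distinguish a wrong address from a dex pod that accepts the connection and then drops it.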

What I have already tried

My Kubernetes environment

A Kubernetes v1.20 cluster without a public IP

Cluster status

/ # kubectl get pod -A
NAMESPACE                      NAME                                                              READY   STATUS             RESTARTS   AGE
auth                           dex-6d8cd4fccb-s4clw                                              1/1     Running            0          12m
cert-manager                   cert-manager-649f8dfd4b-86qx2                                     1/1     Running            0          45m
cert-manager                   cert-manager-cainjector-75cd8bbf6d-hq6w2                          1/1     Running            0          45m
cert-manager                   cert-manager-webhook-5b5cd9bd6f-c7gtk                             1/1     Running            0          45m
ingress-nginx                  ingress-nginx-admission-create-lqqlk                              0/1     Completed          0          4d5h
ingress-nginx                  ingress-nginx-admission-patch-w9sbl                               0/1     Completed          0          4d5h
ingress-nginx                  ingress-nginx-controller-686f6b6867-bzztx                         1/1     Running            0          3d22h
istio-system                   authservice-0                                                     0/1     Running            0          29m
istio-system                   cluster-local-gateway-74d9fd9586-kxhjz                            1/1     Running            0          28m
istio-system                   istio-ingressgateway-8bf685655-mgrf7                              1/1     Running            0          28m
istio-system                   istiod-756554b96b-6slfc                                           1/1     Running            0          28m
knative-eventing               broker-controller-cfb5ccb77-dp4b7                                 1/1     Running            0          44m
knative-eventing               eventing-controller-8657cd4b8-sfh9t                               1/1     Running            0          44m
knative-eventing               eventing-webhook-67f86f4d4d-wl49w                                 1/1     Running            0          44m
knative-eventing               imc-controller-68bd666784-rkvpk                                   1/1     Running            0          44m
knative-eventing               imc-dispatcher-78ff9dd847-7kmmp                                   1/1     Running            0          44m
knative-serving                activator-54b777546f-6v7x9                                        0/1     CrashLoopBackOff   15         44m
knative-serving                autoscaler-79bbc84d47-9g2b4                                       1/1     Running            0          44m
knative-serving                controller-dd65cb4b7-88m86                                        1/1     Running            0          44m
knative-serving                istio-webhook-5f545fc44b-zlmrb                                    1/1     Running            0          44m
knative-serving                networking-istio-6b6df495d6-zkjgj                                 1/1     Running            0          44m
knative-serving                webhook-9ff656f95-bd2fh                                           1/1     Running            0          44m
kube-system                    coredns-8496bbfb78-52c27                                          1/1     Running            0          4d5h
kube-system                    coredns-8496bbfb78-ngp9h                                          1/1     Running            0          4d5h
kube-system                    default-http-backend-6946487d9b-9s5sp                             1/1     Running            0          4d5h
kube-system                    etcd-k8s-master-node1                                             1/1     Running            0          4d6h
kube-system                    etcd-snapshot-1634651059-86qzb                                    0/1     Completed          0          4d5h
kube-system                    etcd-snapshot-1634961600-drknl                                    0/1     Completed          0          15h
kube-system                    etcd-snapshot-1634983200-r48jf                                    0/1     Completed          0          9h
kube-system                    etcd-snapshot-1635004800-8chsx                                    0/1     Completed          0          3h29m
kube-system                    kube-apiserver-k8s-master-node1                                   1/1     Running            0          4d6h
kube-system                    kube-controller-manager-k8s-master-node1                          1/1     Running            0          4d6h
kube-system                    kube-flannel-ds-jp2w5                                             1/1     Running            0          4d6h
kube-system                    kube-flannel-ds-jtdxd                                             1/1     Running            0          4d6h
kube-system                    kube-flannel-ds-r5n9v                                             1/1     Running            0          4d6h
kube-system                    kube-flannel-ds-t29zs                                             1/1     Running            0          4d6h
kube-system                    kube-flannel-ds-xl42q                                             1/1     Running            0          4d6h
kube-system                    kube-proxy-jdvsj                                                  1/1     Running            0          3d22h
kube-system                    kube-proxy-jlzgp                                                  1/1     Running            0          3d22h
kube-system                    kube-proxy-nvj9n                                                  1/1     Running            0          3d22h
kube-system                    kube-proxy-qmg4d                                                  1/1     Running            0          3d22h
kube-system                    kube-proxy-tbvj9                                                  1/1     Running            0          3d22h
kube-system                    kube-scheduler-k8s-master-node1                                   1/1     Running            0          4d6h
kube-system                    metrics-server-57bcd9bccd-cd24c                                   1/1     Running            0          4d
kube-system                    snapshot-controller-0                                             1/1     Running            0          4d
kubeflow-user-example-com      ml-pipeline-ui-artifact-6b9bb7f495-5vtnw                          2/2     Running            0          12m
kubeflow-user-example-com      ml-pipeline-visualizationserver-5c648f8448-jdqll                  2/2     Running            0          12m
kubeflow                       admission-webhook-deployment-5f5cc7968b-hjqkc                     1/1     Running            0          38m
kubeflow                       cache-deployer-deployment-64598b6c87-h9xz6                        2/2     Running            1          39m
kubeflow                       cache-server-59d67c7584-9gbkd                                     2/2     Running            0          25m
kubeflow                       centraldashboard-7b6b6cc7fc-g86hd                                 1/1     Running            0          38m
kubeflow                       jupyter-web-app-deployment-7c6974bb88-djnch                       1/1     Running            0          25m
kubeflow                       katib-controller-7b784c44dd-9z6qp                                 1/1     Running            0          38m
kubeflow                       katib-db-manager-6c5757dc64-8z45w                                 1/1     Running            0          38m
kubeflow                       katib-mysql-79d75c7444-q7xj4                                      1/1     Running            0          38m
kubeflow                       katib-ui-69f5b6795d-6xtth                                         1/1     Running            0          38m
kubeflow                       kfserving-controller-manager-0                                    2/2     Running            0          39m
kubeflow                       kubeflow-pipelines-profile-controller-76c45c8c6b-tfzjn            1/1     Running            0          25m
kubeflow                       metacontroller-0                                                  1/1     Running            0          39m
kubeflow                       metadata-envoy-deployment-56f745f7fb-xpgj9                        1/1     Running            0          39m
kubeflow                       metadata-grpc-deployment-6494577fdb-rrdjw                         2/2     Running            2          39m
kubeflow                       metadata-writer-b7ff9787-rglsg                                    2/2     Running            0          39m
kubeflow                       minio-cc8f7c6d-r6m2g                                              2/2     Running            0          25m
kubeflow                       ml-pipeline-66bcb9d79d-nfxkt                                      2/2     Running            0          39m
kubeflow                       ml-pipeline-persistenceagent-7fb8f6dc68-pzmdq                     2/2     Running            0          39m
kubeflow                       ml-pipeline-scheduledworkflow-64bcfd6596-h57hp                    2/2     Running            0          39m
kubeflow                       ml-pipeline-ui-8578f6685f-2mmnq                                   2/2     Running            0          38m
kubeflow                       ml-pipeline-viewer-crd-565fb9b5c5-cf9sc                           2/2     Running            1          38m
kubeflow                       ml-pipeline-visualizationserver-b7c7d49fb-6vvrr                   2/2     Running            0          38m
kubeflow                       mpi-operator-794849c566-5dssr                                     1/1     Running            0          38m
kubeflow                       mxnet-operator-6668d797d4-lk7m7                                   1/1     Running            0          38m
kubeflow                       mysql-c8d548489-j24z2                                             2/2     Running            0          25m
kubeflow                       notebook-controller-deployment-6795dd887b-95wlk                   1/1     Running            0          38m
kubeflow                       profiles-deployment-84bd4f9bc7-lq2nk                              2/2     Running            0          38m
kubeflow                       pytorch-operator-6887749499-p2rvr                                 2/2     Running            0          38m
kubeflow                       tensorboard-controller-controller-manager-dd896c8df-xn2bj         3/3     Running            1          38m
kubeflow                       tensorboards-web-app-deployment-5969cd5b68-6khtv                  1/1     Running            0          25m
kubeflow                       tf-job-operator-ccb48b77b-rbsgz                                   1/1     Running            0          38m
kubeflow                       volumes-web-app-deployment-867dfb5b5c-lnxfm                       1/1     Running            0          25m
kubeflow                       workflow-controller-6885c56f65-fjwh5                              2/2     Running            1          25m
kubeflow                       xgboost-operator-deployment-665cf9bf8d-gw4cv                      2/2     Running            2          38m
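
A small sketch for pulling the not-ready pods out of a listing like the one above. It assumes the default column layout of `kubectl get pod -A` (NAMESPACE, NAME, READY, STATUS, ...); `not_ready_pods` is a hypothetical helper, not part of any tool.

```python
def not_ready_pods(kubectl_output: str):
    """Return (namespace, name, ready, status) for pods whose READY count is short.

    Pods in the Completed state are skipped, since finished jobs are
    expected to show 0/1.
    """
    rows = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank lines, prompts, truncated rows
        namespace, name, ready, status = fields[:4]
        have, sep, want = ready.partition("/")
        if not (sep and have.isdigit() and want.isdigit()):
            continue  # header row or anything that is not a pod row
        if status != "Completed" and int(have) < int(want):
            rows.append((namespace, name, ready, status))
    return rows
```

Fed the listing above, this reports two pods: `authservice-0` (0/1 Running) and `activator-54b777546f-6v7x9` (0/1 CrashLoopBackOff).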

I debugged the CoreDNS add-on with the Kubernetes DNS debugging tool (dnsutils); the results show that DNS is working normally.

$ kubectl exec -i -t dnsutils -- nslookup  dex.auth
Server:     10.96.0.10
Address:    10.96.0.10#53

Name:   dex.auth.svc.cluster.local
Address: 10.96.213.43

$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[INFO] 10.244.3.22:47994 - 27317 "AAAA IN metadata-grpc-service.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000028399s
[INFO] 10.244.3.22:47994 - 36217 "AAAA IN metadata-grpc-service.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000024183s
[INFO] 10.244.3.22:47994 - 10155 "AAAA IN metadata-grpc-service.mydomain. udp 48 false 512" NOERROR qr,aa,rd,ra 48 0.000416245s
[INFO] 10.244.3.22:47994 - 31310 "AAAA IN metadata-grpc-service.otherdomain. udp 51 false 512" NOERROR qr,aa,rd,ra 51 0.000266715s
[INFO] 10.244.3.22:47994 - 49256 "AAAA IN metadata-grpc-service. udp 39 false 512" NOERROR qr,aa,rd,ra 39 0.000290465s
[INFO] 10.244.3.22:47994 - 36740 "A IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 130 0.000029962s
[INFO] 10.244.2.15:46399 - 61341 "AAAA IN dex.auth.svc.cluster.local.istio-system.svc.cluster.local. udp 75 false 512" NXDOMAIN qr,aa,rd 168 0.000213123s
[INFO] 10.244.2.15:58084 - 42770 "AAAA IN dex.auth.svc.cluster.local.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000366989s
[INFO] 10.244.2.15:38095 - 56024 "AAAA IN dex.auth.svc.cluster.local.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.00015846s
[INFO] 10.244.2.15:46342 - 61765 "A IN dex.auth.svc.cluster.local.mydomain. udp 53 false 512" NOERROR qr,aa,rd,ra 104 0.000732533s
[INFO] 10.244.2.15:59385 - 40897 "A IN dex.auth.svc.cluster.local.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000132883s
[INFO] 10.244.2.15:33994 - 15480 "A IN dex.auth.svc.cluster.local.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000108848s
[INFO] 10.244.2.15:45563 - 11457 "AAAA IN dex.auth.svc.cluster.local.mydomain. udp 53 false 512" NOERROR qr,aa,rd,ra 53 0.000486252s
[INFO] 10.244.3.22:58580 - 55075 "AAAA IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 159 0.00012731s
[INFO] 10.244.3.22:58580 - 42214 "AAAA IN metadata-grpc-service.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000127127s
[INFO] 10.244.3.22:58580 - 41511 "AAAA IN metadata-grpc-service.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000087901s
[INFO] 10.244.3.22:58580 - 48277 "AAAA IN metadata-grpc-service.mydomain. udp 48 false 512" NOERROR qr,aa,rd,ra 48 0.000446359s
[INFO] 10.244.3.22:58580 - 49236 "AAAA IN metadata-grpc-service.otherdomain. udp 51 false 512" NOERROR qr,aa,rd,ra 51 0.000329652s
[INFO] 10.244.3.22:58580 - 19522 "AAAA IN metadata-grpc-service. udp 39 false 512" NOERROR qr,aa,rd,ra 39 0.000242273s
[INFO] 10.244.3.22:58580 - 34650 "A IN metadata-grpc-service.kubeflow.svc.cluster.local. udp 66 false 512" NOERROR qr,aa,rd 130 0.000101973s
.....
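
The CoreDNS log above shows the dex name being queried with suffixes appended (`...istio-system.svc.cluster.local`, `...svc.cluster.local`, `...cluster.local`, `...mydomain`). Below is a sketch of how a glibc-style resolver builds that candidate list. It assumes `ndots:5` (the usual setting for pods using cluster DNS) and a search list read off the log lines, neither of which was confirmed from the pod's actual `/etc/resolv.conf`; `candidate_names` is an illustrative helper.

```python
def candidate_names(name: str, search: list, ndots: int = 5) -> list:
    """List the fully qualified names a resolver would try, in order.

    A name with a trailing dot is absolute and tried as-is. Otherwise, a
    name with fewer than `ndots` dots is tried with each search domain
    first, and as an absolute name last.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Search list assumed from the suffixes seen in the CoreDNS log.
SEARCH = ["istio-system.svc.cluster.local", "svc.cluster.local", "cluster.local", "mydomain"]

for candidate in candidate_names("dex.auth.svc.cluster.local", SEARCH):
    print(candidate)
```

Because `dex.auth.svc.cluster.local` has only four dots, the search-domain variants are tried before the real cluster name. In the log, the `A` query for the `.mydomain` variant comes back NOERROR with a larger response than the matching `AAAA` query, so it may be worth checking whether that domain answers with an address of its own before the resolver ever reaches the cluster-local name.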

shikanon commented 2 years ago

@TaibiaoGuo Judging from your kubectl get pod -A output, auth is running; the problem is most likely the activator service in knative. If you install with my manifest together with kind, you only need to follow the README and access the NodePort of the istio svc. Dex authentication is overlaid on istio; see this file: https://github.com/shikanon/kubeflow-manifests/blob/master/manifest1.3/008-dex-overlays-istio.yaml

tianya092 commented 2 years ago

Same here: all services are running except activator and authservice, which are not ready. I checked the logs of each:

1. Websocket connection could not be established

{"level":"info","ts":"2021-11-16T08:06:34.781Z","logger":"activator","caller":"metrics/prometheus_exporter.go:37","msg":"Created Opencensus Prometheus exporter with config: &{knative.dev/internal/serving activator prometheus 5000000000 false 9090 false { false}}. Start the server for Prometheus exporter.","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg"}
{"level":"info","ts":"2021-11-16T08:06:34.781Z","logger":"activator","caller":"metrics/exporter.go:151","msg":"Successfully updated the metrics exporter; old config: ; new config &{knative.dev/internal/serving activator prometheus 5000000000 false 9090 false { false}}","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg"}
{"level":"info","ts":"2021-11-16T08:06:34.781Z","logger":"activator","caller":"activator/request_log.go:36","msg":"Updated the request log template.","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg","template":""}
{"level":"error","ts":"2021-11-16T08:06:37.764Z","logger":"activator","caller":"websocket/connection.go:116","msg":"Websocket connection could not be established","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg","error":"dial tcp 170.33.9.230:8080: i/o timeout","stacktrace":"knative.dev/serving/vendor/knative.dev/pkg/websocket.NewDurableConnection.func1\n\tknative.dev/serving/vendor/knative.dev/pkg/websocket/connection.go:116\nknative.dev/serving/vendor/knative.dev/pkg/websocket.(ManagedConnection).connect.func1\n\tknative.dev/serving/vendor/knative.dev/pkg/websocket/connection.go:195\nknative.dev/serving/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\tknative.dev/serving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:292\nknative.dev/serving/vendor/knative.dev/pkg/websocket.(ManagedConnection).connect\n\tknative.dev/serving/vendor/knative.dev/pkg/websocket/connection.go:192\nknative.dev/serving/vendor/knative.dev/pkg/websocket.NewDurableConnection.func2\n\tknative.dev/serving/vendor/knative.dev/pkg/websocket/connection.go:133"}
{"level":"error","ts":"2021-11-16T08:06:38.097Z","logger":"activator","caller":"websocket/connection.go:162","msg":"Failed to send ping message to ws://autoscaler.knative-serving.svc.cluster.local:8080","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg","error":"connection has not yet been established","stacktrace":"knative.dev/serving/vendor/knative.dev/pkg/websocket.NewDurableConnection.func3\n\tknative.dev/serving/vendor/knative.dev/pkg/websocket/connection.go:162"}
{"level":"warn","ts":"2021-11-16T08:06:38.278Z","logger":"activator","caller":"handler/healthz_handler.go:33","msg":"Healthcheck failed: connection has not yet been established","commit":"bcda051","knative.dev/controller":"activator","knative.dev/pod":"activator-75696c8c9-pqtkg"}

2. OIDC provider setup failed

time="2021-11-16T08:54:00Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: Get http://dex.auth.svc.cluster.local:5556/dex/.well-known/openid-configuration: dial tcp 170.33.9.230:5556: i/o timeout"

chenbodeng719 commented 2 years ago

@TaibiaoGuo Hi, I ran into this auth problem too. How did you solve it?


Solved it by configuring a Persistent Volume provisioner for k8s.
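
If a missing volume provisioner is the suspect, unbound PersistentVolumeClaims are one symptom worth checking. Below is a sketch that lists claims whose phase is not `Bound` from the JSON printed by `kubectl get pvc -A -o json`. The helper name `unbound_claims` and the claim names in the usage example are hypothetical; the thread does not say which claim was affected.

```python
import json


def unbound_claims(pvc_json: str):
    """Return (namespace, name, phase) for every PVC that is not Bound."""
    items = json.loads(pvc_json).get("items", [])
    result = []
    for item in items:
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            meta = item["metadata"]
            result.append((meta["namespace"], meta["name"], phase))
    return result
```

For example, a document containing one `Pending` claim `example-pvc` in namespace `istio-system` and one `Bound` claim elsewhere yields `[("istio-system", "example-pvc", "Pending")]`. A claim that stays `Pending` usually means no PersistentVolume or default StorageClass could satisfy it.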

Alvin-4550 commented 2 years ago

I also encountered this problem. Has it been solved?

tianya092 commented 1 year ago

I have received your message. Thank you very much!

Don12138 commented 1 year ago

Has this problem been solved?

tianya092 commented 1 year ago

I have received your message. Thank you very much!