rancher / rio

Application Deployment Engine for Kubernetes
https://rio.io
Apache License 2.0

rio dashboard: "unable to retrieve the complete list of server APIs: tap.linkerd.io/v1alpha1: the server is currently unable to handle the request" #1071

Open gnoejuan opened 3 years ago

gnoejuan commented 3 years ago
time="2021-03-04T19:33:43Z" level=info msg="Starting management controllers"
time="2021-03-04T19:33:43Z" level=info msg="Rancher startup complete"
time="2021-03-04T19:33:44Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2021-03-04T19:33:44Z" level=info msg="Starting apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition controller"
time="2021-03-04T19:33:44Z" level=info msg="Refreshing all schemas"
time="2021-03-04T19:33:44Z" level=info msg="Refreshing all schemas"
time="2021-03-04T19:33:44Z" level=fatal msg="unable to retrieve the complete list of server APIs: tap.linkerd.io/v1alpha1: the server is currently unable to handle the request

linkerd-tap container

...
2021/03/04 19:36:38 http: TLS handshake error from 127.0.0.1:50208: remote error: tls: bad certificate
2021/03/04 19:36:41 http: TLS handshake error from 127.0.0.1:50244: remote error: tls: bad certificate
2021/03/04 19:36:44 http: TLS handshake error from 127.0.0.1:50280: remote error: tls: bad certificate
2021/03/04 19:37:01 http: TLS handshake error from 127.0.0.1:50480: remote error: tls: bad certificate
2021/03/04 19:37:11 http: TLS handshake error from 127.0.0.1:50598: remote error: tls: bad certificate
2021/03/04 19:37:14 http: TLS handshake error from 127.0.0.1:50638: remote error: tls: bad certificate
2021/03/04 19:37:15 http: TLS handshake error from 127.0.0.1:50652: remote error: tls: bad certificate
...

Deployed to Ubuntu 20.04 running k3s version v1.19.8+k3s1 (95fc76b2). The demo app works: https://demo-v0-default.nnlihq.on-rio.io/

I can provide more info if needed.

EDIT: rio kill dashboard doesn't work. I assume it's because the dashboard wasn't successfully deployed:

time="2021-03-04T13:43:17-06:00" level=fatal msg="failed to find service dashboard"

EDIT: I've taken down the demo app
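
The fatal message in the dashboard log names an aggregated API group (tap.linkerd.io/v1alpha1) that the API server cannot reach. A minimal sketch for listing every APIService that is not reporting as available, assuming the default `kubectl get apiservice` table layout (NAME, SERVICE, AVAILABLE, AGE); the function name is hypothetical:

```shell
# List aggregated APIs whose AVAILABLE column is not "True".
# Assumes the default table output: NAME SERVICE AVAILABLE AGE.
unavailable_apiservices() {
  kubectl get apiservice --no-headers \
    | awk '$3 != "True" { print $1, $3 }'
}
```

Calling `unavailable_apiservices` should print one line per unavailable API; `kubectl describe apiservice <name>` on any of them then shows the reason the API server recorded.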

Hantse commented 3 years ago

Hello,

I have the same issue. Is there any feedback on this?

Kind regards,

juanchristensen commented 3 years ago

It seems to be an issue with Linkerd, or more specifically with the default configuration Rio used to install Linkerd, which sets the certificate validity period to 365 days. This means that every Rio user who has been running Rio for more than a year is going to face the same issue.

You can verify this by running the following command with the Linkerd CLI:

linkerd check --proxy

You should get something similar to this output:


kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
× trust anchors are within their validity period
    Invalid anchors:
    * 1 identity.linkerd.cluster.local not valid anymore. Expired on 2021-05-08T04:40:41Z
    see https://linkerd.io/checks/#l5d-identity-trustAnchors-are-time-valid for hints

Status check results are ×

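To cross-check an expiry date without the Linkerd CLI, the certificate can be read straight from the cluster. This is only a sketch under assumptions not confirmed in this thread: that Linkerd runs in the `linkerd` namespace and that the issuer certificate is stored in the secret `linkerd-identity-issuer` under the key `crt.pem`. It inspects the issuer certificate rather than the trust anchor itself, and the function name is hypothetical:

```shell
# Print the notAfter date of the Linkerd identity issuer certificate.
# Assumptions: secret "linkerd-identity-issuer", key "crt.pem",
# namespace "linkerd" unless another one is passed as $1.
issuer_cert_expiry() {
  ns="${1:-linkerd}"
  kubectl -n "$ns" get secret linkerd-identity-issuer \
    -o jsonpath='{.data.crt\.pem}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}
```

Running `issuer_cert_expiry` prints a `notAfter=` line that can be compared against the current date.
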
juanchristensen commented 3 years ago

More on this can be found here: https://linkerd.io/2.9/tasks/manually-rotating-control-plane-tls-credentials/#generate-a-new-trust-anchor

I went through those steps without much success thus far.

It would seem that there are some issues with the Linkerd version used by Rio as well: https://github.com/linkerd/linkerd2/issues/4808

juanchristensen commented 3 years ago

FYI, I solved this by downloading the Linkerd CLI in the same version that Rio used on my install, then uninstalling Linkerd and reinstalling it. Finally, I did a rolling restart of all my deployments that are meshed, as well as those in rio-system.

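# Pin the install script to the Linkerd version that Rio deployed
export LINKERD2_VERSION=stable-2.6.1
curl -sL https://run.linkerd.io/install | sh
# Remove the existing control plane, then install it again
linkerd uninstall --force | kubectl delete -f -
linkerd install | kubectl apply -f -
# Restart the Rio system deployments and the meshed workloads
kubectl -n rio-system rollout restart deploy
kubectl -n default rollout restart deploy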
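
The restart commands above return immediately, so it can help to wait for each rollout to finish before re-running `linkerd check --proxy`. A minimal sketch, assuming the same namespaces as above; the function name and the 120-second timeout are illustrative choices, not something from this thread:

```shell
# Restart every deployment in a namespace and wait for each rollout.
# Returns non-zero as soon as one rollout fails or times out.
restart_and_wait() {
  ns="$1"
  kubectl -n "$ns" rollout restart deploy || return 1
  for d in $(kubectl -n "$ns" get deploy -o name); do
    kubectl -n "$ns" rollout status "$d" --timeout=120s || return 1
  done
}
```

For example, `restart_and_wait rio-system && restart_and_wait default` covers the two namespaces restarted above.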