ozbillwang / rancher-in-kind

Scripts to start up Rancher with Kind (kubernetes in docker) cluster

Alert: Component controller-manager is unhealthy. Alert: Component scheduler is unhealthy. #3

Closed: IsQiao closed this issue 2 years ago

IsQiao commented 3 years ago
~/repo/rancher-in-kind# ./rkind.sh create
INFO: Launching Rancher container
851a348282e9c7ebfff72c4991ca69bb300b3b6ffbe873056ec009d15e040ea3
INFO: Rancher UI will be available at https://63.250.52.150:31210
INFO: It might take few up to 60 seconds for Rancher UI to become available..
INFO: While it's coming up, going to start KIND cluster
No kind clusters found.
INFO: Creating Kind cluster ...
Creating cluster "kind-for-rancher" ...
 βœ“ Ensuring node image (kindest/node:v1.19.1) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦ πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
 βœ“ Joining worker nodes 🚜
Set kubectl context to "kind-kind-for-rancher"
You can now use your cluster with:

kubectl cluster-info --context kind-kind-for-rancher

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community πŸ™‚
### Next steps ###
- Setup admin credentials in Rancher UI
- Set "Rancher Server URL" to "https://63.250.52.150:31210" (should already be selected)
  you may change it at any time in "Settings"
- wait for 2 minute
- Import KIND cluster to Rancher (via https://63.250.52.150:31210/g/clusters/add?provider=import)
  (select "Import Existing cluster" when adding a cluster)
  > To work around "Unable to connect to the server: x509: certificate signed by unknown authority"
  > use "curl --insecure" which is provided by Rancher UI to get the manifest, piping it's output to, for example:

    curl --insecure -sfL https://63.250.52.150:31210/v3/import/6qbm7q9lk7gmqsgt4l2hrrchlxbfh6fjskzb8tx84mjrl9jvhb8xcm.yaml | kubectl apply -f -

- set context to kind cluster

kubectl cluster-info --context kind-for-rancher

### Destroy
To shut everything down, use "./rkind.sh destroy", or manually with
docker rm -f rancher-for-kind; kind delete cluster kind-for-rancher
curl: (52) Empty reply from server
pong
token-2sjmr:bx5snsr4ztmh86j7mlngh4tzj2dttsdhjgtvhpsgz2fbs92b92jxnm
token-whggm:2vhpj4dq7s8dlnnrhvp77fl555dww6s2999w9m79l8pf842r5wpqk8
token-jczsp:bhkldtc9g2tjq99rk7jhl4x25htp2ql5n4q5twx5w6l7zszslmjrnr
{"baseType":"setting","created":"2020-12-19T09:57:21Z","createdTS":1608371841000,"creatorId":null,"customized":true,"default":"","id":"server-url","links":{"remove":"https://63.250.52.150:31210/v3/settings/server-url","self":"https://63.250.52.150:31210/v3/settings/server-url","update":"https://63.250.52.150:31210/v3/settings/server-url"},"name":"server-url","source":"db","type":"setting","uuid":"86283f24-7354-4628-950a-babe0560561d","value":"https://63.250.52.150:31210"}
c-k6lvw
curl --insecure -sfL https://63.250.52.150:31210/v3/import/v29kkkdv48jslhdfmrd4rfv5lwzjckb2z4b2jcqzvdnnm7ccw99cqs.yaml | kubectl apply -f -
Kubernetes control plane is running at https://127.0.0.1:43745
KubeDNS is running at https://127.0.0.1:43745/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system created
serviceaccount/cattle created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-0032d3b created
clusterrole.rbac.authorization.k8s.io/cattle-admin created
deployment.apps/cattle-cluster-agent created
Rancher admin password is: password
Rancher URL is https://63.250.52.150:31210
Rancher account: admin / password
./rkind.sh: line 219: open: command not found
root@ssdnodes-5f55d2e9cb4e2:~/repo/rancher-in-kind# kubectl cluster-info --context kind-for-rancher
error: context "kind-for-rancher" does not exist
root@ssdnodes-5f55d2e9cb4e2:~/repo/rancher-in-kind# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:43745
KubeDNS is running at https://127.0.0.1:43745/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

[screenshot]

pod log:

root@ssdnodes-5f55d2e9cb4e2:~# kubectl logs -n kube-system kube-scheduler-kind-for-rancher-control-plane
I1219 09:57:35.899322 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:35.899939 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:36.527832 1 serving.go:331] Generated self-signed cert in-memory
W1219 09:57:48.835489 1 authentication.go:294] Error looking up in-cluster authentication configuration: Get "https://172.18.0.2:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": net/http: TLS handshake timeout
W1219 09:57:48.835589 1 authentication.go:295] Continuing without authentication configuration. This may treat all requests as anonymous.
W1219 09:57:48.835604 1 authentication.go:296] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I1219 09:57:54.854460 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:54.854499 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:54.860001 1 secure_serving.go:197] Serving securely on 127.0.0.1:10259
I1219 09:57:54.862847 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:57:54.862875 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:57:54.862952 1 tlsconfig.go:240] Starting DynamicServingCertificateController
E1219 09:57:54.877520 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:54.892026 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StatefulSet: failed to list v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E1219 09:57:54.892193 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:54.892348 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicaSet: failed to list v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1219 09:57:54.906175 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:54.909153 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:54.909479 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:54.909882 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicationController: failed to list v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E1219 09:57:54.911699 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:54.911918 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:54.912102 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolumeClaim: failed to list v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E1219 09:57:54.919548 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:54.963123 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:55.748966 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:55.770755 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:55.827477 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:55.902388 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:55.939924 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:56.009457 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StatefulSet: failed to list v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E1219 09:57:56.101354 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolumeClaim: failed to list v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E1219 09:57:56.162666 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:56.214060 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicaSet: failed to list v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1219 09:57:56.354027 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:56.433107 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:56.515946 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicationController: failed to list v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E1219 09:57:56.522103 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:57.424758 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:57.725160 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:57.974958 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:57.988551 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:58.029214 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:58.088739 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:58.092579 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:58.303153 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
I1219 09:58:02.165083 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:58:03.574710 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
I1219 09:58:03.666214 1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler

ozbillwang commented 3 years ago

I found there are some changes in Rancher v2.5+.

Let me dig into it.

ozbillwang commented 3 years ago

So the problem involves both KIND and Rancher v2.5+.

A pod inside the KIND cluster can no longer reach its host directly by IP and port; for example, Rancher running on 192.168.1.102:35555 is unreachable from a pod. This is a KIND networking issue; it worked in earlier versions.

If I change the host IP to host.docker.internal (or kubernetes.docker.internal), then the pod can reach Rancher via https://host.docker.internal:35555
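
A quick way to check that reachability from inside the KIND cluster is a throwaway curl pod (a sketch; the pod name and the curlimages/curl image are my choices here, and /ping is the Rancher health endpoint seen in the agent logs below):

    kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
      curl -sk https://host.docker.internal:35555/ping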

Now we need to make an adjustment on the Rancher side. First save the import manifest:

curl --insecure -sfL https://192.168.1.102:35555/v3/import/hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr.yaml > a.yaml

then update a.yaml from

          - name: CATTLE_SERVER
            value: "https://192.168.1.102:35555"

to

          - name: CATTLE_SERVER
            value: "https://host.docker.internal:35555"

That should work, but after deploying a.yaml there is a new issue in Rancher v2.5+: the websocket (wss) endpoint doesn't get updated and still tries to use the host IP.

$ kk logs -f pod/cattle-cluster-agent-6464cc7756-5lgkl
INFO: Environment: CATTLE_ADDRESS=10.244.1.2 CATTLE_CA_CHECKSUM=2880b8407eea57b860fd53efb9bbd2a3c86581e53caf14d2bb01d68aa362c356 CATTLE_CLUSTER=true CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-6464cc7756-5lgkl CATTLE_SERVER=https://host.docker.internal:35555
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.96.0.10 options ndots:5
INFO: https://host.docker.internal:35555/ping is accessible
INFO: host.docker.internal resolves to 192.168.65.2
INFO: Value from https://host.docker.internal:35555/v3/settings/cacerts is an x509 certificate
time="2020-12-19T11:11:04Z" level=info msg="Listening on /tmp/log.sock"
time="2020-12-19T11:11:04Z" level=info msg="Rancher agent version v2.5.3 is starting"
time="2020-12-19T11:11:09Z" level=info msg="Connecting to wss://192.168.1.102:35555/v3/connect/register with token hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr"
time="2020-12-19T11:11:09Z" level=info msg="Connecting to proxy" url="wss://192.168.1.102:35555/v3/connect/register"
time="2020-12-19T11:11:19Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 192.168.1.102:35555: i/o timeout"
time="2020-12-19T11:11:19Z" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.1.102:35555: i/o timeout"
time="2020-12-19T11:11:29Z" level=info msg="Connecting to wss://192.168.1.102:35555/v3/connect/register with token hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr"
time="2020-12-19T11:11:29Z" level=info msg="Connecting to proxy" url="wss://192.168.1.102:35555/v3/connect/register"
ozbillwang commented 3 years ago

Raised an issue with Rancher for help:

https://github.com/rancher/rancher/issues/30562

Herohtar commented 3 years ago

I was able to resolve this issue by adding both Rancher and Kind to the same network, then setting Rancher's server-url property to its IP address on that network. Here's an overview of the steps, assuming you already have Rancher running in Docker with a name of rancher and Kind running as kind-control-plane:

  1. docker network create kind-rancher
  2. docker network connect kind-rancher rancher
  3. docker network connect kind-rancher kind-control-plane
  4. docker inspect rancher
  5. In the resulting output, look at NetworkSettings.Networks.kind-rancher.IPAddress -- in my case it was 172.18.0.3 (see the one-liner after these steps)
  6. Open Rancher Global Settings -> Advanced Settings and change server-url to that IP address (e.g., https://172.18.0.3)
  7. Follow the usual steps to import an existing cluster

Doing it this way, you don't need to customize Kind with additional external ports -- it works with the default configuration.
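
For step 5, docker inspect can also print just that address with a Go-template filter (a sketch, assuming the network name kind-rancher from the steps above; the index function is needed because the key contains a hyphen):

    docker inspect -f '{{ (index .NetworkSettings.Networks "kind-rancher").IPAddress }}' rancher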

ozbillwang commented 3 years ago

Thanks for the update. I will check this solution when I have time.

ozbillwang commented 2 years ago

The latest Rancher version is v2.6.2 now; I ran a test with the existing script directly.

  1. The admin password is not set; it seems the new version uses a different API to set the username and password (see the note after this list)
  2. You need to accept the license agreement, which is an extra step.
  3. After login, the KIND cluster is not automatically added.
  4. After manually importing the cluster and waiting about 2 minutes, it becomes active. So it looks like the new version can work with the KIND script, but it needs some adjustment.
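
Regarding item 1, Rancher v2.6+ generates a random bootstrap password instead of honoring a preset admin password; as far as I know it can be read back from the container logs, or preset via the CATTLE_BOOTSTRAP_PASSWORD environment variable. A sketch, assuming the container name rancher-for-kind used in the destroy instructions above:

    # print the generated initial password (Rancher v2.6+)
    docker logs rancher-for-kind 2>&1 | grep "Bootstrap Password:"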

[screenshot]

ozbillwang commented 2 years ago

@IsQiao @Herohtar

The code has been updated. I ran the test locally and it works fine.

Could you please run a test for me?

git clone git@github.com:ozbillwang/rancher-in-kind.git
cd rancher-in-kind
./rkind.sh create

Wait for about 2 minutes, then you can log in with admin / password.

That's all. No extra Docker commands are needed.
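
To confirm the KIND side came up, the context name from the script output earlier should work (assuming the default cluster name):

    kubectl cluster-info --context kind-kind-for-rancher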

If you see any errors, please paste the logs to me.

[screenshot]

ozbillwang commented 2 years ago

OK, I'm closing this now. Raise a new issue if it doesn't work.

aroscani commented 2 years ago

Hi @ozbillwang, I'm trying to deploy Rancher in KIND on WSL (Windows 10) and running into an issue. The script runs and executes all commands fine, but at the end the cluster remains pending; I can see its state on the Rancher homepage. I deployed the master branch of your repo. Let me know if I can give you more details.

Thank you,

Albert

ozbillwang commented 2 years ago

WSL is Ubuntu, right? I tested my code on macOS.

Let me test it on Ubuntu when I have time.

If you can, share the full output of this command:

bash -x ./rkind.sh create

aroscani commented 2 years ago

Thank you @ozbillwang, here is the log file: log-rancher-in-kind.txt

This is the cluster state, stuck in pending:

[screenshot]

ozbillwang commented 2 years ago

@aroscani

Could you confirm whether the problem is similar to #7?

What's the output of kubectl -n cattle-system logs -f pod/cattle-cluster-agent-xxxx-xxxx | grep PORT_XXXX?