ozbillwang / rancher-in-kind

Scripts to start up Rancher with Kind (kubernetes in docker) cluster

Alert: Component controller-manager is unhealthy. Alert: Component scheduler is unhealthy. #3

Closed: IsQiao closed this issue 2 years ago

IsQiao commented 3 years ago
~/repo/rancher-in-kind# ./rkind.sh create
INFO: Launching Rancher container
851a348282e9c7ebfff72c4991ca69bb300b3b6ffbe873056ec009d15e040ea3
INFO: Rancher UI will be available at https://63.250.52.150:31210
INFO: It might take few up to 60 seconds for Rancher UI to become available..
INFO: While it's coming up, going to start KIND cluster
No kind clusters found.
INFO: Creating Kind cluster ...
Creating cluster "kind-for-rancher" ...
 βœ“ Ensuring node image (kindest/node:v1.19.1) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦ πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
 βœ“ Joining worker nodes 🚜
Set kubectl context to "kind-kind-for-rancher"
You can now use your cluster with:

kubectl cluster-info --context kind-kind-for-rancher

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community πŸ™‚
### Next steps ###
- Setup admin credentials in Rancher UI
- Set "Rancher Server URL" to "https://63.250.52.150:31210" (should already be selected)
  you may change it at any time in "Settings"
- wait for 2 minute
- Import KIND cluster to Rancher (via https://63.250.52.150:31210/g/clusters/add?provider=import)
  (select "Import Existing cluster" when adding a cluster)
  > To work around "Unable to connect to the server: x509: certificate signed by unknown authority"
  > use "curl --insecure" which is provided by Rancher UI to get the manifest, piping it's output to, for example:

    curl --insecure -sfL https://63.250.52.150:31210/v3/import/6qbm7q9lk7gmqsgt4l2hrrchlxbfh6fjskzb8tx84mjrl9jvhb8xcm.yaml | kubectl apply -f -

- set context to kind cluster

kubectl cluster-info --context kind-for-rancher

### Destroy
To shut everything down, use "./rkind.sh destroy", or manually with
docker rm -f rancher-for-kind; kind delete cluster kind-for-rancher
curl: (52) Empty reply from server
pong
token-2sjmr:bx5snsr4ztmh86j7mlngh4tzj2dttsdhjgtvhpsgz2fbs92b92jxnm
token-whggm:2vhpj4dq7s8dlnnrhvp77fl555dww6s2999w9m79l8pf842r5wpqk8
token-jczsp:bhkldtc9g2tjq99rk7jhl4x25htp2ql5n4q5twx5w6l7zszslmjrnr
{"baseType":"setting","created":"2020-12-19T09:57:21Z","createdTS":1608371841000,"creatorId":null,"customized":true,"default":"","id":"server-url","links":{"remove":"https://63.250.52.150:31210/v3/settings/server-url","self":"https://63.250.52.150:31210/v3/settings/server-url","update":"https://63.250.52.150:31210/v3/settings/server-url"},"name":"server-url","source":"db","type":"setting","uuid":"86283f24-7354-4628-950a-babe0560561d","value":"https://63.250.52.150:31210"}
c-k6lvw
curl --insecure -sfL https://63.250.52.150:31210/v3/import/v29kkkdv48jslhdfmrd4rfv5lwzjckb2z4b2jcqzvdnnm7ccw99cqs.yaml | kubectl apply -f -
Kubernetes control plane is running at https://127.0.0.1:43745
KubeDNS is running at https://127.0.0.1:43745/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system created
serviceaccount/cattle created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-0032d3b created
clusterrole.rbac.authorization.k8s.io/cattle-admin created
deployment.apps/cattle-cluster-agent created
Rancher admin password is: password
Rancher URL is https://63.250.52.150:31210
Rancher account: admin / password
./rkind.sh: line 219: open: command not found
root@ssdnodes-5f55d2e9cb4e2:~/repo/rancher-in-kind# kubectl cluster-info --context kind-for-rancher
error: context "kind-for-rancher" does not exist
root@ssdnodes-5f55d2e9cb4e2:~/repo/rancher-in-kind# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:43745
KubeDNS is running at https://127.0.0.1:43745/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

[screenshot]

pod log:

root@ssdnodes-5f55d2e9cb4e2:~# kubectl logs -n kube-system kube-scheduler-kind-for-rancher-control-plane
I1219 09:57:35.899322 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:35.899939 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:36.527832 1 serving.go:331] Generated self-signed cert in-memory
W1219 09:57:48.835489 1 authentication.go:294] Error looking up in-cluster authentication configuration: Get "https://172.18.0.2:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": net/http: TLS handshake timeout
W1219 09:57:48.835589 1 authentication.go:295] Continuing without authentication configuration. This may treat all requests as anonymous.
W1219 09:57:48.835604 1 authentication.go:296] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I1219 09:57:54.854460 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:54.854499 1 registry.go:173] Registering SelectorSpread plugin
I1219 09:57:54.860001 1 secure_serving.go:197] Serving securely on 127.0.0.1:10259
I1219 09:57:54.862847 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:57:54.862875 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:57:54.862952 1 tlsconfig.go:240] Starting DynamicServingCertificateController
E1219 09:57:54.877520 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:54.892026 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StatefulSet: failed to list v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E1219 09:57:54.892193 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:54.892348 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicaSet: failed to list v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1219 09:57:54.906175 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:54.909153 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:54.909479 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:54.909882 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicationController: failed to list v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E1219 09:57:54.911699 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:54.911918 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:54.912102 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolumeClaim: failed to list v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E1219 09:57:54.919548 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:54.963123 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:55.748966 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:55.770755 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:55.827477 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:55.902388 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:55.939924 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:56.009457 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StatefulSet: failed to list v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E1219 09:57:56.101354 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolumeClaim: failed to list v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E1219 09:57:56.162666 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:56.214060 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicaSet: failed to list v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1219 09:57:56.354027 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:56.433107 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:56.515946 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.ReplicationController: failed to list v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E1219 09:57:56.522103 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:57.424758 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Node: failed to list v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E1219 09:57:57.725160 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.Service: failed to list v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E1219 09:57:57.974958 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch v1.Pod: failed to list v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E1219 09:57:57.988551 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1beta1.PodDisruptionBudget: failed to list v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1219 09:57:58.029214 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.CSINode: failed to list v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E1219 09:57:58.088739 1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch v1.ConfigMap: failed to list v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1219 09:57:58.092579 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.PersistentVolume: failed to list v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E1219 09:57:58.303153 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch v1.StorageClass: failed to list v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
I1219 09:58:02.165083 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1219 09:58:03.574710 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
I1219 09:58:03.666214 1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler

ozbillwang commented 3 years ago

I found there are some changes in Rancher v2.5+.

Let me dig into it.

ozbillwang commented 3 years ago

So the problem involves both KIND and Rancher v2.5+.

A pod inside the KIND cluster can no longer reach its host directly by IP and port; for example, Rancher running on 192.168.1.102:35555 is unreachable from a pod. This is a KIND networking issue; it worked in earlier versions.

If I change the host IP to host.docker.internal (or kubernetes.docker.internal), then the pod can reach Rancher via https://host.docker.internal:35555
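
A quick way to check that reachability from inside the KIND cluster is a throwaway curl pod (a sketch; the pod name and the curlimages/curl image are my choices here, and /ping is the Rancher health endpoint seen in the agent logs below):

    kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
      curl -sk https://host.docker.internal:35555/ping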

Now we need to make an adjustment on the Rancher side. First save the import manifest:

curl --insecure -sfL https://192.168.1.102:35555/v3/import/hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr.yaml > a.yaml

then update a.yaml from

          - name: CATTLE_SERVER
            value: "https://192.168.1.102:35555"

to

          - name: CATTLE_SERVER
            value: "https://host.docker.internal:35555"

That should work, but after deploying a.yaml there is a new issue in Rancher v2.5+: the websocket (wss) endpoint doesn't get updated and still tries to use the host IP.

$ kk logs -f pod/cattle-cluster-agent-6464cc7756-5lgkl
INFO: Environment: CATTLE_ADDRESS=10.244.1.2 CATTLE_CA_CHECKSUM=2880b8407eea57b860fd53efb9bbd2a3c86581e53caf14d2bb01d68aa362c356 CATTLE_CLUSTER=true CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-6464cc7756-5lgkl CATTLE_SERVER=https://host.docker.internal:35555
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.96.0.10 options ndots:5
INFO: https://host.docker.internal:35555/ping is accessible
INFO: host.docker.internal resolves to 192.168.65.2
INFO: Value from https://host.docker.internal:35555/v3/settings/cacerts is an x509 certificate
time="2020-12-19T11:11:04Z" level=info msg="Listening on /tmp/log.sock"
time="2020-12-19T11:11:04Z" level=info msg="Rancher agent version v2.5.3 is starting"
time="2020-12-19T11:11:09Z" level=info msg="Connecting to wss://192.168.1.102:35555/v3/connect/register with token hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr"
time="2020-12-19T11:11:09Z" level=info msg="Connecting to proxy" url="wss://192.168.1.102:35555/v3/connect/register"
time="2020-12-19T11:11:19Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 192.168.1.102:35555: i/o timeout"
time="2020-12-19T11:11:19Z" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.1.102:35555: i/o timeout"
time="2020-12-19T11:11:29Z" level=info msg="Connecting to wss://192.168.1.102:35555/v3/connect/register with token hgvxmhhh7jggxjwztjgfltk9hzkg2xkb5n2jhl2pl7zd8k5tb6h7fr"
time="2020-12-19T11:11:29Z" level=info msg="Connecting to proxy" url="wss://192.168.1.102:35555/v3/connect/register"
ozbillwang commented 3 years ago

Raised an issue with Rancher for help:

https://github.com/rancher/rancher/issues/30562

Herohtar commented 3 years ago

I was able to resolve this issue by adding both Rancher and Kind to the same network, then setting Rancher's server-url property to its IP address on that network. Here's an overview of the steps, assuming you already have Rancher running in Docker with a name of rancher and Kind running as kind-control-plane:

  1. docker network create kind-rancher
  2. docker network connect kind-rancher rancher
  3. docker network connect kind-rancher kind-control-plane
  4. docker inspect rancher
  5. In the resulting output, look at NetworkSettings.Networks.kind-rancher.IPAddress -- in my case it was 172.18.0.3 (see the one-liner after these steps)
  6. Open Rancher Global Settings -> Advanced Settings and change server-url to that IP address (e.g., https://172.18.0.3)
  7. Follow the usual steps to import an existing cluster

Doing it this way, you don't need to customize Kind with additional external ports -- it works with the default configuration.
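
For step 5, docker inspect can also print just that address with a Go-template filter (a sketch, assuming the network name kind-rancher from the steps above; the index function is needed because the key contains a hyphen):

    docker inspect -f '{{ (index .NetworkSettings.Networks "kind-rancher").IPAddress }}' rancher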

ozbillwang commented 3 years ago

Thanks for the update. I will check this solution when I have time.

ozbillwang commented 2 years ago

The latest Rancher version is v2.6.2 now; I ran a test with the existing script directly.

  1. The admin password is not set; it seems the new version uses a different API to set the username and password (see the note after this list)
  2. You need to accept the license agreement, which is an extra step.
  3. After login, the KIND cluster is not automatically added.
  4. After manually importing the cluster and waiting about 2 minutes, it becomes active. So it looks like the new version can work with the KIND script, but it needs some adjustment.
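
Regarding item 1, Rancher v2.6+ generates a random bootstrap password instead of honoring a preset admin password; as far as I know it can be read back from the container logs, or preset via the CATTLE_BOOTSTRAP_PASSWORD environment variable. A sketch, assuming the container name rancher-for-kind used in the destroy instructions above:

    # print the generated initial password (Rancher v2.6+)
    docker logs rancher-for-kind 2>&1 | grep "Bootstrap Password:"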

[screenshot]

ozbillwang commented 2 years ago

@IsQiao @Herohtar

The code has been updated. I ran the test locally and it works fine.

Could you please run a test for me?

git clone git@github.com:ozbillwang/rancher-in-kind.git
cd rancher-in-kind
./rkind.sh create

Wait for about 2 minutes, then you can log in with admin / password.

That's all. No extra Docker commands are needed.
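
To confirm the KIND side came up, the context name from the script output earlier should work (assuming the default cluster name):

    kubectl cluster-info --context kind-kind-for-rancher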

If you see any errors, please paste the logs to me.

[screenshot]

ozbillwang commented 2 years ago

OK, I'm closing this now. Raise a new issue if it doesn't work.

aroscani commented 2 years ago

Hi @ozbillwang, I'm trying to deploy Rancher in KIND on WSL (Windows 10) and running into an issue. The script runs and executes all commands fine, but at the end the cluster remains pending; I can see its state on the Rancher homepage. I deployed the master branch of your repo. Let me know if I can give you more details.

Thank you,

Albert

ozbillwang commented 2 years ago

WSL is Ubuntu, right? I tested my code on macOS.

Let me test it on Ubuntu when I have time.

If you can, share the full output of this command:

bash -x ./rkind.sh create

aroscani commented 2 years ago

Thank you @ozbillwang, here is the log file: log-rancher-in-kind.txt

This is the cluster state, stuck in pending:

[screenshot]

ozbillwang commented 2 years ago

@aroscani

Could you confirm whether the problem is similar to #7?

What's the output of kubectl -n cattle-system logs -f pod/cattle-cluster-agent-xxxx-xxxx | grep PORT_XXXX?