krkn-chaos / cerberus

Guardian of Kubernetes clusters. Tool to monitor clusters health and signal/alert on failures.
Apache License 2.0

Connection refused #170

Open thesimpledata opened 2 years ago

thesimpledata commented 2 years ago

I am getting the exception below. I am using IBM Cloud. Environment name: IBM RedHat Openshift Kubernetes Service (VPC Gen2 with ODF). It appears that this environment does not have an etcd cluster.

shakilkhan@Shakils-MacBook-Pro cerberus % source monitor/bin/activate
(monitor) shakilkhan@Shakils-MacBook-Pro cerberus % python3 start_cerberus.py --config /Users/shakilkhan/TECHZONE/TEC/Cerberus/config/config.yaml >> log.log
2022-06-12 11:33:59,197 [INFO] Starting ceberus
2022-06-12 11:33:59,206 [INFO] Initializing client to talk to the Kubernetes cluster
2022-06-12 11:33:59,762 [INFO] Fetching cluster info
2022-06-12 11:33:59,877 [INFO] Cluster version is 4.10.15
2022-06-12 11:33:59,877 [INFO] Server URL: https://c111-e.us-east.containers.cloud.ibm.com:32653
2022-06-12 11:33:59,877 [INFO] Publishing cerberus status at http://0.0.0.0:8080
2022-06-12 11:34:04,880 [INFO] Starting http server at http://0.0.0.0:8080

2022-06-12 11:34:09,399 [INFO] Daemon mode enabled, cerberus will monitor forever
2022-06-12 11:34:09,400 [INFO] Ignoring the iterations set

WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a3f1fa0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-ingress/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116298dc0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-monitoring/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b6d790>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-machine-api/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116369a60>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a4040d0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-ingress/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116369b50>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b6d880>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-machine-api/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116298eb0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-monitoring/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a404250>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-ingress/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116369cd0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b6da00>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-machine-api/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1162b1070>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-monitoring/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1163894c0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-etcd/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a404490>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-scheduler/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1162b12b0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b8d250>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-controller-manager/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a404b20>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-scheduler/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116389580>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-etcd/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1162b1940>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b8d340>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-controller-manager/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11a404ca0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-scheduler/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x122b8d4c0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-controller-manager/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1162b1ac0>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-apiserver/pods?pretty=True&limit=
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x116389700>: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-etcd/pods?pretty=True&limit=
2022-06-12 11:34:09,462 [INFO] Encountered issues in cluster. Hence, setting the go/no-go signal to false
2022-06-12 11:34:09,479 [INFO] Exception: None: Max retries exceeded with url: /api/v1/namespaces/openshift-apiserver/pods?pretty=True&limit= (Caused by None)

2022-06-12 11:34:09,480 [INFO] SHAKIL
2022-06-12 11:34:09,480 [ERROR] None: Max retries exceeded with url: /api/v1/namespaces/openshift-apiserver/pods?pretty=True&limit= (Caused by None)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1253, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1299, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1248, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1008, in _send_output
    self.send(msg)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 948, in send
    self.connect()
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/Users/shakilkhan/TECHZONE/TEC/cerberus/monitor/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x116369e20>: Failed to establish a new connection: [Errno 61] Connection refused

chaitanyaenr commented 2 years ago

Hi @thesimpledata, thanks for reporting the issue. Looking at the logs, there seem to be issues connecting to the cluster API: Failed to establish a new connection: [Errno 61] Connection refused')': /api/v1/namespaces/openshift-kube-apiserver/pods?pretty=True&limit=.

We might also want to make sure the OpenShift user being used has the appropriate permissions.

Thoughts?
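To make the pasted warnings easier to read, here is a minimal sketch that tallies them by connection class and namespace. The names `WARNING_RE` and `summarize_warnings` are hypothetical helpers written for this thread, not part of Cerberus or urllib3. One thing the tally makes visible: every warning above names a plain `HTTPConnection`, whereas the logged Server URL uses `https`. That may mean the failing requests were not addressed to that server, though this is only an observation from the log, not a confirmed diagnosis.

```python
import re
from collections import Counter

# Captures the urllib3 connection class and the namespace from one retry warning.
WARNING_RE = re.compile(
    r"<urllib3\.connection\.(?P<conn>\w+) object at 0x[0-9a-f]+>.*?"
    r"/api/v1/namespaces/(?P<namespace>[\w-]+)/pods"
)


def summarize_warnings(log_text):
    """Count retry warnings per (connection class, namespace) pair."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        counts[(match["conn"], match["namespace"])] += 1
    return counts
```

Feeding it the contents of `log.log` and printing the result gives one row per namespace, which is quicker to scan than the raw output.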

thesimpledata commented 2 years ago

The user I am using has the cluster-admin role. Do I need any explicit RoleBinding to call the cluster API?

| Name | Role ref | Subject kind | Subject name | Namespace |
| --- | --- | --- | --- | --- |
| ClusterRoleBinding [cluster-admin-0](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/cluster/clusterrolebindings/cluster-admin-0) | ClusterRole [cluster-admin](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/cluster/clusterroles/cluster-admin) | User | IAM#shakil.khan@ibm.com | All namespaces |
| ClusterRoleBinding [ibm-admin](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/cluster/clusterrolebindings/ibm-admin) | ClusterRole [cluster-admin](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/cluster/clusterroles/cluster-admin) | User | IAM#shakil.khan@ibm.com | All namespaces |
| RoleBinding [user-settings-195869fc-dcda-4993-8a3d-4748513a421d-rolebinding](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/ns/openshift-console-user-settings/rolebindings/user-settings-195869fc-dcda-4993-8a3d-4748513a421d-rolebinding) | Role [user-settings-195869fc-dcda-4993-8a3d-4748513a421d-role](https://console-openshift-console.itzroks-666000972r-81yu06-4b4a324f027aea19c5cbc0c3275c4656-0000.us-south.containers.appdomain.cloud/k8s/ns/openshift-console-user-settings/roles/user-settings-195869fc-dcda-4993-8a3d-4748513a421d-role) | User | IAM#shakil.khan@ibm.com | |

thesimpledata commented 2 years ago

My kubeconfig looks like this:

apiVersion: v1
clusters:
- cluster:
    server: https://c114-e.us-south.containers.cloud.ibm.com:31215
  name: c114-e-us-south-containers-cloud-ibm-com:31215
contexts:
- context:
    cluster: c114-e-us-south-containers-cloud-ibm-com:31215
    namespace: default
    user: IAM#shakil.khan@ibm.com/c114-e-us-south-containers-cloud-ibm-com:31215
  name: default/c114-e-us-south-containers-cloud-ibm-com:31215/IAM#shakil.khan@ibm.com
current-context: default/c114-e-us-south-containers-cloud-ibm-com:31215/IAM#shakil.khan@ibm.com
kind: Config
preferences: {}
users:
- name: IAM#shakil.khan@ibm.com/c114-e-us-south-containers-cloud-ibm-com:31215
  user:
    token: sha256~9TcSTiu9mc3TpMaKNB9IJBZFqK-EELgLmKL4e8_rytA

For the cluster API call to work, I need to construct the URL as below:

curl -H "Authorization: Bearer sha256~9TcSTiu9mc3TpMaKNB9IJBZFqK-EELgLmKL4e8_rytA" "https://c114-e.us-south.containers.cloud.ibm.com:31215/api/v1/nodes?limit=1"

I did not dig that far, but I believe Cerberus is not composing that URL properly.
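For reference, the manual composition described above can be sketched in Python. This is only an illustration of how the `server` and `token` fields of a kubeconfig combine into the curl call; it is not how Cerberus or the Kubernetes client builds requests. `build_nodes_request` is a hypothetical helper, and it expects the kubeconfig already parsed into a dict (for example with a YAML loader) and a token-based user entry like the one shown.

```python
import urllib.request


def build_nodes_request(kubeconfig, limit=1):
    """Build (but do not send) a GET /api/v1/nodes request from a parsed kubeconfig."""
    context_name = kubeconfig["current-context"]
    # Resolve the current context, then the cluster and user it points at.
    context = next(c["context"] for c in kubeconfig["contexts"] if c["name"] == context_name)
    cluster = next(c["cluster"] for c in kubeconfig["clusters"] if c["name"] == context["cluster"])
    user = next(u["user"] for u in kubeconfig["users"] if u["name"] == context["user"])

    url = f"{cluster['server'].rstrip('/')}/api/v1/nodes?limit={limit}"
    headers = {"Authorization": f"Bearer {user['token']}"}
    return urllib.request.Request(url, headers=headers)
```

Comparing `request.full_url` from this sketch with the URLs in the warnings is a quick way to see whether the scheme, host, and port match what the kubeconfig specifies.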

thesimpledata commented 2 years ago

I am using IBM Cloud.

ghost commented 2 years ago

@thesimpledata this is not a permission error, this is a connection issue. Wherever you are running krkn, you don't seem to have network access to the OpenShift API.
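One way to check the network-access theory independently of any Kubernetes client is a plain TCP connect to the API server's host and port. The following is a rough sketch; `host_and_port` and `api_reachable` are hypothetical helper names, and a successful connect only shows that the port accepts connections, not that authentication or TLS will succeed.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(server_url):
    """Extract (host, port) from a server URL, defaulting the port by scheme."""
    parts = urlsplit(server_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def api_reachable(server_url, timeout=3.0):
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = host_and_port(server_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
```

Running `api_reachable` against the `server` value from the kubeconfig, on the same machine where Cerberus runs, separates "the API is unreachable from here" from "the client is connecting somewhere else".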

thesimpledata commented 2 years ago

Not true. Kraken is working fine on the same cluster. Again, I can call the API manually:

(monitor) shakilkhan@Shakils-MacBook-Pro cerberus % curl -H "Authorization: Bearer sha256~9TcSTiu9mc3TpMaKNB9IJBZFqK-EELgLmKL4e8_rytA" "https://c114-e.us-south.containers.cloud.ibm.com:31215/api/v1/nodes?limit=1"
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "3336359",
    "continue": "eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6MzMzNjM1OSwic3RhcnQiOiIxMC4zOC4yMDAuMTQ3XHUwMDAwIn0",
    "remainingItemCount": 6
  },
  "items": [
    {
      "metadata": {
        "name": "10.38.200.147",
        "uid": "b83436da-066a-4a5a-bd63-826f62116919",
        "resourceVersion": "3335771",
        "creationTimestamp": "2022-06-10T22:07:40Z",
        "labels": {
          "arch": "amd64",
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/instance-type": "c3c.16x32.encrypted",
          "beta.kubernetes.io/os": "linux",
          "failure-domain.beta.kubernetes.io/region": "us-south",
          "failure-domain.beta.kubernetes.io/zone": "dal10",
          "ibm-cloud.kubernetes.io/encrypted-docker-data": "true",
          "ibm-cloud.kubernetes.io/external-ip": "169.63.200.123",
          "ibm-cloud.kubernetes.io/iaas-provider": "softlayer",
          "ibm-cloud.kubernetes.io/internal-ip": "10.38.200.147",
          "ibm-cloud.kubernetes.io/machine-type": "c3c.16x32.encrypted",
          "ibm-cloud.kubernetes.io/os": "REDHAT_7_64",
          "ibm-cloud.kubernetes.io/region": "us-south",
          "ibm-cloud.kubernetes.io/sgx-enabled": "false",
          "ibm-cloud.kubernetes.io/worker-id": "kube-cahrog8d0id6oitvgmcg-itzroks6660-default-000003b4",
          "ibm-cloud.kubernetes.io/worker-pool-id": "cahrog8d0id6oitvgmcg-1f3be9f",
          "ibm-cloud.kubernetes.io/worker-pool-name": "default",
          "ibm-cloud.kubernetes.io/worker-version": "4.8.42_1559_openshift",
          "ibm-cloud.kubernetes.io/zone": "dal10",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "10.38.200.147",
          "kubernetes.io/os": "linux",
          "node-role.kubernetes.io/master": "",
          "node-role.kubernetes.io/worker": "",
          "node.kubernetes.io/instance-type": "c3c.16x32.encrypted",
          "node.openshift.io/os_id": "rhel",
          "privateVLAN": "2972490",
          "publicVLAN": "2972492",
          "topology.kubernetes.io/region": "us-south",
          "topology.kubernetes.io/zone": "dal10"
        },
        "annotations": {
          "projectcalico.org/IPv4Address": "10.38.200.147/26",
          "projectcalico.org/IPv4IPIPTunnelAddr": "172.30.22.0"
        },
        "managedFields": [
          {
            "manager": "kube-controller-manager",

thesimpledata commented 2 years ago

Also, I found issues with the use of a global variable in client.py. It does not work.
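A possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback above comes from `multiprocessing.pool` on Python 3.9 on macOS, where worker processes are started with the `spawn` method by default. Spawned workers re-import the module, so a module-level global assigned at runtime in the parent is not set in the workers. The sketch below shows the general pattern of passing such state through a pool initializer instead. `API_SERVER`, `init_worker`, `pods_url`, and `collect_urls` are hypothetical names and do not reflect the actual contents of client.py.

```python
import multiprocessing

# Module-level state, similar in spirit to a client object kept in a global.
API_SERVER = None


def init_worker(server):
    """Pool initializer: runs once in each worker and sets the global there."""
    global API_SERVER
    API_SERVER = server


def pods_url(namespace):
    """Build the pod-list URL from the global; fail loudly if it was never set."""
    if API_SERVER is None:
        raise RuntimeError("API_SERVER is not initialized in this process")
    return f"{API_SERVER}/api/v1/namespaces/{namespace}/pods"


def collect_urls(server, namespaces):
    """Hand the server to every worker explicitly instead of relying on inherited globals."""
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2, initializer=init_worker, initargs=(server,)) as pool:
        return pool.map(pods_url, namespaces)
```

When trying this, call `collect_urls` from under an `if __name__ == "__main__":` guard, since `spawn` re-imports the main module in each worker.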