hjacobs / kube-downscaler

Scale down Kubernetes deployments after work hours
https://hub.docker.com/r/hjacobs/kube-downscaler
GNU General Public License v3.0

Connection refused to API server (10.3.0.1) after rolling master nodes #58

Open hjacobs opened 5 years ago

hjacobs commented 5 years ago

We see some "connection refused" errors after rolling master nodes. To be investigated.

2019-06-05 09:16:53,927 INFO: Downscaler v0.14 started with debug=False, default_downtime=never, default_uptime=always, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler,postgres-operator, exclude_namespaces=kube-system,visibility, exclude_statefulsets=, grace_period=900, interval=30, kind=['deployment', 'stack', 'deployment'], namespace=None, once=False, upscale_period=never
2019-06-05 09:18:55,888 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
     (self._dns_host, self.port), self.timeout, **extra_kw)
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
     raise err
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
     sock.connect(sa)
 ConnectionRefusedError: [Errno 111] Connection refused

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
     chunked=chunked)
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
     self._validate_conn(conn)
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
     conn.connect()
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
     conn = self._new_conn()
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
     self, "Failed to establish a new connection: %s" % e)
 urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
     timeout=timeout
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
     _stacktrace=sys.exc_info()[2])
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
     raise MaxRetryError(_pool, url, error or ResponseError(cause))
 urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/kube_downscaler/main.py", line 41, in run_loop
     dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
   File "/kube_downscaler/scaler.py", line 159, in scale
     forced_uptime = pods_force_uptime(api, namespace)
   File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
     for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
     return iter(self.query_cache["objects"])
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
     cache["response"] = self.execute().json()
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
     r = self.api.get(**kwargs)
   File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
     return self.session.get(*args, **self.get_kwargs(**kwargs))
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
     return self.request('GET', url, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
     resp = self.send(prep, **send_kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
     r = adapter.send(request, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
     response = self._do_send(request, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
     raise ConnectionError(e, request=request)
 requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
 2019-06-05 16:12:36,328 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
     (self._dns_host, self.port), self.timeout, **extra_kw)
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
     raise err
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
     sock.connect(sa)
 ConnectionRefusedError: [Errno 111] Connection refused

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
     chunked=chunked)
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
     self._validate_conn(conn)
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
     conn.connect()
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
     conn = self._new_conn()
   File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
     self, "Failed to establish a new connection: %s" % e)
 urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
     timeout=timeout
   File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
     _stacktrace=sys.exc_info()[2])
   File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
     raise MaxRetryError(_pool, url, error or ResponseError(cause))
 urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/kube_downscaler/main.py", line 41, in run_loop
     dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
   File "/kube_downscaler/scaler.py", line 159, in scale
     forced_uptime = pods_force_uptime(api, namespace)
   File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
     for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
     return iter(self.query_cache["objects"])
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
     cache["response"] = self.execute().json()
   File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
     r = self.api.get(**kwargs)
   File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
     return self.session.get(*args, **self.get_kwargs(**kwargs))
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
     return self.request('GET', url, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
     resp = self.send(prep, **send_kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
     r = adapter.send(request, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
     response = self._do_send(request, **kwargs)
   File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
     raise ConnectionError(e, request=request)
 requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))

robertbeal commented 4 years ago

We're seeing this on Kubernetes 1.16.8, freshly installed following the README instructions (the masters haven't been touched). I haven't been able to figure out the cause yet.

artyomtkachenko commented 4 years ago

I experienced the same issue with EKS 1.16 on Fargate. It seems to be related to CNI initialization.

As a workaround, I added an init container that sleeps for 10 seconds before the main container starts:

      initContainers:
      - name: wait-for-network
        image: busybox:1.28
        command: ['sh', '-c', "sleep 10"]
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 100Mi
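
A fixed sleep assumes the network is ready within 10 seconds. As a possible alternative, here is a minimal sketch that polls the API server address until a TCP connection succeeds. It is not part of kube-downscaler; `wait_for_tcp` and its parameters are hypothetical names chosen for illustration, and it only checks that the port accepts connections, not that the API server is healthy.

```python
import socket
import time


def wait_for_tcp(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or the timeout elapses.

    Returns True once a connection is established, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError, errno 111) while the network path
            # or the server is not ready yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_tcp("10.3.0.1", 443)` before the first scaling pass would use the address from the log above; in another cluster the service IP will likely differ.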