uber / fiber

Distributed Computing for AI Made Simple
https://uber.github.io/fiber/
Apache License 2.0

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused',)) #45

Closed: kiran-italiya closed this issue 3 years ago

kiran-italiya commented 3 years ago

I get this error whenever I execute any Fiber function call. I tried granting all permissions and double-checked the network configuration, and everything seems right. Because the error is so broad and Fiber is a relatively new library, I couldn't find a solution anywhere else. I'm new to this, so please point me in the right direction.

Logs look like this:

Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc710>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc860>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc940>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
something started
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.6/http/client.py", line 1287, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1333, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1282, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/main.py", line 55, in <module>
    sharedQue = SimpleQueue()
  File "/usr/local/lib/python3.6/site-packages/fiber/queues.py", line 295, in __init__
    backend = get_backend()
  File "/usr/local/lib/python3.6/site-packages/fiber/backend.py", line 74, in get_backend
    name)).Backend(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/fiber/kubernetes_backend.py", line 64, in __init__
    self.default_namespace)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 22785, in read_namespaced_pod
    return self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 22894, in read_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 216, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 75, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.6/site-packages/urllib3/poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 573, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused',))

I was also getting a "POST method not supported" (501) error when trying to run with the Fiber CLI, which fails as well. I don't know whether that's a bug.

calio commented 3 years ago

Could you try pip install kubernetes==10.0.1 when building your Docker image? It seems that newer versions of the Kubernetes Python client don't work with Fiber. For the Fiber CLI issue, could you open another GitHub issue with detailed logs so that we can track it there?
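
To confirm that the pin took effect inside the image, something like the following could be run in the container before starting the Fiber program. This is only a sketch: the helper names (parse_version, matches_pin, installed_kubernetes_version) are made up for illustration and are not part of Fiber or the Kubernetes client. The only thing taken from this thread is the suggested version, 10.0.1.

```python
# Sketch: check that the installed Kubernetes Python client matches the
# version suggested in this thread. All helper names are hypothetical.

REQUIRED = "10.0.1"  # version suggested above


def parse_version(text):
    """Turn '10.0.1' into (10, 0, 1), ignoring suffixes such as 'a1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_pin(installed, required=REQUIRED):
    """Return True when the numeric parts of both versions are equal."""
    return parse_version(installed) == parse_version(required)


def installed_kubernetes_version():
    """Return the installed 'kubernetes' package version, or None."""
    try:
        # Standard library on Python 3.8 and later.
        from importlib.metadata import version
        return version("kubernetes")
    except Exception:
        pass
    try:
        # Fallback for older interpreters such as the 3.6 in the traceback.
        import pkg_resources
        return pkg_resources.get_distribution("kubernetes").version
    except Exception:
        return None


if __name__ == "__main__":
    found = installed_kubernetes_version()
    if found is None:
        print("kubernetes client is not installed")
    elif matches_pin(found):
        print("kubernetes client %s matches the suggested pin" % found)
    else:
        print("kubernetes client %s differs from %s" % (found, REQUIRED))
```

If the script reports a different version, the pip install line in the Dockerfile is probably being overridden by a later install step or by another package's dependency resolution.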

kiran-italiya commented 3 years ago

Thanks a lot, man. It's finally working now. It would be worth mentioning this in the README, though.