Closed Dinaamagdy closed 5 years ago
@muhamadazmy is it possible that the reboot was fast enough that minio didn't report the failed shard?
Minio can only detect that a shard is offline when it tries to write to it. If nothing is happening, Minio doesn't know about offline shards.
@dinaamagdy did you try uploading data to Minio to see if it will create new shards/namespaces?
@serboctor I did, and I get the same behaviour: when I try to upload a file, it also results in "No route to host".
[Mon26 12:39] - s3.py :174 :j.s3demo - ERROR - can't create bucket!
[Mon26 12:39] - s3.py :175 :j.s3demo - ERROR - HTTPConnectionPool(host='192.168.191.135', port=1024): Max retries exceeded with url: /bzusw7w7m3m4hd3y/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a1d0475c0>: Failed to establish a new connection: [Errno 113] No route to host',))
*** RuntimeError: Can't create bucket!
Self-healing doesn't work either.
@Dinaamagdy
Actually, the "No route to host" exception should be caught, because the action _ensure_namespaces_connections should never fail. If it fails to get the connection to a shard, that shard should be marked as down and self-healed.
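A minimal sketch of that idea, not the actual 0-templates code: the lookup is wrapped so that a connection failure marks the shard as down instead of failing the whole action. The names `ensure_namespace_connections`, `get_service` and `fake_get_service` are hypothetical; `get_service` stands in for the `robot.services.get(...)` call seen in the traceback. It relies on the fact that `requests.exceptions.ConnectionError` derives from `IOError`/`OSError`.

```python
import errno


def ensure_namespace_connections(namespaces, get_service):
    """Return (reachable services, shards that must be self-healed).

    `get_service` is a hypothetical callable standing in for
    robot.services.get(template_uid=..., name=...).
    """
    reachable, down = [], []
    for namespace in namespaces:
        try:
            reachable.append(get_service(namespace["name"]))
        except OSError as err:
            # requests.exceptions.ConnectionError is an OSError subclass,
            # so "No route to host" is recorded here instead of propagating.
            down.append((namespace["name"], str(err)))
    return reachable, down


def fake_get_service(name):
    # Hypothetical stand-in that simulates one node being offline.
    if name == "ns-offline":
        raise OSError(errno.EHOSTUNREACH, "No route to host")
    return {"name": name}


reachable, down = ensure_namespace_connections(
    [{"name": "ns-1"}, {"name": "ns-offline"}, {"name": "ns-2"}],
    fake_get_service,
)
print([svc["name"] for svc in reachable])
print([name for name, _ in down])
```

The `down` list is what a self-healing step would then consume to replace the unreachable shards; how that replacement is done is outside this sketch.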
@zaibon yea, this needs to be changed. But the self-healing that Azmy implemented earlier should also have handled it, so that's a problem too.
verified on 71baf756acd8ddb3b26a0f1feb6a54554bf152b1
Steps
Expected result
The zrobot monitor should detect the absence of this node and create a new zdb. When the node comes back, it should clean up the now-useless zdb to free space.
Result
Verifying the data backend namespace connections results in "No route to host", and no new zdb is created.
Traceback (most recent call last):
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/task/task.py", line 78, in execute
    self._result = self._func()
  File "/opt/code/github/threefoldtech/0-templates/templates/s3/s3.py", line 122, in _ensure_namespaces_connections
    namespaces.append(robot.services.get(template_uid=NS_TEMPLATEUID, name=namespace['name']))
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py", line 117, in get
    results = self.find(**kwargs)
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py", line 97, in find
    services, = self._client.api.services.listServices(query_params=kwargs)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/services_service.py", line 186, in listServices
    resp = self.client.get(uri, None, headers, query_params, content_type)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/http_client.py", line 71, in get
    return self._handle_data(uri, data, headers, params, content_type, self.session.get)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/http_client.py", line 53, in _handle_data
    res = method(uri, headers=headers, params=params)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 525, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.102.115.21', port=6600): Max retries exceeded with url: /services?name=3134836c-c828-48e5-b3cd-ef81cef19f66&template_uid=github.com%2Fthreefoldtech%2F0-templates%2Fnamespace%2F0.0.1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe1b4079390>: Failed to establish a new connection: [Errno 113] No route to host',))