threefoldtecharchive / 0-templates

0-robot templates
Apache License 2.0

S3: reboot a node with data shards #239

Closed Dinaamagdy closed 5 years ago

Dinaamagdy commented 5 years ago

Steps

Traceback (most recent call last):
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/task/task.py", line 78, in execute
    self._result = self._func()
  File "/opt/code/github/threefoldtech/0-templates/templates/s3/s3.py", line 122, in _ensure_namespaces_connections
    namespaces.append(robot.services.get(template_uid=NS_TEMPLATEUID, name=namespace['name']))
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py", line 117, in get
    results = self.find(**kwargs)
  File "/opt/code/github/threefoldtech/0-robot/zerorobot/dsl/ZeroRobotManager.py", line 97, in find
    services, = self._client.api.services.listServices(query_params=kwargs)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/services_service.py", line 186, in listServices
    resp = self.client.get(uri, None, headers, query_params, content_type)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/http_client.py", line 71, in get
    return self._handle_data(uri, data, headers, params, content_type, self.session.get)
  File "/opt/code/github/threefoldtech/0-robot/JumpscaleZrobot/clients/zerorobot/client/http_client.py", line 53, in _handle_data
    res = method(uri, headers=headers, params=params)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 525, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.102.115.21', port=6600): Max retries exceeded with url: /services?name=3134836c-c828-48e5-b3cd-ef81cef19f66&template_uid=github.com%2Fthreefoldtech%2F0-templates%2Fnamespace%2F0.0.1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe1b4079390>: Failed to establish a new connection: [Errno 113] No route to host',))

serboctor commented 5 years ago

@muhamadazmy is it possible that the reboot was fast enough that minio didn't report the failed shard?

zaibon commented 5 years ago

Minio can only detect that a shard is offline when it tries to write to it. If nothing is happening, Minio doesn't know about offline shards.
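To illustrate the point above, here is a toy model of write-time failure detection. It is not Minio's implementation; the class `LazyShardSet` and all of its attributes are hypothetical names invented for this sketch. It only shows why a shard that goes offline while the store is idle is still believed to be up until the next write.

```python
class LazyShardSet:
    """Toy store that only learns about dead shards when it writes (hypothetical)."""

    def __init__(self, shards):
        # Ground truth: whether each shard can actually be reached.
        self.reachable = {shard: True for shard in shards}
        # The store's own view, updated only as a side effect of write().
        self.believed_up = set(shards)

    def power_off(self, shard):
        # Simulates a node reboot; the store is not notified.
        self.reachable[shard] = False

    def write(self, data):
        # A write touches every shard believed to be up and notices failures.
        failed = [s for s in sorted(self.believed_up) if not self.reachable[s]]
        self.believed_up.difference_update(failed)
        return failed
```

In this model, calling `power_off` on a shard changes nothing in `believed_up` until `write` is called, which mirrors the suggestion in this thread to upload data and see whether the failure is then reported.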

serboctor commented 5 years ago

@dinaamagdy did you try uploading data to Minio to see if it will create new shards/namespaces?

Dinaamagdy commented 5 years ago

@serboctor I did, and I get the same behaviour: when I try to upload a file, it also fails with "No route to host".

[Mon26 12:39] - s3.py             :174 :j.s3demo             - ERROR    - can't create bucket!
[Mon26 12:39] - s3.py             :175 :j.s3demo             - ERROR    - HTTPConnectionPool(host='192.168.191.135', port=1024): Max retries exceeded with url: /bzusw7w7m3m4hd3y/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a1d0475c0>: Failed to establish a new connection: [Errno 113] No route to host',))
*** RuntimeError: Can't create bucket!

Self-healing doesn't work either.

muhamadazmy commented 5 years ago

@Dinaamagdy

zaibon commented 5 years ago

Actually, the "No route to host" exception should be caught, because the action _ensure_namespaces_connections should never fail. If it fails to get the connection to a shard, that shard should be marked as down and self-healed.
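A minimal sketch of that idea, under stated assumptions: the function name `ensure_namespace_connections`, the `get_service` callable, and the `errors` parameter are hypothetical and are not the actual s3.py code. The sketch catches `OSError` by default because `requests.exceptions.ConnectionError` derives from it in Python 3; a real fix would probably catch the narrower requests exception.

```python
def ensure_namespace_connections(namespaces, get_service, errors=(OSError,)):
    """Collect reachable namespace services without ever raising on a dead shard.

    `get_service` is a hypothetical callable standing in for the
    robot.services.get(...) lookup seen in the traceback.
    """
    connected = []  # services we could reach
    down = []       # (name, exception) pairs to hand over to self-healing
    for namespace in namespaces:
        name = namespace['name']
        try:
            connected.append(get_service(name))
        except errors as exc:
            # Unreachable node: record the shard as down instead of failing the action.
            down.append((name, exc))
    return connected, down
```

The caller can then trigger self-healing for every name in `down`, while the action itself completes normally with the shards in `connected`.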

serboctor commented 5 years ago

@zaibon yeah, this needs to be changed. But the self-healing that Azmy did before should also have handled it, so that's a problem too.

Dinaamagdy commented 5 years ago

verified on 71baf756acd8ddb3b26a0f1feb6a54554bf152b1