slipstream / SlipStreamClient

SlipStream Python client
Apache License 2.0

slipstream-node doesn't wait for network and fails #347

Closed 0xbase12 closed 6 years ago

0xbase12 commented 7 years ago

If we reboot a VM, the slipstream node executor fails because it doesn't wait for the network to be ready with a routable IP and a working DNS client. This could also occur when scaling a VM vertically.

This makes the following functional test, the deployment of NuvlaBox firmware, fail after a VM reboot with the following stack trace:

: 2017-09-04T14:20:32Z : Contacting the server with GET, at: https://nuv.la/run/ac83d789-7afa-43de-a05b-f5b839c57f5a/ss:state?ignoreabort=true

...skipping...
  File "/opt/slipstream/client/sbin/slipstream-node", line 124, in doWork
    node.execute()
  File "/opt/slipstream/client/lib/slipstream/executors/Machine.py", line 39, in execute
    self._publish_abort_and_fail("Machine executor creation failed", ex)
  File "/opt/slipstream/client/lib/slipstream/executors/Machine.py", line 45, in _publish_abort_and_fail
    AbortExceptionPublisher(self.configHolder).publish(message, sys.exc_info())
  File "/opt/slipstream/client/lib/slipstream/executors/Machine.py", line 80, in publish
    self._publish_abort(msg)
  File "/opt/slipstream/client/lib/slipstream/executors/Machine.py", line 85, in _publish_abort
    self.ss_client.setRuntimeParameter(abort, message)
  File "/opt/slipstream/client/lib/slipstream/SlipStreamHttpClient.py", line 229, in setRuntimeParameter
    accept='text/plain')
  File "/opt/slipstream/client/lib/slipstream/SlipStreamHttpClient.py", line 245, in _httpPut
    return self.httpClient.put(url, body, contentType, accept, retry=self.retry)
  File "/opt/slipstream/client/lib/slipstream/HttpClient.py", line 140, in put
    resp = self._call(url, 'PUT', body, contentType, accept, retry=retry)
  File "/opt/slipstream/client/lib/slipstream/HttpClient.py", line 279, in _call
    resp = _request(headers)
  File "/opt/slipstream/client/lib/slipstream/HttpClient.py", line 230, in _request
    verify=self.host_cert_verify)
  File "/opt/slipstream/client/lib/slipstream/HttpClient.py", line 57, in request
    response = super(SessionStore, self).request(*args, **kwargs)
  File "/opt/slipstream/client/lib/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/slipstream/client/lib/requests/sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "/opt/slipstream/client/lib/requests/adapters.py", line 504, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='nuv.la', port=443): Max retries exceeded with url: /run/ac83d789-7afa-43de-a05b-f5b839c57f5a/ss:abort?ignoreabort=true (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x1f58d50>: Failed to establish a new connection: [Errno -2] Name or service not known',))

To fix this issue, we can update the systemd service definition file /etc/systemd/system/slipstream-node.service and add network-online.target to its After= setting, which fixes the problem (cf. https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/).
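
A minimal sketch of that change, assuming the unit file has a standard `[Unit]` section. The rest of the file is not shown in this issue and would stay as it is. The `Wants=` line is an addition that is not part of the proposal above: the linked NetworkTarget page pairs it with `After=` so that network-online.target is actually pulled in at boot rather than only ordered against.

```ini
# /etc/systemd/system/slipstream-node.service (only the [Unit] section is shown)
[Unit]
# Order the service after the point where the network is considered "online".
After=network-online.target
# Assumption, not in the original proposal: also pull the target in,
# as recommended on the freedesktop NetworkTarget page.
Wants=network-online.target
```

After editing the file, `systemctl daemon-reload` is needed for systemd to pick up the change. What "online" means depends on the network manager's wait-online service on the VM image, so DNS may still need a moment to become usable on some setups.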