openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to setup, extend and manage an Open vStorage platform.

Maintenance agents cannot be reconfigured when node is down that runs such agent #569

Closed by khenderick 8 years ago

khenderick commented 8 years ago

Maintenance agents cannot be reconfigured when node is down that runs such agent

khenderick commented 8 years ago

Resolved by #177.

JeffreyDevloo commented 8 years ago

Steps

Output

2016-09-19 16:08:07 51500 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 1 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 0
2016-09-19 16:08:07 51800 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 2 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan - New task alba.checkup_maintenance_agents with default params scheduled for execution
2016-09-19 16:08:07 52000 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 3 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 1
2016-09-19 16:08:07 52000 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 4 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan -   KWARGS: {}
2016-09-19 16:08:07 52400 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 5 - INFO - Loading maintenance information
2016-09-19 16:08:08 34700 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 6 - ERROR - * Cannot fetch maintenance information for 10.100.199.153
Traceback (most recent call last):
  File "ovs/lib/albacontroller.py", line 1495, in checkup_maintenance_agents
    service_names = node.client.list_maintenance_services()
  File "ovs/extensions/plugins/asdmanager.py", line 229, in list_maintenance_services
    return self._call(requests.get, 'maintenance')['services']
  File "ovs/extensions/plugins/asdmanager.py", line 75, in _call
    response = method(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='10.100.199.153', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f7ea9c66250>: Failed to establish a new connection: [Errno 113] No route to host',))
2016-09-19 16:08:24 75400 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 7 - ERROR - * Cannot fetch maintenance information for 10.100.199.152
2016-09-19 16:08:27 62100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 11 - INFO - Generating service worklog for vm-backend3
2016-09-19 16:08:27 62300 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 12 - INFO - Applying service worklog for vm-backend3
2016-09-19 16:08:27 62300 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 13 - INFO - Finished service worklog for vm-backend3
2016-09-19 16:08:27 62600 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 14 - INFO - Generating service worklog for vm-backend2
2016-09-19 16:08:27 62700 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 15 - INFO - Applying service worklog for vm-backend2
2016-09-19 16:08:27 62700 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 16 - INFO - Finished service worklog for vm-backend2
2016-09-19 16:08:27 63000 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 17 - INFO - Generating service worklog for vm-backend
2016-09-19 16:08:27 63100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 18 - INFO - Applying service worklog for vm-backend
2016-09-19 16:08:27 63100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 19 - INFO - Finished service worklog for vm-backend
2016-09-19 16:08:27 63100 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 20 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan - Task alba.checkup_maintenance_agents finished successfully
2016-09-19 16:08:27 63300 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 21 - INFO - Ensure single CHAINED mode - ID 1474294086_aszZ0yyMan - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 0
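
The run above logs an error for each unreachable node and still finishes the task successfully. A minimal sketch of that tolerate-and-continue pattern follows. `FakeClient`, `load_maintenance_info` and the service name are hypothetical stand-ins for illustration, not the framework's actual code:

```python
import logging

logger = logging.getLogger('lib/alba')


class FakeClient(object):
    """Hypothetical stand-in for a node's ASD manager client."""

    def __init__(self, ip, services=None, reachable=True):
        self.ip = ip
        self._services = services or []
        self._reachable = reachable

    def list_maintenance_services(self):
        if not self._reachable:
            # Mimics the "No route to host" failure seen in the traceback
            raise IOError('No route to host: {0}'.format(self.ip))
        return list(self._services)


def load_maintenance_info(clients):
    """Collect service names per reachable node and skip nodes that cannot be reached."""
    services_per_node = {}
    unreachable = []
    for client in clients:
        try:
            services_per_node[client.ip] = client.list_maintenance_services()
        except Exception:
            # Log the failure (with traceback) and carry on with the next node
            logger.exception('* Cannot fetch maintenance information for %s', client.ip)
            unreachable.append(client.ip)
    return services_per_node, unreachable


# Illustrative data only: one reachable node and one node that is down
clients = [FakeClient('10.100.199.151', ['example-maintenance-service']),
           FakeClient('10.100.199.153', reachable=False)]
info, down = load_maintenance_info(clients)
```

Here the node that is down ends up in `down`, while `info` holds only the data from the reachable node, so the remaining worklog steps can still run.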

After restarting the node that was down and checking the agents again:

2016-09-19 16:31:06 48200 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 22 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W - Setting initial value for key ovs_ensure_single_alba.checkup_maintenance_agents
2016-09-19 16:31:06 48500 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 23 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W - New task alba.checkup_maintenance_agents with default params scheduled for execution
2016-09-19 16:31:06 48700 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 24 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 1
2016-09-19 16:31:06 48700 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 25 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W -   KWARGS: {}
2016-09-19 16:31:06 49100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 26 - INFO - Loading maintenance information
2016-09-19 16:31:06 55100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 27 - DEBUG - * Maintenance 2FKhZHmMoEZsSKFJ for vm-backend on 10.100.199.153
2016-09-19 16:31:06 55100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 28 - DEBUG - * Maintenance 1pER8AaKqek7GN11 for vm-backend2 on 10.100.199.153
2016-09-19 16:31:06 55100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 29 - DEBUG - * Maintenance xTqxmZU4sVfJi64y for vm-backend3 on 10.100.199.153
2016-09-19 16:31:06 60200 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 30 - DEBUG - * Maintenance PHgPqf3ZNeKD1KdP for vm-backend on 10.100.199.152
2016-09-19 16:31:06 69800 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 31 - DEBUG - * Maintenance 7dxmQqwPJ2mMWSHx for vm-backend on 10.100.199.151
2016-09-19 16:31:06 75100 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 32 - INFO - Generating service worklog for vm-backend3
2016-09-19 16:31:06 75200 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 33 - DEBUG - * Candidates for removal (unused node): [u'alba-maintenance_vm-backend3-xTqxmZU4sVfJi64y'] on 10.100.199.153
2016-09-19 16:31:06 75200 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 34 - DEBUG - * Removing removal candidate (at least 1 service required): alba-maintenance_vm-backend3-xTqxmZU4sVfJi64y on 10.100.199.153
2016-09-19 16:31:06 75200 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 35 - INFO - Applying service worklog for vm-backend3
2016-09-19 16:31:06 75300 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 36 - INFO - Finished service worklog for vm-backend3
2016-09-19 16:31:06 75400 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 37 - INFO - Generating service worklog for vm-backend2
2016-09-19 16:31:06 75500 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 38 - DEBUG - * Candidates for removal (unused node): [u'alba-maintenance_vm-backend2-1pER8AaKqek7GN11'] on 10.100.199.153
2016-09-19 16:31:06 75500 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 39 - DEBUG - * Removing removal candidate (at least 1 service required): alba-maintenance_vm-backend2-1pER8AaKqek7GN11 on 10.100.199.153
2016-09-19 16:31:06 75500 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 40 - INFO - Applying service worklog for vm-backend2
2016-09-19 16:31:06 75500 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 41 - INFO - Finished service worklog for vm-backend2
2016-09-19 16:31:06 75600 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 42 - INFO - Generating service worklog for vm-backend
2016-09-19 16:31:06 75800 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 43 - INFO - Applying service worklog for vm-backend
2016-09-19 16:31:06 75800 +0200 - ovs-node2 - 30860/140182355441472 - lib/alba - 44 - INFO - Finished service worklog for vm-backend
2016-09-19 16:31:06 75800 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 45 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W - Task alba.checkup_maintenance_agents finished successfully
2016-09-19 16:31:06 76000 +0200 - ovs-node2 - 30860/140182355441472 - lib/scheduled tasks - 46 - INFO - Ensure single CHAINED mode - ID 1474295466_bnerh7vw0W - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 0

We see that the candidates for removal that were added during the downtime are removed again.
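
The DEBUG lines of the second run show a service on an unused node being listed as a removal candidate and then dropped from that list because at least one service is required. That rule could be sketched as below. The function name `plan_removals`, the data layout and the meaning of "unused node" are assumptions made for illustration, not the framework's implementation:

```python
def plan_removals(services, used_nodes, minimum=1):
    """Return the services to remove: those on unused nodes, keeping at least `minimum` alive.

    `services` is a list of dicts with hypothetical keys 'name' and 'node'.
    """
    candidates = [s for s in services if s['node'] not in used_nodes]
    remaining = len(services) - len(candidates)
    # Take candidates back off the removal list until enough services would remain
    while candidates and remaining < minimum:
        candidates.pop()
        remaining += 1
    return candidates


# The only service for vm-backend3 runs on a node assumed to be unused,
# so it is a candidate, but it is kept because one service is required.
only_service = [{'name': 'alba-maintenance_vm-backend3-xTqxmZU4sVfJi64y',
                 'node': '10.100.199.153'}]
kept = plan_removals(only_service, used_nodes=set())

# With a second service on a used node, the one on the unused node can go.
two_services = only_service + [{'name': 'example-service-on-used-node',
                                'node': '10.100.199.151'}]
removed = plan_removals(two_services, used_nodes={'10.100.199.151'})
```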

Test result

Test passed.

Setup

Hyperconverged setup

Package information