Open ovstmoens opened 6 years ago
I don't know if this is a side effect from previous attempt to remove the node or not, but I've noticed there was a vPool running only on this particular node. After deleting the vPool the demote fails again further along the process with a different RuntimeError.
Output:
No handlers could be found for logger "ovs_extensions.db.arakoon.pyrakoon.client"
+++++++++++++++++++++
+++ Remove node +++
+++++++++++++++++++++
WARNING: Some of these steps may take a very long time, please check the logs for more information
Creating SSH connections to remaining master nodes
* Node with IP 10.100.199.211 - Successfully connected
* Node with IP 10.100.199.212 - Unable to connect
* Node with IP 10.100.199.213 - Successfully connected
* Node with IP 10.100.199.214 - Successfully connected
+++ Running "noderemoval - validate_removal" hooks +++
Executing alba._validate_removal
Certain nodes appear to be offline. These will not fully removed and will cause issues if they are not really offline.
Offline nodes:
* 10.100.199.212 - Unable to connect.
Continue the removal with these being presumably offline? (y/[n]): y
Are you sure you want to remove node DEV-4N-199-212? (y/[n]): y
The removal of this StorageRouter will cause data loss on Backend mybackend
The removal of this StorageRouter will cause data loss on Backend mybackend02
Are you sure you want to continue? (y/[n]): y
Starting removal of node DEV-4N-199-212 - 10.100.199.212
Marking all Storage Drivers served by Storage Router 10.100.199.212 as offline
Removing vPools from node
+++ Demoting node +++
Leaving Arakoon ovsdb cluster
Leaving Arakoon config cluster
Update configurations
Restarting master node services
Removing/unconfiguring offline RabbitMQ node
Update existing vPools
Restarting services
[10.100.199.211] watcher-framework stopped
[10.100.199.213] watcher-framework stopped
[10.100.199.214] watcher-framework stopped
[10.100.199.211] memcached restarted
[10.100.199.213] memcached restarted
[10.100.199.211] watcher-framework started
[10.100.199.211] support-agent restarted
[10.100.199.213] watcher-framework started
[10.100.199.213] support-agent restarted
[10.100.199.214] watcher-framework started
[10.100.199.214] support-agent restarted
+++ Running "nodetype - demote" hooks +++
Executing alba._on_demote
+++++++++++++++++++++++++++++++++++++++
+++ An unexpected error occurred: +++
+++ Config not yet loaded +++
+++++++++++++++++++++++++++++++++++++++
/var/log/ovs/lib.log:
2017-12-20 12:11:09 25300 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 50 - INFO - Remove node
2017-12-20 12:11:09 25400 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 51 - INFO - WARNING: Some of these steps may take a very long time, please check the logs for more information
2017-12-20 12:11:27 80200 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 67 - INFO - Creating SSH connections to remaining master nodes
2017-12-20 12:11:27 82500 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 68 - INFO - * Node with IP 10.100.199.211 - Successfully connected
2017-12-20 12:11:27 82700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 69 - INFO - * Node with IP 10.100.199.212 - Unable to connect
2017-12-20 12:11:28 08000 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 70 - INFO - * Node with IP 10.100.199.213 - Successfully connected
2017-12-20 12:11:28 31100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 71 - INFO - * Node with IP 10.100.199.214 - Successfully connected
2017-12-20 12:12:21 29400 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 117 - INFO - Running "noderemoval - validate_removal" hooks
2017-12-20 12:12:21 29400 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 118 - INFO - Executing alba._validate_removal
2017-12-20 12:12:21 89700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 119 - INFO - Certain nodes appear to be offline. These will not fully removed and will cause issues if they are not really offline.
2017-12-20 12:12:21 89700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 120 - INFO - Offline nodes:
* 10.100.199.212 - Unable to connect.
2017-12-20 12:13:11 55800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 138 - INFO - Starting removal of node DEV-4N-199-212 - 10.100.199.212
2017-12-20 12:13:11 55800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 139 - INFO - Marking all Storage Drivers served by Storage Router 10.100.199.212 as offline
2017-12-20 12:13:11 56100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 140 - INFO - Removing vPools from node
2017-12-20 12:13:11 56200 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 141 - INFO - Demoting node
2017-12-20 12:13:21 37600 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 152 - INFO - Leaving Arakoon ovsdb cluster
2017-12-20 12:13:42 47400 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 188 - INFO - Leaving Arakoon config cluster
2017-12-20 12:13:48 28500 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 209 - INFO - Update configurations
2017-12-20 12:13:54 54600 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 220 - INFO - Restarting master node services
2017-12-20 12:13:54 61100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 221 - INFO - Removing/unconfiguring offline RabbitMQ node
2017-12-20 12:13:55 37300 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 222 - INFO - Update existing vPools
2017-12-20 12:13:55 40800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 223 - INFO - Applying local storagedriver configuration changes
2017-12-20 12:13:57 53700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 224 - INFO - Changed events_amqp_uris from "[amqp://ovs:0penv5tor4ge@10.100.199.212:5672, amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]" to "[amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]"
2017-12-20 12:13:57 53800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 225 - INFO - Changes applied
2017-12-20 12:13:57 57400 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 226 - INFO - Applying local storagedriver configuration changes
2017-12-20 12:14:00 53000 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 227 - INFO - Changed events_amqp_uris from "[amqp://ovs:0penv5tor4ge@10.100.199.212:5672, amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]" to "[amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]"
2017-12-20 12:14:00 53100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 228 - INFO - Changes applied
2017-12-20 12:14:00 54100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 229 - INFO - Applying local storagedriver configuration changes
2017-12-20 12:14:02 78500 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 230 - INFO - Changed events_amqp_uris from "[amqp://ovs:0penv5tor4ge@10.100.199.212:5672, amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]" to "[amqp://ovs:0penv5tor4ge@10.100.199.211:5672, amqp://ovs:0penv5tor4ge@10.100.199.213:5672]"
2017-12-20 12:14:02 78600 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/storagedriver - 231 - INFO - Changes applied
2017-12-20 12:14:02 79100 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 232 - INFO - Restarting services
2017-12-20 12:14:19 81700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 255 - INFO - Running "nodetype - demote" hooks
2017-12-20 12:14:19 81700 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-type - 256 - INFO - Executing alba._on_demote
2017-12-20 12:14:19 85600 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 257 - INFO - Shrinking ABM for Backend mybackend
2017-12-20 12:14:19 87000 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 258 - INFO - * Shrink ABM cluster
2017-12-20 12:14:23 58800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 279 - INFO - * Updating ABM client config
2017-12-20 12:14:24 02800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 281 - INFO - * Remove old ABM node from model
2017-12-20 12:14:24 04800 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 282 - INFO - Shrinking NSM for Backend mybackend
2017-12-20 12:14:24 06200 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/alba - 283 - INFO - * Shrink NSM cluster mybackend-nsm_0
2017-12-20 12:14:24 06200 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 284 - INFO -
2017-12-20 12:14:24 06300 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 285 - ERROR - An unexpected error occurred:
Traceback (most recent call last):
File "ovs/lib/noderemoval.py", line 224, in remove_node
offline_nodes=storage_routers_offline)
File "ovs/lib/nodetype.py", line 501, in demote_node
offline_node_ips=offline_node_ips):
File "ovs/lib/helpers/toolbox.py", line 104, in run_hooks
fct(**kwargs)
File "ovs/lib/helpers/..//alba.py", line 659, in _on_demote
offline_nodes=offline_node_ips)
File "/usr/lib/python2.7/dist-packages/ovs_extensions/db/arakoon/arakooninstaller.py", line 588, in shrink_cluster
raise RuntimeError('Config not yet loaded')
RuntimeError: Config not yet loaded
2017-12-20 12:14:24 06300 +0100 - DEV-4N-199-211 - 12259/140046373598976 - lib/node-removal - 286 - ERROR - Config not yet loaded
Traceback (most recent call last):
File "ovs/lib/noderemoval.py", line 224, in remove_node
offline_nodes=storage_routers_offline)
File "ovs/lib/nodetype.py", line 501, in demote_node
offline_node_ips=offline_node_ips):
File "ovs/lib/helpers/toolbox.py", line 104, in run_hooks
fct(**kwargs)
File "ovs/lib/helpers/..//alba.py", line 659, in _on_demote
offline_nodes=offline_node_ips)
File "/usr/lib/python2.7/dist-packages/ovs_extensions/db/arakoon/arakooninstaller.py", line 588, in shrink_cluster
raise RuntimeError('Config not yet loaded')
RuntimeError: Config not yet loaded
The node where this issue occurred was the only node for vPool "myvpool02" which probably caused this error, on other nodes in this environment the issue did not occur. It also doesn't occur when you remove the vPool manually first.
vPools running on a single node aren't exactly a real life use case so maybe it's not worth the time to make it possible but perhaps we could make the output a bit clearer.
The node where this issue occurred was the only node for vPool "myvpool02" which probably caused this error, on other nodes in this environment the issue did not occur. It also doesn't occur when you remove the vPool manually first.
I got 'similar' issue even if I removed the vPool manually first. I didn't keep the trackback, so that's why i say similar here.
On DEV-4N-199-211 environment I was trying to reproduce the error mentioned in https://github.com/openvstorage/operations/issues/25.
After powering off one of the nodes, I've marked the node offline in the GUI and tried to remove the node from the cluster via CLI:
/var/log/ovs/lib.log shows the following information: