openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to setup, extend and manage an Open vStorage platform.

mark_node_offline resulting in ClusterNotReachableException during node removal #813

Closed JeffreyDevloo closed 7 years ago

JeffreyDevloo commented 8 years ago

Problem description

While running 'ovs remove node', I got the following error:

2016-08-12 11:33:03 11000 +0200 - ovs-node1 - 5060/139693196896064 - lib/setup - 18 - INFO - Starting removal of nodes
2016-08-12 11:33:03 11000 +0200 - ovs-node1 - 5060/139693196896064 - lib/setup - 19 - INFO -   Marking all Storage Drivers served by Storage Router 10.100.199.153 as offline
2016-08-12 11:33:03 12500 +0200 - ovs-node1 - 5060/139693196896064 - lib/setup - 20 - INFO - 

2016-08-12 11:33:03 12500 +0200 - ovs-node1 - 5060/139693196896064 - lib/setup - 21 - ERROR - An unexpected error occurred:
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 665, in remove_nodes
    StorageDriverController.mark_offline(storagerouter_guid=storage_router.guid)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 167, in <lambda>
    __call__ = lambda x, *a, **kw: x._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "ovs/lib/storagedriver.py", line 63, in mark_offline
    storagedriver_client.mark_node_offline(str(storagedriver.storagedriver_id))
ClusterNotReachableException:  
2016-08-12 11:33:03 12600 +0200 - ovs-node1 - 5060/139693196896064 - lib/setup - 22 - ERROR -  
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 665, in remove_nodes
    StorageDriverController.mark_offline(storagerouter_guid=storage_router.guid)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 167, in <lambda>
    __call__ = lambda x, *a, **kw: x._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "ovs/lib/storagedriver.py", line 63, in mark_offline
    storagedriver_client.mark_node_offline(str(storagedriver.storagedriver_id))
ClusterNotReachableException:  

Possible root of the problem

The cluster might not have been initialized yet when I ran 'ovs remove node'.

Possible solution

Check whether the cluster is initialized during the 'Mark offline' step of remove_nodes.
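A minimal sketch of what such a guard could look like, assuming the exception can be caught around the mark_node_offline call shown in the traceback. The ClusterNotReachableException class below is a local stand-in, and mark_offline_safely and FakeClient are hypothetical names used only for illustration; none of this is the framework's actual code or the fix that was merged.

```python
class ClusterNotReachableException(Exception):
    """Local stand-in for the exception seen in the traceback (assumption)."""


def mark_offline_safely(client, storagedriver_ids):
    """Try to mark every Storage Driver offline without aborting on the first failure.

    Returns a (marked, skipped) tuple of ID lists so the caller can
    decide whether to retry, warn, or stop the removal.
    """
    marked, skipped = [], []
    for sd_id in storagedriver_ids:
        try:
            client.mark_node_offline(str(sd_id))
            marked.append(sd_id)
        except ClusterNotReachableException:
            # Cluster is not (yet) reachable: record the ID instead of crashing.
            skipped.append(sd_id)
    return marked, skipped


class FakeClient(object):
    """Hypothetical test double: raises for IDs listed in `unreachable`."""

    def __init__(self, unreachable):
        self.unreachable = set(unreachable)

    def mark_node_offline(self, sd_id):
        if sd_id in self.unreachable:
            raise ClusterNotReachableException()


if __name__ == "__main__":
    marked, skipped = mark_offline_safely(FakeClient({"sd2"}), ["sd1", "sd2", "sd3"])
    print("marked:", marked, "skipped:", skipped)
```

Returning the skipped IDs rather than raising leaves the decision to the caller: the removal could continue with a warning, or stop with a clearer message than an empty ClusterNotReachableException.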

Temporary solution

Wait till the cluster is initialized.
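The waiting could also be scripted instead of done by hand. The sketch below is a generic polling helper, not part of the framework: wait_until is a hypothetical name, and the predicate that decides whether the cluster is initialized is left to the caller, since this thread does not say how readiness can be checked.

```python
import time


def wait_until(predicate, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.time):
    """Poll `predicate` until it returns True or `timeout` seconds have passed.

    Returns True if the predicate succeeded, False on timeout.
    `sleep` and `clock` are injectable to keep the helper easy to test.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass its own readiness check, for example `wait_until(cluster_is_ready, timeout=300)`, where cluster_is_ready is a placeholder for whatever check fits the environment, and only start the node removal once it returns True.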

Additional information

Setup

Hyperconverged setup

wimpers commented 8 years ago

@JeffreyDevloo do you mean that the ovs setup was not yet finished when you tried to remove the node? Did the ovs setup finish correctly? Was the cluster usable?

JeffreyDevloo commented 8 years ago

The issue was that I was too impatient with my environment. As soon as the servers were up, I immediately invoked 'ovs remove node'. The setup itself finished flawlessly, and the cluster was still usable after the failed removal attempt.

kvanhijf commented 7 years ago

https://github.com/openvstorage/framework/pull/1078 --> openvstorage-2.7.4-rev.4193.aa0ed1c
https://github.com/openvstorage/alba-asdmanager/pull/131 --> openvstorage-sdm-1.6.4-rev.394.22c93f2
https://github.com/openvstorage/framework-alba-plugin/pull/250 --> openvstorage-backend-1.7.4-rev.784.647dd17

wimpers commented 7 years ago

Closed by PM.