openvstorage / framework

The Framework is a set of components and tools that gives the user an interface (GUI / API) to set up, extend and manage an Open vStorage platform.

Remove node fails #1170

Closed JeffreyDevloo closed 7 years ago

JeffreyDevloo commented 7 years ago

Problem description

Node removal fails near the end, while restarting the memcached services: the service ovs-memcached cannot be found.
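A minimal sketch of how this kind of failure can arise. It assumes the restart loop still visits a node on which the memcached service has already been removed; that cause is an assumption based on the traceback, not something confirmed in this issue. All names below (`ServiceNotFound`, `restart_service`, `restart_on_nodes`, the `installed` mapping) are hypothetical and are not taken from the framework code.

```python
class ServiceNotFound(ValueError):
    """Hypothetical error for a service that is not present on a node."""


def restart_service(installed, ip, name):
    """Restart `name` on `ip`; fail if the service is not installed there.

    `installed` maps an IP to the set of service names present on that node.
    """
    if name not in installed.get(ip, set()):
        raise ServiceNotFound('Service {0} could not be found.'.format(name))
    return '[{0}] {1} restarted'.format(ip, name)


def restart_on_nodes(installed, ips, name):
    """Restart `name` on every node that still has it and collect the rest.

    Nodes without the service are skipped instead of aborting the whole run.
    """
    restarted, skipped = [], []
    for ip in ips:
        try:
            restarted.append(restart_service(installed, ip, name))
        except ServiceNotFound:
            skipped.append(ip)
    return restarted, skipped


if __name__ == '__main__':
    # Assumed state: the node being removed no longer has the service.
    installed = {
        '10.100.199.151': {'ovs-memcached'},
        '10.100.199.153': {'ovs-memcached'},
        '10.100.199.152': set(),
    }
    ips = ['10.100.199.151', '10.100.199.153', '10.100.199.152']
    restarted, skipped = restart_on_nodes(installed, ips, 'ovs-memcached')
    for line in restarted:
        print(line)
    print('skipped: {0}'.format(skipped))
```

Without the `try`/`except`, the first node lacking the service would raise and stop the loop. Skipping is only one possible design; filtering the IP list before the loop would work as well.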

Log

This is only the part of the log that contains the errors:

2016-11-17 12:00:44 36200 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 84 - INFO - Demoting node
2016-11-17 12:00:44 37900 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 85 - INFO - Leaving Arakoon ovsdb cluster
2016-11-17 12:00:54 28500 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 103 - INFO - Leaving Arakoon config cluster
2016-11-17 12:01:03 77500 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 120 - INFO - Update configurations
2016-11-17 12:01:03 78600 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 122 - INFO - Restarting master node services
2016-11-17 12:01:03 80400 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 123 - DEBUG - Removing/unconfiguring RabbitMQ
2016-11-17 12:01:04 63600 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 125 - ERROR - 
Failed to remove/unconfigure RabbitMQ
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 1464, in _demote_node
    target_client.run(['rabbitmq-server', '-detached'])
  File "ovs/extensions/generic/sshclient.py", line 59, in inner_function
    return outer_function(self, *args, **kwargs)
  File "ovs/extensions/generic/sshclient.py", line 287, in run
    raise CalledProcessError(exit_code, command, output)
CalledProcessError: Command ''rabbitmq-server' '-detached'' returned non-zero exit status 1
2016-11-17 12:01:04 63600 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 126 - ERROR - Command ''rabbitmq-server' '-detached'' returned non-zero exit status 1
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 1464, in _demote_node
    target_client.run(['rabbitmq-server', '-detached'])
  File "ovs/extensions/generic/sshclient.py", line 59, in inner_function
    return outer_function(self, *args, **kwargs)
  File "ovs/extensions/generic/sshclient.py", line 287, in run
    raise CalledProcessError(exit_code, command, output)
CalledProcessError: Command ''rabbitmq-server' '-detached'' returned non-zero exit status 1

2016-11-17 12:07:23 79800 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1133 - INFO - Ensure single CHAINED mode - ID 1479380843_bggSbwvvg5 - Amount of jobs pending for key ovs_ensure_single_alba.checkup_maintenance_agents: 0
2016-11-17 12:07:24 93500 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1145 - DEBUG -   10.100.199.151  - Stopping service watcher-framework
2016-11-17 12:07:55 00600 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1146 - DEBUG -   10.100.199.151  - Service watcher-framework stopped
2016-11-17 12:07:55 43700 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1147 - DEBUG -   10.100.199.153  - Stopping service watcher-framework
2016-11-17 12:07:57 06100 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1148 - DEBUG -   10.100.199.153  - Service watcher-framework stopped
2016-11-17 12:07:58 00700 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1151 - DEBUG -   10.100.199.151  - Restarting service memcached
2016-11-17 12:07:58 04900 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1152 - DEBUG -   10.100.199.151  - Service memcached restarted
2016-11-17 12:07:58 15400 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1153 - DEBUG -   10.100.199.153  - Restarting service memcached
2016-11-17 12:07:58 33700 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1154 - DEBUG -   10.100.199.153  - Service memcached restarted
2016-11-17 12:07:58 50500 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1156 - INFO - 

2016-11-17 12:07:58 50600 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1157 - ERROR - An unexpected error occurred:
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 760, in remove_node
    SetupController._restart_framework_and_memcache_services(master_ips, slave_ips, ip_client_map, offline_node_ips)
  File "ovs/lib/setup.py", line 1531, in _restart_framework_and_memcache_services
    Toolbox.change_service_state(clients[ip], memcached, 'restart', SetupController._logger)
  File "ovs/lib/helpers/toolbox.py", line 183, in change_service_state
    status, _ = ServiceManager.get_service_status(name, client=client)
  File "ovs/extensions/services/systemd.py", line 121, in get_service_status
    name = Systemd._get_name(name, client)
  File "ovs/extensions/services/systemd.py", line 52, in _get_name
    raise ValueError('Service {0} could not be found.'.format(name))
ValueError: Service ovs-memcached could not be found.
2016-11-17 12:07:58 50700 +0100 - ovs-node-1 - 22724/140109023356672 - lib/setup - 1158 - ERROR - Service ovs-memcached could not be found.
Traceback (most recent call last):
  File "ovs/lib/setup.py", line 760, in remove_node
    SetupController._restart_framework_and_memcache_services(master_ips, slave_ips, ip_client_map, offline_node_ips)
  File "ovs/lib/setup.py", line 1531, in _restart_framework_and_memcache_services
    Toolbox.change_service_state(clients[ip], memcached, 'restart', SetupController._logger)
  File "ovs/lib/helpers/toolbox.py", line 183, in change_service_state
    status, _ = ServiceManager.get_service_status(name, client=client)
  File "ovs/extensions/services/systemd.py", line 121, in get_service_status
    name = Systemd._get_name(name, client)
  File "ovs/extensions/services/systemd.py", line 52, in _get_name
    raise ValueError('Service {0} could not be found.'.format(

Full log: lib.log.zip
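To pull only the error entries out of a log such as the full one, a small filter along these lines could help. It is a sketch that assumes every entry starts with a header line shaped like the ones above (timestamp, host, process/thread, source, a number, level, message) and that lines not matching that shape, such as tracebacks, belong to the preceding entry. The field names and the `error_entries` helper are my own labels, not part of the framework.

```python
import re

# Header line of one entry, modelled on the excerpt above (assumed layout).
ENTRY = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \d+ [+-]\d{4}'
    r' - (?P<host>\S+) - \S+ - (?P<source>.+?) - (?P<number>\d+)'
    r' - (?P<level>[A-Z]+) -\s*(?P<message>.*)$'
)


def error_entries(lines):
    """Return the ERROR entries, each with its continuation lines attached."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip('\n')
        match = ENTRY.match(line)
        if match:
            # A new header line starts a new entry.
            current = match.groupdict()
            current['continuation'] = []
            entries.append(current)
        elif current is not None and line.strip():
            # Non-header, non-blank lines (e.g. tracebacks) extend the entry.
            current['continuation'].append(line)
    return [entry for entry in entries if entry['level'] == 'ERROR']


if __name__ == '__main__':
    import sys
    # Usage: python filter_errors.py path/to/extracted/lib.log
    with open(sys.argv[1]) as handle:
        for entry in error_entries(handle):
            print('{date} {time} {source} {message}'.format(**entry))
            for extra in entry['continuation']:
                print('    ' + extra)
```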

khenderick commented 7 years ago

Fixed by #1176, packaged in openvstorage-2.7.6-rev.4320.8c4fb99

JeffreyDevloo commented 7 years ago

Steps

Output

Removal was successful. Did not encounter the mentioned error.

Creating SSH connections to remaining master nodes
  Node with IP 10.100.199.153  successfully connected to
  Node with IP 10.100.199.151  successfully connected to
  Node with IP 10.100.199.152  successfully connected to
Are you sure you want to remove node ovs-node-2? (y/[n]): y
Do you also want to remove the ASD manager and related ASDs? (y/[n]): y
The removal of these StorageRouters brings data at risk on backend mybackend. Loosing more disks will cause data loss.
The removal of these StorageRouters brings data at risk on backend mybackend02. Loosing more disks will cause data loss.
Are you sure you want to continue? (y/[n]): y
Starting removal of node ovs-node-2 - 10.100.199.152
  Removing vPools from node
    Removing vPool myvpool02 from node
    Removing vPool myvpool01 from node

+++ Demoting node +++

Leaving Arakoon ovsdb cluster
Leaving Arakoon config cluster
Update configurations
Restarting master node services
Removing/unconfiguring RabbitMQ
  [10.100.199.152] rabbitmq-server stopped
  [10.100.199.152] rabbitmq-server already halted
Stopping services
Stopping service memcached
  [10.100.199.152] memcached stopped
Stopping service rabbitmq-server
  [10.100.199.152] rabbitmq-server already halted
Removing services
Removing service scheduled-tasks
  [10.100.199.152] scheduled-tasks stopped
Removing service webapp-api
  [10.100.199.152] webapp-api stopped
Removing service volumerouter-consumer
  [10.100.199.152] volumerouter-consumer stopped
Update existing vPools
Restarting services
  [10.100.199.153] watcher-framework stopped
  [10.100.199.151] watcher-framework stopped
  [10.100.199.152] watcher-framework stopped
  [10.100.199.153] memcached restarted
  [10.100.199.151] memcached restarted
  [10.100.199.153] watcher-framework started
  [10.100.199.151] watcher-framework started
  [10.100.199.152] watcher-framework started

+++ Running "demote" hooks +++

Executing storagedriver.on_demote
Executing albacontroller.on_demote
Restarting services
  [10.100.199.153] watcher-framework stopped
  [10.100.199.151] watcher-framework stopped
  [10.100.199.152] watcher-framework stopped
  [10.100.199.153] memcached restarted
  [10.100.199.151] memcached restarted
  [10.100.199.153] watcher-framework started
  [10.100.199.151] watcher-framework started
  [10.100.199.152] watcher-framework started
Avahi installed
Announcing service
  [10.100.199.152] avahi-daemon restarted

+++ Demote complete +++

Stopping and removing services
Removing services
Removing service workers
Removing service support-agent
Removing service watcher-framework
Removing service watcher-config
Stopping service rabbitmq-server
Stopping service memcached

+++ Running "remove" hooks +++

Executing storagedriver.on_remove
Executing albacontroller.on_remove
Removing node from model
  [10.100.199.153] watcher-framework stopped
  [10.100.199.151] watcher-framework stopped
  [10.100.199.153] memcached restarted
  [10.100.199.151] memcached restarted
  [10.100.199.153] watcher-framework started
  [10.100.199.151] watcher-framework started
Successfully removed node

Removing ASD Manager

+++ Remove nodes finished +++

Test result

Test passed.

Packages