openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to setup, extend and manage an Open vStorage platform.
Other
27 stars 23 forks source link

API storagerouter.markNodeOffline fails in case of multiple storagedrivers(vpools) #961

Closed saelbrec closed 7 years ago

saelbrec commented 7 years ago

Error from ovs-workers

2016-09-27 15:56:02 14900 +0200 - stor-201.be-gen8-1 - 12724/139888056657728 - celery/celery.worker.job - 2046 - INFO - Task ovs.storagerouter.ping[caf4857c-6186-4773-8061-2d6d0a2b184b] succeeded in 0.0282033559924s: None
2016-09-27 15:56:17 39500 +0200 - stor-201.be-gen8-1 - 12724/139888056657728 - celery/celery.worker.job - 2047 - ERROR - Task ovs.storagedriver.mark_offline[c04b421b-ad06-4d01-94bf-1e55e91363cc] raised unexpected: RuntimeError('got fault response markNodeOffline',)
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/storagedriver.py", line 62, in mark_offline
    storagedriver_client.mark_node_offline(str(storagedriver.storagedriver_id))
RuntimeError: got fault response markNodeOffline
2016-09-27 15:57:01 69700 +0200 - stor-201.be-gen8-1 - 12724/139888056657728 - celery/celery.worker.strategy - 2048 - INFO - Received task: ovs.storagerouter.ping[09bda7be-971f-4c00-a07a-ce1e3645a9f5]

Apparently the markOfflineNode calls for the different storagedrivers end up on the same vpool although they belong to a different one. Resulting in the following logs and error in the volumedriver

2016-09-27 15:56:17 365979 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7bbd7fa700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000ab97 - info - execute: Arguments for markNodeOffline are {[vrouter_cluster_id:f6a64b38-8785-4836-a330-1f9db328155c,vrouter_id:vmstorJAqIpFJHQnJ0HrWg]}
2016-09-27 15:56:17 366419 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7bbd7fa700 - volumedriverfs/ClusterRegistry - 000000000000ab98 - info - operator(): Updating state of node vmstorJAqIpFJHQnJ0HrWg from Online to Offline
2016-09-27 15:56:17 369471 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7bbd7fa700 - volumedriverfs/LockedArakoon - 000000000000ab99 - info - run_sequence: set node state succeeded after 1 attempt(s)
2016-09-27 15:56:17 369532 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7bbd7fa700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000ab9a - info - execute: Call markNodeOffline took 0.003488 seconds
2016-09-27 15:56:17 369614 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7bbd7fa700 - volumedriverfs/XMLRPCConnection - 000000000000ab9b - info - operator(): Closing socket 139
2016-09-27 15:56:17 371104 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7c0488d700 - volumedriverfs/MTServer - 000000000000ab9c - info - work: You have 2 connections running
2016-09-27 15:56:17 371157 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7c0488d700 - volumedriverfs/MTServer - 000000000000ab9d - info - cleanup_threads_locked: Trying to join 2 threads
2016-09-27 15:56:17 371208 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7c0932e700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000ab9e - info - execute: Arguments for markNodeOffline are {[vrouter_cluster_id:f6a64b38-8785-4836-a330-1f9db328155c,vrouter_id:data01JAqIpFJHQnJ0HrWg]}
2016-09-27 15:56:17 371443 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7c0932e700 - volumedriverfs/ClusterRegistry - 000000000000ab9f - error - find_node_throw_: Node data01JAqIpFJHQnJ0HrWg not registered in cluster f6a64b38-8785-4836-a330-1f9db328155c
2016-09-27 15:56:17 371540 +0200 - stor-201.be-gen8-1 - 11334/0x00007f7c0932e700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000aba0 - error - execute: markNodeOffline Caught fungi::IOException: node not registered data01JAqIpFJHQnJ0HrWg
khenderick commented 7 years ago

Fixed by #962, packaged in openvstorage-2.7.3-rev.4029.63e298f

JeffreyDevloo commented 7 years ago

Steps

Output

Only expecting logging on the remaining online nodes: On node 2: /var/log/upstart/ovs-volumedriver_myvpool02.log

2016-10-18 14:40:36 581307 +0200 - ovs-node2 - 8773/0x00007f6d897fa700 - volumedriverfs/XMLRPCTimingWrapper - 0000000000000197 - info - execute: Arguments for volumeDriverPerformanceCounters are {[reset:false,vrouter_cluster_id:96943cb9-8eb8-4b95-a078-ac372ef78880,vrouter_id:myvpool02vVZrpKNCRNhCUt6x]}
2016-10-18 14:40:36 581377 +0200 - ovs-node2 - 8773/0x00007f6d897fa700 - volumedriverfs/XMLRPCRedirectWrapper - 0000000000000198 - info - execute: Execution on myvpool02vVZrpKNCRNhCUt6x requested
2016-10-18 14:40:36 581401 +0200 - ovs-node2 - 8773/0x00007f6d897fa700 - volumedriverfs/XMLRPCRedirectWrapper - 0000000000000199 - info - execute: Node ID myvpool02vVZrpKNCRNhCUt6x is ours - good
2016-10-18 14:40:36 581469 +0200 - ovs-node2 - 8773/0x00007f6d897fa700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000019a - info - execute: Call volumeDriverPerformanceCounters took 0.000102 seconds

/var/log/upstart/ovs-volumedriver_myvpool01.log

2016-10-18 14:40:39 809732 +0200 - ovs-node2 - 7561/0x00007f274b7fe700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000017b - info - execute: Arguments for markNodeOffline are {[vrouter_cluster_id:f12fde62-2c3e-43b0-9fff-c2737e5dd3d4,vrouter_id:myvpool01xp3PGW3kCmcnn6Oe]}
2016-10-18 14:40:39 810220 +0200 - ovs-node2 - 7561/0x00007f274b7fe700 - volumedriverfs/ClusterRegistry - 000000000000017c - info - operator(): Updating state of node myvpool01xp3PGW3kCmcnn6Oe from Online to Offline
2016-10-18 14:40:39 812822 +0200 - ovs-node2 - 7561/0x00007f274b7fe700 - volumedriverfs/LockedArakoon - 000000000000017d - info - run_sequence: set node state succeeded after 1 attempt(s)
2016-10-18 14:40:39 812886 +0200 - ovs-node2 - 7561/0x00007f274b7fe700 - volumedriverfs/XMLRPCTimingWrapper - 000000000000017e - info - execute: Call markNodeOffline took 0.003088 seconds

Test result

Both SDM's were informed about being marked offline and the logging was found in the appropriate files. Test passed.

Packages