openvstorage / openvstorage-health-check

The health check is classified as a monitoring and detection tool for Open vStorage.

arakoon cluster healthcheck AttributeError NoneType #145

Closed jeroenmaelbrancke closed 7 years ago

jeroenmaelbrancke commented 7 years ago

Problem description

unattended:

root@perf-roub-01:~# ovs healthcheck unattended
Traceback (most recent call last):
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 35, in <module>
    class HealthCheckController(object):
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 414, in HealthCheckController
    HealthCheckController.run_method(*arguments)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 359, in run_method
    return HealthCheckController.check_unattended()
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 54, in check_unattended
    return HealthCheckController.execute_check(unattended, silent_mode)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 109, in execute_check
    HealthCheckController.check_arakoon(logger)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 147, in check_arakoon
    ArakoonHealthCheck.run(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 436, in run
    ArakoonHealthCheck.check_arakoons(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 414, in check_arakoons
    arakoon_clusters = ArakoonHealthCheck.fetch_available_clusters(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 117, in fetch_available_clusters
    'hostname': node_info.name,
AttributeError: 'NoneType' object has no attribute 'name'

attended:

[INFO] Checking port 26405 of service flash_roub_nsm_3-uZTdLYaVJbhNHLT0 ...
[SUCCESS] Connection successfully established!
[INFO] Fetching available arakoon clusters: 
Traceback (most recent call last):
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 35, in <module>
    class HealthCheckController(object):
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 414, in HealthCheckController
    HealthCheckController.run_method(*arguments)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 363, in run_method
    return HealthCheckController.check_attended()
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 70, in check_attended
    return HealthCheckController.execute_check(unattended, silent_mode)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 109, in execute_check
    HealthCheckController.check_arakoon(logger)
  File "/usr/lib/python2.7/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/healthcheck.py", line 147, in check_arakoon
    ArakoonHealthCheck.run(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 436, in run
    ArakoonHealthCheck.check_arakoons(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 414, in check_arakoons
    arakoon_clusters = ArakoonHealthCheck.fetch_available_clusters(logger)
  File "/opt/OpenvStorage/ovs/extensions/healthcheck/arakoon/arakooncluster_health_check.py", line 117, in fetch_available_clusters
    'hostname': node_info.name,
AttributeError: 'NoneType' object has no attribute 'name'

Possible root of the problem

In [13]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:from ovs.extensions.generic.configuration import Configuration
:from ovs.dal.lists.storagerouterlist import StorageRouterList
:
:arakoon_clusters = list(Configuration.list('/ovs/arakoon'))
:for cluster in arakoon_clusters:
: ak = ArakoonClusterConfig(str(cluster), filesystem=False)
: ak.load_config()
: master_node_ids = [node.name for node in ak.nodes]
: for node_id in master_node_ids:
:  node_info = StoragerouterHelper.get_by_machine_id(node_id)
:  if node_info is None:
:   print 'No storagerouterinfo for {}'.format(node_id)
:--
No storagerouterinfo for p5ol09r9cnHriw3N
No storagerouterinfo for p5ol09r9cnHriw3N
No storagerouterinfo for p5ol09r9cnHriw3N
No storagerouterinfo for p5ol09r9cnHriw3N

Additional information

Setup

Packages

openvstorage                         2.7.5-fargo.4-1                 amd64        openvStorage
openvstorage-backend                 1.7.5-fargo.2-1                 amd64        openvStorage Backend plugin
openvstorage-backend-core            1.7.5-fargo.2-1                 amd64        openvStorage Backend plugin core
openvstorage-backend-webapps         1.7.5-fargo.2-1                 amd64        openvStorage Backend plugin Web Applications
openvstorage-core                    2.7.5-fargo.4-1                 amd64        openvStorage core
openvstorage-hc                      1.7.5-fargo.2-1                 amd64        openvStorage Backend plugin HyperConverged
openvstorage-health-check            3.1.2-fargo.8-1                 amd64        Open vStorage HealthCheck
openvstorage-sdm                     1.6.5-fargo.1-1                 amd64        Open vStorage Backend ASD Manager
openvstorage-webapps                 2.7.5-fargo.4-1                 amd64        openvStorage Web Applications

JeffreyDevloo commented 7 years ago

Information

This seems to have happened on OVH, where certain nodes were removed from clusters that use an external arakoon. These arakoons were not wiped and are still being used by the other nodes. Since the removed nodes are still stored in the arakoon config but no longer exist as storagerouters, the lookup returns None and the healthcheck crashes on node_info.name. We should show an appropriate error instead.
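A minimal sketch of the kind of guard described above, not the actual change from the fix PR. It assumes a lookup callable that returns None for unknown machine ids, standing in for StoragerouterHelper.get_by_machine_id from the session above. The StorageRouter tuple, the function name find_missing_nodes, and the sample ids and hostnames other than p5ol09r9cnHriw3N are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for a storagerouter model object.
StorageRouter = namedtuple('StorageRouter', ['machine_id', 'name'])


def find_missing_nodes(cluster_nodes, lookup):
    """Split node ids per cluster into known and missing ones.

    cluster_nodes maps a cluster name to the node ids found in its config.
    lookup is any callable returning a storagerouter object or None.
    """
    known = {}
    missing = {}
    for cluster, node_ids in cluster_nodes.items():
        known[cluster] = []
        missing[cluster] = []
        for node_id in node_ids:
            node_info = lookup(node_id)
            if node_info is None:
                # Still listed in the arakoon config, but no storagerouter exists.
                missing[cluster].append(node_id)
                continue
            known[cluster].append({'id': node_id, 'hostname': node_info.name})
    return known, missing


# Illustrative data only: 'abc' and 'node-01' are made up.
routers = {'abc': StorageRouter('abc', 'node-01')}
known, missing = find_missing_nodes(
    {'ovsdb': ['abc'], 'flash_roub_abm': ['abc', 'p5ol09r9cnHriw3N']},
    routers.get,
)
```

With this shape the caller can report the missing ids as a failed check and keep working with the known nodes, rather than raising AttributeError on the first None.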

JeffreyDevloo commented 7 years ago

Fixed by https://github.com/openvstorage/openvstorage-health-check/pull/148

kinvaris commented 7 years ago

FAILED: Not properly implemented

[FAILED] The following nodes are stored in arakoon but missing in reality: [('ovsdb', []), ('voldrv', []), ('flash_roub_abm', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_3', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_2', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_1', ['p5ol09r9cnHriw3N']), ('global-abm', [])]

JeffreyDevloo commented 7 years ago

The output format needs changing: clusters without any missing nodes should not be returned. Expected: [('flash_roub_abm', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_3', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_2', ['p5ol09r9cnHriw3N']), ('flash_roub_nsm_1', ['p5ol09r9cnHriw3N'])]
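As an illustration of the requested format, the list from the failed run can be reduced to the expected one by dropping entries with an empty node list. This is only a sketch; the function name is hypothetical and the actual fix may be implemented differently.

```python
def only_clusters_with_missing_nodes(report):
    """Keep only (cluster, node_ids) pairs that list at least one missing node."""
    return [(cluster, nodes) for cluster, nodes in report if nodes]


# Output taken from the failed run reported in this issue.
report = [
    ('ovsdb', []),
    ('voldrv', []),
    ('flash_roub_abm', ['p5ol09r9cnHriw3N']),
    ('flash_roub_nsm_3', ['p5ol09r9cnHriw3N']),
    ('flash_roub_nsm_2', ['p5ol09r9cnHriw3N']),
    ('flash_roub_nsm_1', ['p5ol09r9cnHriw3N']),
    ('global-abm', []),
]
filtered = only_clusters_with_missing_nodes(report)
```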

JeffreyDevloo commented 7 years ago

Fixed by https://github.com/openvstorage/openvstorage-health-check/pull/156

kinvaris commented 7 years ago

PASSED: Executed on OVH environment