openvstorage / openvstorage-health-check

The health check is classified as a monitoring and detection tool for Open vStorage.

if the volumedriver goes into timeout, the status needs to be unknown not error #163

Closed kinvaris closed 7 years ago

kinvaris commented 7 years ago

Problem description

A timeout during volume creation should be reported as a warning, not as an error: https://github.com/openvstorage/openvstorage-health-check/blob/master/ovs/extensions/healthcheck/volumedriver/volumedriver_health_check.py#L150

Additional information

Setup

Packages


root@ovs-node01-1604:~# dpkg -l | grep openvstorage
ii  blktap-openvstorage-utils            2.0.90-2ubuntu4                     amd64        utilities to work with VHD disk images files
ii  libblktapctl0-openvstorage           2.0.90-2ubuntu4                     amd64        Xen API blktapctl shared library (shared library)
ii  libvhd0-openvstorage                 2.0.90-2ubuntu4                     amd64        VHD file format access library
ii  libvhdio-2.0.90-openvstorage         2.0.90-2ubuntu4                     amd64        Xen API blktap shared library (shared library)
ii  openvstorage                         2.7.7-rev.4367.d7725ca-1            amd64        openvStorage
ii  openvstorage-backend                 1.7.7-rev.828.5e3746c-1             amd64        openvStorage Backend plugin
ii  openvstorage-backend-core            1.7.7-rev.828.5e3746c-1             amd64        openvStorage Backend plugin core
ii  openvstorage-backend-webapps         1.7.7-rev.828.5e3746c-1             amd64        openvStorage Backend plugin Web Applications
ii  openvstorage-core                    2.7.7-rev.4367.d7725ca-1            amd64        openvStorage core
ii  openvstorage-hc                      1.7.7-rev.828.5e3746c-1             amd64        openvStorage Backend plugin HyperConverged
ii  openvstorage-health-check            3.1.2-rev.271.a9de90e-1             amd64        Open vStorage HealthCheck
ii  openvstorage-sdm                     1.6.7-rev.455.7913d38-1             amd64        Open vStorage Backend ASD Manager
ii  openvstorage-webapps                 2.7.7-rev.4367.d7725ca-1            amd64        openvStorage Web Applications
wimpers commented 7 years ago

@redlicha what would your advice be?

BAM needs to reflect on whether we should do this.

khenderick commented 7 years ago

@wimpers, isn't this just a matter for the healthcheck itself, i.e. how it should log its errors?

JeffreyDevloo commented 7 years ago

@wimpers There is no need to include Arne. This is a decision of the healthcheck to display timeouts as warning instead of error (as it could mean that the system is healthy but slow!)

wimpers commented 7 years ago

@khenderick so add status TIMEOUT as optional output status (instead of warning/critical)?

khenderick commented 7 years ago

@wimpers, I have decided that @JeffreyDevloo and @kinvaris shall change logger.failure to logger.warning.

wimpers commented 7 years ago

This is a decision of the healthcheck to display timeouts as warning instead of error

Frankly, it is not up to the team writing the healthcheck to make that decision. The healthcheck team should not decide on its own when OPS (and engineering, if things really go south) gets called out of bed in the middle of the night.

JeffreyDevloo commented 7 years ago

We always check with OPS, for the reason you stated. Most of our changes are also requested by OPS.

jtorreke commented 7 years ago

Maybe a suggestion: when creating a volume results in a timeout, can we try again? If the second attempt times out too, I'd go for an error, as it will most likely point to something underneath which is too slow/not working/...
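
A minimal sketch of that retry idea, not the healthcheck's actual code. `VolumeCreationTimeout`, `create_with_retry` and the `create_volume` callable are hypothetical names used only for illustration; the real check would call into the volumedriver instead.

```python
import time


class VolumeCreationTimeout(Exception):
    """Hypothetical exception standing in for a volume creation timeout."""


def create_with_retry(create_volume, attempts=2, delay=0.0):
    """Call create_volume up to `attempts` times.

    Returns "ok" as soon as one attempt succeeds, and "error" only if
    every attempt timed out. Other exceptions are not caught here.
    """
    for attempt in range(1, attempts + 1):
        try:
            create_volume()
            return "ok"
        except VolumeCreationTimeout:
            # Wait before retrying, but not after the final attempt.
            if attempt < attempts:
                time.sleep(delay)
    return "error"
```

With `attempts=2`, a single slow response does not raise an alert; only a repeated timeout is reported as an error.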

wimpers commented 7 years ago

After discussion with @jtorreke :

Since Check_MK knows ok, warning, error and unknown, let's use unknown when there is a timeout.
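
As an illustration of that mapping, here is a sketch that assumes the usual Nagios-style state codes used by Check_MK (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN), with "error" in this thread corresponding to CRIT. `VolumeCreationTimeout` and `check_volume_creation` are hypothetical names, not part of the healthcheck code base.

```python
# Nagios-style state codes, as consumed by Check_MK.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class VolumeCreationTimeout(Exception):
    """Hypothetical exception standing in for a volume creation timeout."""


def check_volume_creation(create_volume):
    """Run create_volume and return a (state, message) tuple."""
    try:
        create_volume()
    except VolumeCreationTimeout:
        # A timeout says nothing definite about health: the system may
        # simply be slow, so report UNKNOWN rather than CRITICAL.
        return UNKNOWN, "volume creation timed out"
    except Exception as ex:
        # Any other failure is a real error.
        return CRITICAL, "volume creation failed: {0}".format(ex)
    return OK, "volume created"
```

The timeout handler has to come before the generic `except Exception` clause, otherwise a timeout would be reported as critical.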

JeffreyDevloo commented 7 years ago

Volume creation is currently disabled.

pploegaert commented 7 years ago

Validation in: https://github.com/openvstorage/integrationtests/issues/447

JeffreyDevloo commented 7 years ago

Fixed by https://github.com/openvstorage/openvstorage-health-check/pull/291 -> openvstorage-health-check 3.2.1-rev.490.f2bbe59