Scenario:
One node in a cluster gets hard-stopped using virsh destroy
The node gets rebooted
Celery tasks running on this node no longer appear to be able to store their result in the result backend (memcache), causing tests that explicitly wait for a task result == SUCCESS to fail (see the sketch below)
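For reference, "waiting for a task result == SUCCESS" boils down to polling the Celery result backend; a minimal sketch of such a wait, assuming a hypothetical Celery app module used by the test framework:

```python
import time

from celery.result import AsyncResult

from framework_tests.celery_app import app  # hypothetical import; use the framework's Celery app


def wait_for_success(task_id, timeout=300, interval=5):
    """Poll the result backend (memcache) until the task reaches SUCCESS.

    When the rebooted node cannot store its result in memcache, the state
    never leaves PENDING and this wait times out, failing the test.
    """
    result = AsyncResult(task_id, app=app)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if result.state == 'SUCCESS':
            return result.result
        if result.state == 'FAILURE':
            raise RuntimeError('task {0} failed: {1}'.format(task_id, result.result))
        time.sleep(interval)
    raise RuntimeError('task {0} did not reach SUCCESS within {1}s'.format(task_id, timeout))
```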
Observations:
The ovs healthcheck does report issues with RabbitMQ / the Celery workers on some nodes (a way to capture this per node is sketched below)
After a number of hours without intervention the issue appears to repair itself (exact timing not yet determined)
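To make that observation comparable across nodes and over time, the healthcheck and RabbitMQ output could be captured whenever the failure is seen; a minimal sketch, assuming the ovs healthcheck and rabbitmqctl commands are available on the node (the log path is an arbitrary choice):

```python
import subprocess


def capture_node_state(logfile='/tmp/node_state.log'):
    """Dump ovs healthcheck output and RabbitMQ cluster status for later comparison."""
    with open(logfile, 'a') as out:
        for cmd in (['ovs', 'healthcheck'],
                    ['rabbitmqctl', 'cluster_status']):
            out.write('$ {0}\n'.format(' '.join(cmd)))
            proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT)
            out.write(proc.stdout.decode('utf-8', errors='replace') + '\n')
```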
Create a test to reproduce this issue and investigate whether it can be resolved by updating the restart sequence during boot; as part of this (a reproduction sketch follows this list):
verify whether the issue also occurs during a normal, clean reboot
determine what exactly needs to be restarted to resolve the issue, e.g. only the workers on all other nodes?
determine/verify the status of RabbitMQ/memcache/workers before/during the test
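A rough reproduction skeleton along those lines could hard-stop the node with virsh destroy and snapshot worker/memcache status around the event; the domain name, hypervisor URI and memcache address below are assumptions, not values taken from the framework:

```python
import subprocess
import time

import memcache  # python-memcached

from framework_tests.celery_app import app  # hypothetical import, same app as above


def hard_stop_node(domain, hypervisor='qemu:///system'):
    """Hard-stop the node VM, equivalent to pulling the power (virsh destroy)."""
    subprocess.check_call(['virsh', '-c', hypervisor, 'destroy', domain])


def start_node(domain, hypervisor='qemu:///system'):
    """Boot the VM again after the hard stop."""
    subprocess.check_call(['virsh', '-c', hypervisor, 'start', domain])


def cluster_status(memcache_servers):
    """Snapshot which Celery workers respond and whether memcache is usable."""
    workers = app.control.inspect().ping() or {}
    client = memcache.Client(list(memcache_servers))
    client.set('reproduce_check', 'ok')
    return {'workers': sorted(workers),
            'memcache_ok': client.get('reproduce_check') == 'ok'}


def reproduce(domain='node1', memcache_servers=('node1:11211',)):
    print('before:', cluster_status(memcache_servers))
    hard_stop_node(domain)
    start_node(domain)
    time.sleep(120)  # give services time to come back up after the reboot
    print('after reboot:', cluster_status(memcache_servers))
```

Comparing the snapshots before and after the hard stop, and again after restarting individual services, should help pin down which restart actually clears the problem.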
Cover area
framework
Issue type
reliability
Special conditions
None
Setup
All virtual setups of the nightly build environments
The issue can be seen in the TestRail test results and can be observed when running a test on the node itself
Type of setup: Hyperconverged