Scenario:
One node in a cluster gets hard-stopped using virsh destroy
The node gets rebooted
Celery tasks running on this node no longer appear to be able to store their result in the result backend (memcache), causing tests that explicitly wait for a task result == SUCCESS to fail (see the sketch below)
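For reference, "waiting for a task result == SUCCESS" boils down to polling the Celery result backend; a minimal sketch of such a wait, assuming a hypothetical Celery app module used by the test framework:

```python
import time

from celery.result import AsyncResult

from framework_tests.celery_app import app  # hypothetical import; use the framework's Celery app


def wait_for_success(task_id, timeout=300, interval=5):
    """Poll the result backend (memcache) until the task reaches SUCCESS.

    When the rebooted node cannot store its result in memcache, the state
    never leaves PENDING and this wait times out, failing the test.
    """
    result = AsyncResult(task_id, app=app)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if result.state == 'SUCCESS':
            return result.result
        if result.state == 'FAILURE':
            raise RuntimeError('task {0} failed: {1}'.format(task_id, result.result))
        time.sleep(interval)
    raise RuntimeError('task {0} did not reach SUCCESS within {1}s'.format(task_id, timeout))
```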
Observations:
The ovs healthcheck does report issues with RabbitMQ / the Celery workers on some nodes (a way to capture this per node is sketched below)
After a number of hours without intervention the issue appears to repair itself (exact timing not yet determined)
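To make that observation comparable across nodes and over time, the healthcheck and RabbitMQ output could be captured whenever the failure is seen; a minimal sketch, assuming the ovs healthcheck and rabbitmqctl commands are available on the node (the log path is an arbitrary choice):

```python
import subprocess


def capture_node_state(logfile='/tmp/node_state.log'):
    """Dump ovs healthcheck output and RabbitMQ cluster status for later comparison."""
    with open(logfile, 'a') as out:
        for cmd in (['ovs', 'healthcheck'],
                    ['rabbitmqctl', 'cluster_status']):
            out.write('$ {0}\n'.format(' '.join(cmd)))
            proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT)
            out.write(proc.stdout.decode('utf-8', errors='replace') + '\n')
```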
Create a test to reproduce this issue and investigate whether it can be resolved by updating the restart sequence during boot; as part of this (a reproduction sketch follows this list):
verify whether the issue also occurs during a normal, clean reboot
determine what exactly needs to be restarted to resolve the issue, e.g. only the workers on all other nodes?
determine/verify the status of RabbitMQ/memcache/workers before/during the test
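A rough reproduction skeleton along those lines could hard-stop the node with virsh destroy and snapshot worker/memcache status around the event; the domain name, hypervisor URI and memcache address below are assumptions, not values taken from the framework:

```python
import subprocess
import time

import memcache  # python-memcached

from framework_tests.celery_app import app  # hypothetical import, same app as above


def hard_stop_node(domain, hypervisor='qemu:///system'):
    """Hard-stop the node VM, equivalent to pulling the power (virsh destroy)."""
    subprocess.check_call(['virsh', '-c', hypervisor, 'destroy', domain])


def start_node(domain, hypervisor='qemu:///system'):
    """Boot the VM again after the hard stop."""
    subprocess.check_call(['virsh', '-c', hypervisor, 'start', domain])


def cluster_status(memcache_servers):
    """Snapshot which Celery workers respond and whether memcache is usable."""
    workers = app.control.inspect().ping() or {}
    client = memcache.Client(list(memcache_servers))
    client.set('reproduce_check', 'ok')
    return {'workers': sorted(workers),
            'memcache_ok': client.get('reproduce_check') == 'ok'}


def reproduce(domain='node1', memcache_servers=('node1:11211',)):
    print('before:', cluster_status(memcache_servers))
    hard_stop_node(domain)
    start_node(domain)
    time.sleep(120)  # give services time to come back up after the reboot
    print('after reboot:', cluster_status(memcache_servers))
```

Comparing the snapshots before and after the hard stop, and again after restarting individual services, should help pin down which restart actually clears the problem.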
Cover area
framework
Issue type
reliability
Special conditions
None
Setup
All virtual setups of the nightly build environments
The issue can be seen in the TestRail test results and can be observed when running a test on the node itself
Type of setup: Hyperconverged