sap-oc / cookbook-openstack-network

Chef Cookbook - OpenStack Network
http://openstack.org

[placeholder] network issues on 22 Apr #8

Closed matelakat closed 7 years ago

matelakat commented 7 years ago

On 22 Apr (Saturday) a network outage occurred. We suspect that neutron-ha-tool moved some of the high-traffic routers to network node 1, which then became unstable under the high load. On 24 Apr some routers were moved off network node 1, and that appears to have stabilized the landscape.

Investigation

Takeaways

This information clearly does not convey that this is a timeout, and that it might not be an error after all.

matelakat commented 7 years ago

We looked for clues as to what made pacemaker think that rabbit was not running, but found no evidence. As a follow-up action, we now log the output of the rabbitmq status check. See this issue: https://github.com/sap-oc/crowbar-openstack/issues/35

matelakat commented 7 years ago

The rabbit outage can be traced in the pacemaker logs:

Apr 22 10:55:47 [4629] d00-25-b5-a0-00-b9       crmd:     info: process_lrm_event:      Operation rabbitmq_monitor_10000: not running (node=d00-25-b5-a0-00-b9
Apr 22 10:56:19 [4629] d00-25-b5-a0-00-b9       crmd:     info: process_lrm_event:      Operation rabbitmq_monitor_10000: ok (node=d00-25-b5-a0-00-b9, call=648, rc=0, cib-update=830, confirmed=false)

From the "not running" result at 10:55:47 to the "ok" result at 10:56:19, that is 32 seconds.
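
For anyone who wants to re-check the gap between the two monitor results, here is a minimal Python sketch. The two timestamps are copied from the crmd lines above. The helper names (`parse_syslog_ts`, `outage_seconds`) are made up for illustration, and the year 2017 is an assumption, since the log lines do not carry a year.

```python
from datetime import datetime

# Timestamps copied from the two crmd log lines above.
NOT_RUNNING = "Apr 22 10:55:47"  # rabbitmq_monitor_10000: not running
OK_AGAIN = "Apr 22 10:56:19"     # rabbitmq_monitor_10000: ok


def parse_syslog_ts(ts: str, year: int = 2017) -> datetime:
    """Parse a syslog-style timestamp; the year is an assumption (not in the log)."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def outage_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two syslog-style timestamps."""
    return (parse_syslog_ts(end) - parse_syslog_ts(start)).total_seconds()


if __name__ == "__main__":
    print(outage_seconds(NOT_RUNNING, OK_AGAIN))
```

This only measures the time between two consecutive monitor results as pacemaker recorded them. It says nothing about whether rabbit was actually down for that whole window.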

matelakat commented 7 years ago

Closing this, as spin-off cards have been created.