openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

OVN Cluster Health #154

Closed: greenpau closed this issue 6 years ago.

greenpau commented 6 years ago

I am seeing these messages in a clustered environment. After reading here:

2018-07-13T07:07:33.102Z|05969|raft|WARN|ignoring vote request received after only 144 ms (minimum election time is 1024 ms)
2018-07-13T07:07:45.337Z|05970|raft|WARN|Dropped 7 log messages in last 11 seconds (most recently, 2 seconds ago) due to excessive rate
2018-07-13T07:07:45.337Z|05971|raft|WARN|ignoring vote request received after only 81 ms (minimum election time is 1024 ms)
2018-07-13T07:07:57.870Z|05972|raft|WARN|Dropped 7 log messages in last 10 seconds (most recently, 2 seconds ago) due to excessive rate
blp commented 6 years ago

These messages tend to indicate that at least one member of the cluster is either having a bit of trouble communicating with the other members or that it is not promptly getting CPU time. Is the cluster very busy by any chance?
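A rough way to put numbers on this rests on an assumption: these warnings follow the standard Raft rule that a follower which has heard from the current leader within the minimum election time ignores vote requests. Under that reading, each warning means another member started an election only N ms after this server last heard from the leader. The hypothetical shell helper below (raft_vote_summary is not part of OVS) reads ovsdb-server log lines on stdin and reports how many such warnings it saw, plus the shortest and average intervals.

```sh
#!/bin/sh
# raft_vote_summary (hypothetical helper, not part of OVS): read log lines on
# stdin, keep only raft "ignoring vote request received after only N ms"
# warnings, and print the count plus the minimum and average N.
raft_vote_summary() {
    # Extract N from each matching line; non-matching lines are dropped.
    sed -n 's/.*ignoring vote request received after only \([0-9][0-9]*\) ms.*/\1/p' |
        awk '{ n++; sum += $1; if (n == 1 || $1 < min) min = $1 }
             END {
                 if (n == 0) { print "count=0"; exit }
                 # %d truncates the average to whole milliseconds.
                 printf "count=%d min=%dms avg=%dms\n", n, min, sum / n
             }'
}

# Example (the log path is an assumption; use your southbound ovsdb-server log):
#   raft_vote_summary < /var/log/openvswitch/ovsdb-server-sb.log
```

Because vlog rate-limits these warnings (the "Dropped 7 log messages" lines), the count is only a lower bound. The vlog/disable-rate-limit command in the list-commands output later in this thread should lift that limit for the raft module if exact numbers are needed.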

Currently "cluster/status" via ovs-appctl is the best way to assess the current health of a cluster from an individual server's point of view.

greenpau commented 6 years ago

> These messages tend to indicate that at least one member of the cluster is either having a bit of trouble communicating with the other members or that it is not promptly getting CPU time. Is the cluster very busy by any chance?

The cluster isn't busy at all: three nodes on dedicated hardware.

> Currently "cluster/status" via ovs-appctl is the best way to assess the current health of a cluster from an individual server's point of view.

@blp, thank you!

greenpau commented 6 years ago

@blp, in which version does cluster/status become available? It is not available in 2.10.90. 🤔

# ovs-appctl version
ovs-vswitchd (Open vSwitch) 2.10.90
# ovs-appctl list-commands | grep cluster
# 
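Since list-commands only reports the commands of whichever daemon ovs-appctl is talking to, one way to locate cluster/status is to ask every control socket in turn. A minimal sketch, assuming the control sockets live under /var/run/openvswitch (adjust for your packaging); sockets_with_command is a hypothetical helper, not an OVS tool.

```sh
#!/bin/sh
# sockets_with_command CMD [DIR] (hypothetical helper): print each *.ctl
# control socket in DIR whose daemon lists CMD in "list-commands".
# DIR defaults to /var/run/openvswitch, which is an assumption.
sockets_with_command() {
    cmd=$1
    dir=${2:-/var/run/openvswitch}
    for sock in "$dir"/*.ctl; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$sock" ] || continue
        # The first column of each list-commands line is the command name.
        if ovs-appctl -t "$sock" list-commands 2>/dev/null |
                awk -v c="$cmd" '$1 == c { found = 1 } END { exit !found }'; then
            printf '%s\n' "$sock"
        fi
    done
}

# Example:
#   sockets_with_command cluster/status
```

On a node running the clustered southbound database this should print the ovsdb-server socket (such as ovnsb_db.ctl) and skip the ovs-vswitchd one.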
greenpau commented 6 years ago

@blp, regarding CPU time, you got me thinking 🤔. By default, I am running the ovn-northd process as the openvswitch user, and I saw the following error during startup:

ovn-ctl[32125]: Starting ovn-northd nice: cannot set niceness: Permission denied

Perhaps that is the reason for the lack of CPU time?

greenpau commented 6 years ago

As a test, I added a Nice directive to the systemd unit:

[Service]
Type=oneshot
RemainAfterExit=yes
Environment=OVS_RUNDIR=%t/openvswitch OVS_DBDIR=/var/lib/openvswitch
EnvironmentFile=-/etc/sysconfig/ovn-northd
ExecStart=/usr/share/openvswitch/scripts/ovn-ctl start_northd $OVN_NORTHD_OPTS
ExecStop=/usr/share/openvswitch/scripts/ovn-ctl stop_northd
User=openvswitch
Group=openvswitch
Nice=-10
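The "cannot set niceness" error is expected when an unprivileged user asks for a negative nice value: lowering niceness requires CAP_SYS_NICE or a sufficiently permissive RLIMIT_NICE. Also note that the raft warnings are logged by the ovsdb-server processes hosting the clustered databases, not by ovn-northd, so their niceness is arguably the more relevant one. A minimal Linux-only sketch for checking what nice value a running process actually got, for example after restarting with Nice=-10 above; nice_of_pid is a hypothetical helper.

```sh
#!/bin/sh
# nice_of_pid PID (hypothetical helper, Linux only): print the nice value of
# a running process, read from field 19 of /proc/PID/stat.
nice_of_pid() {
    # Field 2 (the command name in parentheses) may contain spaces, so drop
    # everything through the closing ") "; fields 3..19 then become 1..17,
    # which makes the nice value the 17th remaining field.
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $17 }'
}

# Example:
#   for pid in $(pidof ovsdb-server ovn-northd); do
#       echo "$pid: nice $(nice_of_pid "$pid")"
#   done
```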
greenpau commented 6 years ago

The cluster/status command does not appear in the output of ovs-appctl list-commands.

Here is how one can invoke it:

ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound

I didn't see cluster/status in list-commands because, by default, ovs-appctl talks to ovs-vswitchd (note that ovs-appctl version above reports ovs-vswitchd), not to the ovsdb-server process that serves OVN_Southbound. Once I specify the correct control socket, it works:

$ ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl list-commands
The available commands are:
  cluster/cid             DB
  cluster/kick            DB SERVER
  cluster/leave           DB
  cluster/sid             DB
  cluster/status          DB
  coverage/show
  exit
  list-commands
  memory/show
  ovsdb-server/add-db     DB
  ovsdb-server/add-remote REMOTE
  ovsdb-server/compact
  ovsdb-server/connect-active-ovsdb-server
  ovsdb-server/disable-monitor-cond
  ovsdb-server/disconnect-active-ovsdb-server
  ovsdb-server/get-active-ovsdb-server
  ovsdb-server/get-sync-exclude-tables
  ovsdb-server/list-dbs
  ovsdb-server/list-remotes
  ovsdb-server/perf-counters-clear
  ovsdb-server/perf-counters-show
  ovsdb-server/reconnect
  ovsdb-server/remove-db  DB
  ovsdb-server/remove-remote REMOTE
  ovsdb-server/set-active-ovsdb-server
  ovsdb-server/set-sync-exclude-tables
  ovsdb-server/sync-status
  version
  vlog/close
  vlog/disable-rate-limit [module]...
  vlog/enable-rate-limit  [module]...
  vlog/list
  vlog/list-pattern
  vlog/reopen
  vlog/set                {spec | PATTERN:destination:pattern}
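Building on the -t invocation above, a minimal health-check sketch. It assumes cluster/status prints top-level "Key: value" lines such as Status:, Role: and Term: (verify this against your version's output). cluster_field and check_raft_db are hypothetical helpers, not OVS commands. check_raft_db prints a one-line summary and fails unless the server reports itself as a cluster member, which makes it easy to run from cron or a monitoring agent.

```sh
#!/bin/sh
# cluster_field KEY (hypothetical helper): read cluster/status output on
# stdin and print the value of the first top-level "KEY: value" line.
cluster_field() {
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
}

# check_raft_db SOCKET DB (hypothetical helper): query one server's view of
# the cluster, print a one-line summary, and return non-zero unless the
# server reports "Status: cluster member" (2 if ovs-appctl itself failed).
check_raft_db() {
    out=$(ovs-appctl -t "$1" cluster/status "$2") || return 2
    status=$(printf '%s\n' "$out" | cluster_field Status)
    role=$(printf '%s\n' "$out" | cluster_field Role)
    term=$(printf '%s\n' "$out" | cluster_field Term)
    printf '%s: status=%s role=%s term=%s\n' "$2" "$status" "$role" "$term"
    [ "$status" = "cluster member" ]
}

# Example, using the southbound control socket from this thread:
#   check_raft_db /var/run/openvswitch/ovnsb_db.ctl OVN_Southbound
```

Running it on each of the three servers and comparing Role and Term gives a quick cross-check: in a stable cluster, exactly one server should report leader and all should agree on the term.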