greenpau closed 5 years ago
@blp, after a restart, the DB on a non-master did not come up, and I see "syntax error: Parsing raft header failed: Type mismatch for member 'prev_servers'":
# systemctl status ovn-northd
● ovn-northd.service - OVN northd management daemon
Loaded: loaded (/usr/lib/systemd/system/ovn-northd.service; enabled; vendor preset: disabled)
Active: active (exited) since Tue 2018-09-04 10:21:41 EDT; 2min 16s ago
Process: 27954 ExecStop=/usr/share/openvswitch/scripts/ovn-ctl stop_northd (code=exited, status=0/SUCCESS)
Process: 28005 ExecStart=/usr/share/openvswitch/scripts/ovn-ctl start_northd $OVN_NORTHD_OPTS (code=exited, status=0/SUCCESS)
Main PID: 28005 (code=exited, status=0/SUCCESS)
Sep 04 10:20:41 ovn01 ovn-ctl[28005]: ovn-nbctl: unix:/run/openvswitch/ovnnb_db.sock: database connection failed (Connection refused)
Sep 04 10:21:11 ovn01 ovn-ctl[28005]: Waiting for OVN_Northbound to come up 2018-09-04T14:21:11Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Sep 04 10:21:11 ovn01 ovn-ctl[28005]: /usr/share/openvswitch/scripts/ovs-lib: line 600: 28053 Alarm clock "$@"
Sep 04 10:21:11 ovn01 ovn-ctl[28005]: [FAILED]
Sep 04 10:21:11 ovn01 ovsdb-server[28079]: ovs|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server-sb.log
Sep 04 10:21:11 ovn01 ovn-ctl[28005]: ovsdb-server: syntax "{"cluster_id":"0f4c5869-9bad-4b42-a603-8643832bb33d","local_address":"tcp:0.0.0.0:6644","name...ited","min ... 266"}": syntax error: Parsing raft header failed: Type mismatch for member 'prev_servers'.
Sep 04 10:21:11 ovn01 ovn-nbctl[28082]: ovs|00001|nbctl|INFO|Called as ovn-nbctl init
Sep 04 10:21:11 ovn01 ovn-nbctl[28082]: ovs|00002|db_ctl_base|ERR|unix:/run/openvswitch/ovnnb_db.sock: database connection failed (Connection refused)
Sep 04 10:21:11 ovn01 ovn-ctl[28005]: ovn-nbctl: unix:/run/openvswitch/ovnnb_db.sock: database connection failed (Connection refused)
Sep 04 10:21:41 ovn01 systemd[1]: Started OVN northd management daemon.
Hint: Some lines were ellipsized, use -l to show in full.
@blp, the raft-related entry on the non-leader is:
{
"cluster_id": "0f4c5869-9bad-4b42-a603-8643832bb33d",
"local_address": "tcp:0.0.0.0:6644",
"name": "OVN_Southbound",
"prev_eid": "7c510ec5-84be-4364-a1c1-929c12952537",
"prev_index": 93,
"prev_servers": null,
"prev_term": 1,
"server_id": "0b0fa006-de83-4768-98c2-2d53d9ae8266"
}
I think there is an expectation for prev_servers to NOT be null.
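To check a header for this condition outside of ovsdb-server, here is a minimal Python sketch. It is not the parser ovsdb-server uses. The function name check_prev_servers is hypothetical, and the idea that only an omitted member or a JSON object is acceptable is an assumption taken from the error message, not from the OVS source.

```python
import json


def check_prev_servers(header):
    """Classify the 'prev_servers' member of a raft header dict.

    Assumption: an absent member or a JSON object is fine, and
    anything else (including null) triggers the type mismatch.
    """
    if "prev_servers" not in header:
        return "omitted"
    value = header["prev_servers"]
    if isinstance(value, dict):
        return "object"
    if value is None:
        return "null"
    return "unexpected type: " + type(value).__name__


# The header posted above, copied verbatim.
header_text = """
{
  "cluster_id": "0f4c5869-9bad-4b42-a603-8643832bb33d",
  "local_address": "tcp:0.0.0.0:6644",
  "name": "OVN_Southbound",
  "prev_eid": "7c510ec5-84be-4364-a1c1-929c12952537",
  "prev_index": 93,
  "prev_servers": null,
  "prev_term": 1,
  "server_id": "0b0fa006-de83-4768-98c2-2d53d9ae8266"
}
"""

header = json.loads(header_text)
print(check_prev_servers(header))  # prints "null"
```

For this header the sketch reports "null", which matches the member named in the error.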
I've spent some time trying to chase this down and I can't find any circumstances where "prev_servers" should end up as null. I see cases where it could be omitted, and cases where it could be a {}-surrounded object, but I can't see how it could become null.
Do you have a reproduction case?
@blp, not at the moment. If I see this again, I will post it here and reopen.
@blp, I am seeing these logs roughly every 25 seconds:
What is the process for troubleshooting this?
Last entry in /var/log/openvswitch/ovsdb-server-nb.log and /var/log/openvswitch/ovsdb-server-sb.log: