sonic-net / sonic-sairedis

SAI object interface to Redis database, as used in the SONiC project

Sonic-vs : Syncd not getting netlink messages and oper status not updated #1357

Open sudhiaithal opened 5 months ago

sudhiaithal commented 5 months ago

I am seeing an issue where syncd does not receive netlink messages when a link is added, deleted, brought up, or brought down.

When syncd starts, it receives all the messages as expected:

Feb 15 18:11:04.828518 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: lo, ifflags: 0x10049, ifindex: 1
Feb 15 18:11:04.828550 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received done RTM_NEWLINK ifname: lo, ifflags: 0x10049, ifindex: 1
Feb 15 18:11:04.828651 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth0, ifflags: 0x11043, ifindex: 1745
Feb 15 18:11:04.828664 d809e83f1ad0 NOTICE #syncd: :- asyncOnLinkMsg: received done RTM_NEWLINK ifname: eth0, ifflags: 0x11043, ifindex: 1745

However, after a while, when I run ifconfig eth0 up/down, syncd does not receive any message, while other processes such as portsyncd do:

Feb 15 20:03:01.516713 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:0 oper:0 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.236337 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:0 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.236486 874a0e235413 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth0 admin:1 oper:1 addr:02:42:ac:11:00:02 ifindex:4106 master:0 type:veth
Feb 15 20:03:03.247468 874a0e235413 NOTICE #fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: 172.17.0.0/16 0.0.0.0 eth0
Feb 15 20:03:05.082645 874a0e235413 NOTICE #fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0
....

This prevents the oper status from being updated correctly. On a VS image from the older 202106 branch, it works correctly, as shown below:

Feb 13 18:43:44.509381 de88276cddc7 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth5, ifflags: 0x11103, ifindex: 93
Feb 13 18:43:44.509409 de88276cddc7 NOTICE #syncd: :- asyncOnLinkMsg: received RTM_NEWLINK ifname: eth5, ifflags: 0x11143, ifindex: 93
Feb 13 18:43:44.509458 de88276cddc7 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth5 admin:1 oper:0 addr:7a:01:26:fd:50:d5 ifindex:93 master:0 type:veth
Feb 13 18:43:44.509485 de88276cddc7 NOTICE #syncd: :- syncOnLinkMsg: newlink: ifindex: 93, ifflags: 0x11103, ifname: eth5
Feb 13 18:43:44.509535 de88276cddc7 NOTICE #syncd: :- send_port_oper_status_notification: send event SAI_SWITCH_ATTR_PORT_STATE_CHANGE_NOTIFY for port oid:0x100000005: SAI_PORT_OPER_STATUS_UP
Feb 13 18:43:44.509627 de88276cddc7 NOTICE #syncd: :- syncOnLinkMsg: newlink: ifindex: 93, ifflags: 0x11143, ifname: eth5
Feb 13 18:43:44.509719 de88276cddc7 NOTICE #portsyncd: :- onMsg: nlmsg type:16 key:eth5 admin:1 oper:1 addr:7a:01:26:fd:50:d5 ifindex:93 master:0 type:veth
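For readers comparing the ifflags values in these logs, here is a minimal Python sketch that decodes them into the flag names from the Linux linux/if.h header. The helper name decode_ifflags is hypothetical and not part of syncd. It only makes the hex values readable; the thread does not say which bit the virtual switch uses to derive oper status.

```python
# Interface flag bits as defined in the Linux kernel header linux/if.h.
IFF_FLAGS = {
    0x1: "IFF_UP",
    0x2: "IFF_BROADCAST",
    0x4: "IFF_DEBUG",
    0x8: "IFF_LOOPBACK",
    0x10: "IFF_POINTOPOINT",
    0x20: "IFF_NOTRAILERS",
    0x40: "IFF_RUNNING",
    0x80: "IFF_NOARP",
    0x100: "IFF_PROMISC",
    0x200: "IFF_ALLMULTI",
    0x400: "IFF_MASTER",
    0x800: "IFF_SLAVE",
    0x1000: "IFF_MULTICAST",
    0x2000: "IFF_PORTSEL",
    0x4000: "IFF_AUTOMEDIA",
    0x8000: "IFF_DYNAMIC",
    0x10000: "IFF_LOWER_UP",
    0x20000: "IFF_DORMANT",
    0x40000: "IFF_ECHO",
}


def decode_ifflags(value):
    """Return the names of all flag bits set in value (hypothetical helper)."""
    return [name for bit, name in IFF_FLAGS.items() if value & bit]


# The two eth5 values from the 202106 logs.
for raw in (0x11103, 0x11143):
    print(hex(raw), decode_ifflags(raw))
```

The two eth5 values differ only in bit 0x40 (IFF_RUNNING); both already have IFF_UP and IFF_LOWER_UP set.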

sudhiaithal commented 5 months ago

root@f4b1252e2cc5:/# grep "asyncOnLinkMsg" /var/log/syslog | wc -l
1050
root@f4b1252e2cc5:/#

However, on the old image, there is just 1 for each interface:

root@de88276cddc7:/# grep "asyncOnLinkMsg" /var/log/syslog | grep Ethernet | wc -l
103
root@de88276cddc7:/#

sudhiaithal commented 5 months ago

I was able to work around this problem by creating veth interfaces eth0-31, so that every Ethernet* interface can map to a tap interface. After that, the problem seems to go away.

kcudnik commented 5 months ago

Not sure this is exactly a syncd issue; it depends on who is responsible for generating these netlink messages. syncd listens to all of those messages, but port up/down is not up to syncd. Is this on real hardware or a virtual switch?

sudhiaithal commented 5 months ago

This is on a virtual switch. I think the flood of messages is causing some lock-up on syncd's netlink socket. If we bring up the VS without all veth interfaces up, I see this issue. It seems to work fine when all veth interfaces are created before VS bring-up.

kcudnik commented 5 months ago

Netlink handling is synchronized in syncd: each message is processed in a synchronized block under a mutex, but it should still receive all messages. Are you generating the flood on purpose? Is any other process receiving all the generated messages?
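One way to check that last question independently of syncd and portsyncd is a small standalone listener. The following is a minimal Python sketch, not syncd's implementation: the function names parse_link_messages and listen are hypothetical, and it assumes a Linux host. It subscribes to the rtnetlink link multicast group, prints the same fields that asyncOnLinkMsg logs, and reports ENOBUFS, which the kernel returns when a netlink socket's receive buffer has overflowed and messages were dropped.

```python
import errno
import socket
import struct

RTMGRP_LINK = 0x1   # link-event multicast group (linux/rtnetlink.h)
RTM_NEWLINK = 16    # matches "nlmsg type:16" in the portsyncd logs
RTM_DELLINK = 17
IFLA_IFNAME = 3     # attribute carrying the interface name

NLMSG_HDR = struct.Struct("=IHHII")   # struct nlmsghdr: len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # struct ifinfomsg: family, pad, type, index, flags, change


def parse_link_messages(data):
    """Yield (msg_type, ifname, ifflags, ifindex) for each link message in data."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if msg_len < NLMSG_HDR.size or offset + msg_len > len(data):
            break  # truncated or malformed message
        end = offset + msg_len
        body = offset + NLMSG_HDR.size
        if msg_type in (RTM_NEWLINK, RTM_DELLINK) and body + IFINFOMSG.size <= end:
            _fam, _type, ifindex, ifflags, _change = IFINFOMSG.unpack_from(data, body)
            ifname = None
            attr = body + IFINFOMSG.size
            # Walk the routing attributes that follow the ifinfomsg header.
            while attr + 4 <= end:
                rta_len, rta_type = struct.unpack_from("=HH", data, attr)
                if rta_len < 4:
                    break
                if rta_type == IFLA_IFNAME:
                    ifname = data[attr + 4:attr + rta_len].rstrip(b"\0").decode()
                attr += (rta_len + 3) & ~3  # attributes are 4-byte aligned
            yield msg_type, ifname, ifflags, ifindex
        offset += (msg_len + 3) & ~3  # messages are 4-byte aligned


def listen():
    """Print link events until interrupted (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0 lets the kernel assign one
    while True:
        try:
            data = sock.recv(65536)
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                print("receive buffer overflowed; some messages were dropped")
                continue
            raise
        for msg_type, ifname, ifflags, ifindex in parse_link_messages(data):
            kind = "RTM_NEWLINK" if msg_type == RTM_NEWLINK else "RTM_DELLINK"
            print(f"{kind} ifname: {ifname}, ifflags: {ifflags:#x}, ifindex: {ifindex}")
```

Calling listen() inside the VS container while toggling an interface with ifconfig would show whether link events reach a fresh socket, and whether a burst of messages overruns its buffer. Whether that is what happens inside syncd is an open question in this thread.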