sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Local subnet route is being removed on link operational state down #6149

Open nazariig opened 3 years ago

nazariig commented 3 years ago

Description:

Local subnet routes for IPv4/IPv6 are removed when the link's operational state goes down. This causes packets to loop through the default route, if one is present.
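Forwarding is a longest-prefix match. Once the connected 10.0.0.32/31 entry is withdrawn, a packet to the neighbor address 10.0.0.33 can only match 0.0.0.0/0. The toy IPv4 lookup below illustrates this. `Route`, `ip`, `maskFor` and `lookup` are hypothetical names, not SONiC, SAI or kernel APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy IPv4 route entry (hypothetical, for illustration only).
struct Route {
    uint32_t prefix;   // network address, host byte order
    int len;           // prefix length, 0..32
    std::string via;   // label for where the packet goes
};

// Build a host-order IPv4 address from four octets.
constexpr uint32_t ip(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Netmask for a prefix length. len == 0 is special-cased because shifting
// a 32-bit value by 32 is undefined behaviour in C++.
constexpr uint32_t maskFor(int len) {
    return len == 0 ? 0u : ~0u << (32 - len);
}

// Longest-prefix match: among the routes containing dst, keep the longest.
std::optional<Route> lookup(const std::vector<Route>& table, uint32_t dst) {
    std::optional<Route> best;
    for (const auto& r : table) {
        const uint32_t m = maskFor(r.len);
        if ((dst & m) == (r.prefix & m) && (!best || r.len > best->len)) {
            best = r;
        }
    }
    return best;
}
```

Take a table of {0.0.0.0/0, 10.0.0.32/31}. A lookup for 10.0.0.33 selects the /31. After the /31 is removed, the same lookup falls back to the default route.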

Steps to reproduce the issue:

  1. Deploy PTF32/T0/T1-LAG
  2. Shutdown neighbor interface
  3. Startup neighbor interface

Describe the results you received:

Link operational state is up:

Kernel:

root@sonic:/home/admin# ip -4 route show dev Ethernet64
10.0.0.32/31 proto kernel scope link src 10.0.0.32
root@sonic:/home/admin# ip -6 route show dev Ethernet64
fc00::40/126 proto kernel metric 256  pref medium
fe80::/64 proto kernel metric 256  pref medium

root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:10.0.0.32/31'
ROUTE_TABLE:10.0.0.32/31
root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:fc00::40/126'
ROUTE_TABLE:fc00::40/126
root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:fe80::/64'
ROUTE_TABLE:fe80::/64

Debugger:

 92│         if (memcmp(vrf, VRF_PREFIX, strlen(VRF_PREFIX)))
 93│         {
 94│             SWSS_LOG_ERROR("Invalid VRF name %s (ifindex %u)", vrf, rtnl_route_get_table(route_obj));
 95│             return;
 96│         }
 97│         memcpy(destipprefix, vrf, strlen(vrf));
 98│         destipprefix[strlen(vrf)] = ':';
 99│     }
100│
101│     dip = rtnl_route_get_dst(route_obj);
102│     nl_addr2str(dip, destipprefix + strlen(destipprefix), MAX_ADDR_SIZE);
103├>    SWSS_LOG_DEBUG("Receive new route message dest ip prefix: %s", destipprefix);
104│
105│     /*
106│      * Upon arrival of a delete msg we could either push the change right away,
107│      * or we could opt to defer it if we are going through a warm-reboot cycle.
108│      */
109│     bool warmRestartInProgress = m_warmStartHelper.inProgress();
110│
111│     if (nlmsg_type == RTM_DELROUTE)
112│     {
113│         if (!warmRestartInProgress)
114│         {
/sonic-swss/fpmsyncd/routesync.cpp

Thread 1 "fpmsyncd" hit Breakpoint 1, swss::RouteSync::onRouteMsg (this=this@entry=0x7ffe81197300, nlmsg_type=nlmsg_type@entry=24, obj=obj@entry=0x561d45678260, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$1 = "10.0.0.32/31", '\000' <repeats 69 times>
(gdb) p *route_obj
$2 = {ce_refcnt = 1, ce_ops = 0x7f43fcabac20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x561d45678278, prev = 0x561d45678278}, ce_msgtype = 24, ce_flags = 0,
  ce_mask = 37119, rt_family = 2 '\002', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 2 '\002', rt_scope = 0 '\000', rt_type = 1 '\001',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x561d45694770, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 1, rt_pref_src = 0x0, rt_nexthops = {next = 0x561d45678398, prev = 0x561d45678398},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.

Thread 1 "fpmsyncd" hit Breakpoint 1, swss::RouteSync::onRouteMsg (this=this@entry=0x7ffe81197300, nlmsg_type=nlmsg_type@entry=24, obj=obj@entry=0x561d45678260, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$3 = "fc00::40/126", '\000' <repeats 69 times>
(gdb) p *route_obj
$4 = {ce_refcnt = 1, ce_ops = 0x7f43fcabac20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x561d45678278, prev = 0x561d45678278}, ce_msgtype = 24, ce_flags = 0,
  ce_mask = 37119, rt_family = 10 '\n', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 2 '\002', rt_scope = 0 '\000', rt_type = 1 '\001',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x561d4569feb0, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 1, rt_pref_src = 0x0, rt_nexthops = {next = 0x561d456948a8, prev = 0x561d456948a8},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.

Thread 1 "fpmsyncd" hit Breakpoint 1, swss::RouteSync::onRouteMsg (this=this@entry=0x7ffe81197300, nlmsg_type=nlmsg_type@entry=24, obj=obj@entry=0x561d45678260, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$5 = "fe80::/64", '\000' <repeats 72 times>
(gdb) p *route_obj
$6 = {ce_refcnt = 1, ce_ops = 0x7f43fcabac20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x561d45678278, prev = 0x561d45678278}, ce_msgtype = 24, ce_flags = 0,
  ce_mask = 37119, rt_family = 10 '\n', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 2 '\002', rt_scope = 0 '\000', rt_type = 1 '\001',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x561d4569feb0, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 1, rt_pref_src = 0x0, rt_nexthops = {next = 0x561d456948a8, prev = 0x561d456948a8},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.
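In each hit above, the nlmsg_type argument is 24, mirrored by ce_msgtype = 24. This is an rtnetlink message type, and onRouteMsg branches on RTM_DELROUTE at line 111 of the listing. The sketch below decodes these numbers using the constants from the Linux `<linux/rtnetlink.h>` header. `rtnlMsgName` is a hypothetical helper, not part of sonic-swss. It also covers type 16, which portsyncd logs for link events.

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h>
#include <linux/rtnetlink.h>  // RTM_NEWLINK, RTM_NEWROUTE, RTM_DELROUTE, ...

// Map an rtnetlink message type number to its symbolic name.
// Only link and route messages are handled; anything else is printed raw.
std::string rtnlMsgName(int nlmsg_type) {
    switch (nlmsg_type) {
    case RTM_NEWLINK:  return "RTM_NEWLINK";
    case RTM_DELLINK:  return "RTM_DELLINK";
    case RTM_NEWROUTE: return "RTM_NEWROUTE";
    case RTM_DELROUTE: return "RTM_DELROUTE";
    case RTM_GETROUTE: return "RTM_GETROUTE";
    default:           return "other(" + std::to_string(nlmsg_type) + ")";
    }
}
```

With these constants, the up-state hits (24) are route additions. The down-state hits for the global prefixes (25) are route deletions.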

Logs:

Dec  7 23:08:34.378191 sonic WARNING kernel: [17574.284369] sx_netdev_handle_pude_event: Called for logical port - 10100 status UP
Dec  7 23:08:34.378904 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[2391]- set_dispatch_attrib_handler: Set PACKET_ACTION, key:route 0.0.0.0 0.0.0.0, val:DROP
Dec  7 23:08:34.379223 sonic NOTICE syncd#SDK: [SAI_SWITCH.NOTICE] mlnx_sai_switch.c[4356]- event_thread_func: Port 10100 changed state to up
Dec  7 23:08:34.380375 sonic INFO swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet64 admin:1 oper:1 addr:b8:59:9f:a6:28:00 ifindex:262 master:0 type:sx_netdev
Dec  7 23:08:34.380624 sonic NOTICE swss#orchagent: :- doTask: Get port state change notification id:100000000014a status:1
Dec  7 23:08:34.380624 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet64 oper state set from down to up
Dec  7 23:08:34.381301 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status UP to host interface Ethernet64
Dec  7 23:08:34.382347 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[2391]- set_dispatch_attrib_handler: Set OPER_STATUS, key:host interface 16, val:true
Dec  7 23:08:40.932733 sonic INFO swss#orchagent: :- addRoute: Create route 10.0.0.32/31 with next hop(s) 0.0.0.0@Ethernet64
Dec  7 23:08:40.932973 sonic INFO swss#orchagent: :- addRoute: Create route fc00::40/126 with next hop(s) ::@Ethernet64
Dec  7 23:08:40.932973 sonic INFO swss#orchagent: :- removeRoute: Failed to find route entry, vrf_id 0x3000000000010, prefix fe80::/64

Link operational state is down:

Kernel:

root@sonic:/home/admin# ip -4 route show dev Ethernet64
10.0.0.32/31 proto kernel scope link src 10.0.0.32 linkdown
root@sonic:/home/admin# ip -6 route show dev Ethernet64
fc00::40/126 proto kernel metric 256 linkdown  pref medium
fe80::/64 proto kernel metric 256 linkdown  pref medium

root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:10.0.0.32/31'
root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:fc00::40/126'
root@sonic:/home/admin# redis-cli -n 0 KEYS '*' | grep 'ROUTE_TABLE:fe80::/64'
ROUTE_TABLE:fe80::/64
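Compared with the up state, the kernel still holds all three routes, now flagged `linkdown`. APPL_DB (redis database 0) keeps only fe80::/64. The hypothetical helper below performs the same comparison. It assumes the prefixes have already been extracted from the `ip route show` output and from the `ROUTE_TABLE:` keys, and it is not a SONiC tool.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the kernel prefixes (possibly flagged "linkdown") that have no
// matching ROUTE_TABLE:<prefix> key in APPL_DB.
std::vector<std::string> missingFromAppDb(const std::vector<std::string>& kernelPrefixes,
                                          const std::vector<std::string>& appDbKeys) {
    const std::string tag = "ROUTE_TABLE:";
    std::set<std::string> inAppDb;
    for (const auto& key : appDbKeys) {
        // Strip the "ROUTE_TABLE:" table name to recover the route prefix.
        if (key.compare(0, tag.size(), tag) == 0) {
            inAppDb.insert(key.substr(tag.size()));
        }
    }
    std::vector<std::string> missing;
    for (const auto& prefix : kernelPrefixes) {
        if (inAppDb.count(prefix) == 0) {
            missing.push_back(prefix);
        }
    }
    return missing;
}
```

Fed the down-state data above, the helper reports 10.0.0.32/31 and fc00::40/126, the two routes whose removal this issue is about.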

Debugger:

 92│         if (memcmp(vrf, VRF_PREFIX, strlen(VRF_PREFIX)))
 93│         {
 94│             SWSS_LOG_ERROR("Invalid VRF name %s (ifindex %u)", vrf, rtnl_route_get_table(route_obj));
 95│             return;
 96│         }
 97│         memcpy(destipprefix, vrf, strlen(vrf));
 98│         destipprefix[strlen(vrf)] = ':';
 99│     }
100│
101│     dip = rtnl_route_get_dst(route_obj);
102│     nl_addr2str(dip, destipprefix + strlen(destipprefix), MAX_ADDR_SIZE);
103├>    SWSS_LOG_DEBUG("Receive new route message dest ip prefix: %s", destipprefix);
104│
105│     /*
106│      * Upon arrival of a delete msg we could either push the change right away,
107│      * or we could opt to defer it if we are going through a warm-reboot cycle.
108│      */
109│     bool warmRestartInProgress = m_warmStartHelper.inProgress();
110│
111│     if (nlmsg_type == RTM_DELROUTE)
112│     {
113│         if (!warmRestartInProgress)
114│         {
/sonic-swss/fpmsyncd/routesync.cpp

Thread 1 "fpmsyncd" hit Breakpoint 3, swss::RouteSync::onRouteMsg (this=this@entry=0x7fff6178fc30, nlmsg_type=nlmsg_type@entry=25, obj=obj@entry=0x55df0e770de0, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$10 = "10.0.0.32/31", '\000' <repeats 69 times>
(gdb) p *route_obj
$11 = {ce_refcnt = 1, ce_ops = 0x7f8a5b44ec20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x55df0e770df8, prev = 0x55df0e770df8}, ce_msgtype = 25, ce_flags = 0,
  ce_mask = 4351, rt_family = 2 '\002', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 0 '\000', rt_scope = 0 '\000', rt_type = 0 '\000',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x55df0e770d90, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 0, rt_pref_src = 0x0, rt_nexthops = {next = 0x55df0e770e98, prev = 0x55df0e770e98},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.

Thread 1 "fpmsyncd" hit Breakpoint 3, swss::RouteSync::onRouteMsg (this=this@entry=0x7fff6178fc30, nlmsg_type=nlmsg_type@entry=25, obj=obj@entry=0x55df0e770de0, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$12 = "fc00::40/126", '\000' <repeats 69 times>
(gdb) p *route_obj
$13 = {ce_refcnt = 1, ce_ops = 0x7f8a5b44ec20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x55df0e770df8, prev = 0x55df0e770df8}, ce_msgtype = 25, ce_flags = 0,
  ce_mask = 4351, rt_family = 10 '\n', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 0 '\000', rt_scope = 0 '\000', rt_type = 0 '\000',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x55df0e770ee0, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 0, rt_pref_src = 0x0, rt_nexthops = {next = 0x55df0e770e98, prev = 0x55df0e770e98},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.

Thread 1 "fpmsyncd" hit Breakpoint 3, swss::RouteSync::onRouteMsg (this=this@entry=0x7fff6178fc30, nlmsg_type=nlmsg_type@entry=24, obj=obj@entry=0x55df0e770de0, vrf=vrf@entry=0x0) at routesync.cpp:103
(gdb) p destipprefix
$14 = "fe80::/64", '\000' <repeats 72 times>
(gdb) p *route_obj
$15 = {ce_refcnt = 1, ce_ops = 0x7f8a5b44ec20 <route_obj_ops>, ce_cache = 0x0, ce_list = {next = 0x55df0e770df8, prev = 0x55df0e770df8}, ce_msgtype = 24, ce_flags = 0,
  ce_mask = 37119, rt_family = 10 '\n', rt_dst_len = 0 '\000', rt_src_len = 0 '\000', rt_tos = 0 '\000', rt_protocol = 2 '\002', rt_scope = 0 '\000', rt_type = 1 '\001',
  rt_nmetrics = 0 '\000', rt_ttl_propagate = 0 '\000', rt_flags = 0, rt_dst = 0x55df0e770ee0, rt_src = 0x0, rt_table = 0, rt_iif = 0, rt_prio = 0,
  rt_metrics = {0 <repeats 17 times>}, rt_metrics_mask = 0, rt_nr_nh = 1, rt_pref_src = 0x0, rt_nexthops = {next = 0x55df0e769548, prev = 0x55df0e769548},
  rt_cacheinfo = {rtci_clntref = 0, rtci_last_use = 0, rtci_expires = 0, rtci_error = 0, rtci_used = 0, rtci_id = 0, rtci_ts = 0, rtci_tsage = 0}, rt_flag_mask = 0}
(gdb) c
Continuing.
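The down-state messages for 10.0.0.32/31 and fc00::40/126 are nlmsg_type 25 (RTM_DELROUTE) with rt_protocol = 0, rt_type = 0 and rt_nr_nh = 0. fe80::/64, by contrast, still arrives as nlmsg_type 24 (RTM_NEWROUTE) with rt_protocol = 2 and rt_type = 1, as in the up state. The sketch below names those bytes using the kernel's AF_*, RTPROT_* and RTN_* constants. The helper functions are hypothetical and cover only the values seen in these dumps.

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h>        // AF_INET, AF_INET6
#include <linux/rtnetlink.h>   // RTPROT_* and RTN_* constants

// rt_family: address family of the route.
std::string familyName(int family) {
    switch (family) {
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    default:       return "family(" + std::to_string(family) + ")";
    }
}

// rt_protocol: the origin of the route as recorded in the message.
std::string protocolName(int protocol) {
    switch (protocol) {
    case RTPROT_UNSPEC: return "unspec";
    case RTPROT_KERNEL: return "kernel";
    case RTPROT_BOOT:   return "boot";
    case RTPROT_STATIC: return "static";
    default:            return "proto(" + std::to_string(protocol) + ")";
    }
}

// rt_type: the route type.
std::string typeName(int type) {
    switch (type) {
    case RTN_UNSPEC:  return "unspec";
    case RTN_UNICAST: return "unicast";
    case RTN_LOCAL:   return "local";
    default:          return "type(" + std::to_string(type) + ")";
    }
}

// One-line summary of the three fields compared across the gdb dumps.
std::string describeRoute(int family, int protocol, int type) {
    return familyName(family) + " proto=" + protocolName(protocol) + " type=" + typeName(type);
}
```

For example, `$2` decodes to "AF_INET proto=kernel type=unicast" and `$13` to "AF_INET6 proto=unspec type=unspec".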

Logs:

Dec  7 23:09:56.716217 sonic WARNING kernel: [17656.622831] sx_netdev_handle_pude_event: Called for logical port - 10100 status DOWN
Dec  7 23:09:56.716729 sonic NOTICE syncd#SDK: [SAI_SWITCH.NOTICE] mlnx_sai_switch.c[4356]- event_thread_func: Port 10100 changed state to down
Dec  7 23:09:56.717493 sonic NOTICE swss#orchagent: :- doTask: Get port state change notification id:100000000014a status:2
Dec  7 23:09:56.717583 sonic INFO swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet64 admin:1 oper:0 addr:b8:59:9f:a6:28:00 ifindex:262 master:0 type:sx_netdev
Dec  7 23:09:56.717680 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet64 oper state set from up to down
Dec  7 23:09:56.718540 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet64
Dec  7 23:09:56.719326 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[2391]- set_dispatch_attrib_handler: Set OPER_STATUS, key:host interface 16, val:false
Dec  7 23:10:02.162867 sonic INFO swss#orchagent: :- removeRoute: Remove route 10.0.0.32/31 with next hop(s) 0.0.0.0@Ethernet64
Dec  7 23:10:02.163089 sonic INFO swss#orchagent: :- removeRoute: Remove route fc00::40/126 with next hop(s) ::@Ethernet64
Dec  7 23:10:02.163217 sonic INFO swss#orchagent: :- removeRoute: Failed to find route entry, vrf_id 0x3000000000010, prefix fe80::/64
Dec  7 23:10:02.163994 sonic NOTICE syncd#SDK: [SAI_ROUTE.NOTICE] mlnx_sai_route.c[507]- mlnx_remove_route: Remove route route 10.0.0.32 255.255.255.254
Dec  7 23:10:02.166903 sonic NOTICE syncd#SDK: [SAI_ROUTE.NOTICE] mlnx_sai_route.c[507]- mlnx_remove_route: Remove route route fc00::40 ffff:ffff:ffff:ffff:ffff:ffff:ffff:fffc
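orchagent logs the removed routes in prefix notation, while the Mellanox SAI logs them as an address plus a netmask. Converting each prefix length to a mask shows that both logs refer to the same two routes. The sketch below does this with POSIX `inet_ntop`. `maskString` is a hypothetical helper.

```cpp
#include <arpa/inet.h>    // inet_ntop
#include <netinet/in.h>   // INET6_ADDRSTRLEN
#include <sys/socket.h>   // AF_INET, AF_INET6
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Build the netmask for a prefix length and render it the way the SAI
// log does. Returns "" if inet_ntop fails (e.g. unsupported family).
std::string maskString(int family, int prefixLen) {
    std::array<uint8_t, 16> bytes{};   // zero-filled, large enough for IPv6
    const int totalBits = (family == AF_INET) ? 32 : 128;
    // Set the leading prefixLen bits, most significant bit first.
    for (int bit = 0; bit < prefixLen && bit < totalBits; ++bit) {
        bytes[bit / 8] |= static_cast<uint8_t>(0x80u >> (bit % 8));
    }
    char buf[INET6_ADDRSTRLEN];
    if (inet_ntop(family, bytes.data(), buf, sizeof buf) == nullptr) {
        return "";
    }
    return buf;
}
```

/31 yields 255.255.255.254 and /126 yields ffff:ffff:ffff:ffff:ffff:ffff:ffff:fffc, matching the two mlnx_remove_route lines.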

Describe the results you expected:

Local subnet routes should not be removed when the link's operational state goes down.

Additional information you deem important (e.g. issue happens only occasionally):

Switch configuration:

root@sonic:/home/admin# show interfaces status
  Interface            Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------  ------  ------  -------  ---------------  ----------
  Ethernet0          0,1,2,3     100G   9100     rs     etp1  routed      up       up  QSFP28 or later         off
  Ethernet4          4,5,6,7     100G   9100     rs     etp2  routed      up       up   QSFP+ or later         off
  Ethernet8        8,9,10,11     100G   9100     rs     etp3  routed      up       up  QSFP28 or later         off
 Ethernet12      12,13,14,15     100G   9100     rs     etp4  routed      up       up  QSFP28 or later         off
 Ethernet16      16,17,18,19     100G   9100     rs     etp5  routed      up       up   QSFP+ or later         off
 Ethernet20      20,21,22,23     100G   9100     rs     etp6  routed      up       up   QSFP+ or later         off
 Ethernet24      24,25,26,27     100G   9100     rs     etp7  routed      up       up  QSFP28 or later         off
 Ethernet28      28,29,30,31     100G   9100     rs     etp8  routed      up       up   QSFP+ or later         off
 Ethernet32      32,33,34,35     100G   9100     rs     etp9  routed      up       up   QSFP+ or later         off
 Ethernet36      36,37,38,39     100G   9100     rs    etp10  routed      up       up   QSFP+ or later         off
 Ethernet40      40,41,42,43     100G   9100     rs    etp11  routed      up       up   QSFP+ or later         off
 Ethernet44      44,45,46,47     100G   9100     rs    etp12  routed      up       up   QSFP+ or later         off
 Ethernet48      48,49,50,51     100G   9100     rs    etp13  routed      up       up   QSFP+ or later         off
 Ethernet52      52,53,54,55     100G   9100     rs    etp14  routed      up       up   QSFP+ or later         off
 Ethernet56      56,57,58,59     100G   9100     rs    etp15  routed      up       up   QSFP+ or later         off
 Ethernet60      60,61,62,63     100G   9100     rs    etp16  routed      up       up   QSFP+ or later         off
 Ethernet64      64,65,66,67     100G   9100     rs    etp17  routed    down       up   QSFP+ or later         off
 Ethernet68      68,69,70,71     100G   9100     rs    etp18  routed      up       up   QSFP+ or later         off
 Ethernet72      72,73,74,75     100G   9100     rs    etp19  routed      up       up   QSFP+ or later         off
 Ethernet76      76,77,78,79     100G   9100     rs    etp20  routed      up       up   QSFP+ or later         off
 Ethernet80      80,81,82,83     100G   9100     rs    etp21  routed      up       up   QSFP+ or later         off
 Ethernet84      84,85,86,87     100G   9100     rs    etp22  routed      up       up   QSFP+ or later         off
 Ethernet88      88,89,90,91     100G   9100     rs    etp23  routed      up       up   QSFP+ or later         off
 Ethernet92      92,93,94,95     100G   9100     rs    etp24  routed      up       up   QSFP+ or later         off
 Ethernet96      96,97,98,99     100G   9100     rs    etp25  routed      up       up   QSFP+ or later         off
Ethernet100  100,101,102,103     100G   9100     rs    etp26  routed      up       up   QSFP+ or later         off
Ethernet104  104,105,106,107     100G   9100     rs    etp27  routed      up       up   QSFP+ or later         off
Ethernet108  108,109,110,111     100G   9100     rs    etp28  routed      up       up   QSFP+ or later         off
Ethernet112  112,113,114,115     100G   9100     rs    etp29  routed      up       up   QSFP+ or later         off
Ethernet116  116,117,118,119     100G   9100     rs    etp30  routed      up       up   QSFP+ or later         off
Ethernet120  120,121,122,123      50G   9100    N/A    etp31  routed      up       up  QSFP28 or later         off
Ethernet124  124,125,126,127      50G   9100    N/A    etp32  routed      up       up  QSFP28 or later         off

root@sonic:/home/admin# show ip interfaces
Interface    Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
-----------  --------  -------------------  ------------  --------------  -------------
Ethernet0              10.0.0.0/31          up/up         ARISTA01T2      10.0.0.1
Ethernet4              10.0.0.2/31          up/up         ARISTA02T2      10.0.0.3
Ethernet8              10.0.0.4/31          up/up         ARISTA03T2      10.0.0.5
Ethernet12             10.0.0.6/31          up/up         ARISTA04T2      10.0.0.7
Ethernet16             10.0.0.8/31          up/up         ARISTA05T2      10.0.0.9
Ethernet20             10.0.0.10/31         up/up         ARISTA06T2      10.0.0.11
Ethernet24             10.0.0.12/31         up/up         ARISTA07T2      10.0.0.13
Ethernet28             10.0.0.14/31         up/up         ARISTA08T2      10.0.0.15
Ethernet32             10.0.0.16/31         up/up         ARISTA09T2      10.0.0.17
Ethernet36             10.0.0.18/31         up/up         ARISTA10T2      10.0.0.19
Ethernet40             10.0.0.20/31         up/up         ARISTA11T2      10.0.0.21
Ethernet44             10.0.0.22/31         up/up         ARISTA12T2      10.0.0.23
Ethernet48             10.0.0.24/31         up/up         ARISTA13T2      10.0.0.25
Ethernet52             10.0.0.26/31         up/up         ARISTA14T2      10.0.0.27
Ethernet56             10.0.0.28/31         up/up         ARISTA15T2      10.0.0.29
Ethernet60             10.0.0.30/31         up/up         ARISTA16T2      10.0.0.31
Ethernet64             10.0.0.32/31         up/down       ARISTA01T0      10.0.0.33
Ethernet68             10.0.0.34/31         up/up         ARISTA02T0      10.0.0.35
Ethernet72             10.0.0.36/31         up/up         ARISTA03T0      10.0.0.37
Ethernet76             10.0.0.38/31         up/up         ARISTA04T0      10.0.0.39
Ethernet80             10.0.0.40/31         up/up         ARISTA05T0      10.0.0.41
Ethernet84             10.0.0.42/31         up/up         ARISTA06T0      10.0.0.43
Ethernet88             10.0.0.44/31         up/up         ARISTA07T0      10.0.0.45
Ethernet92             10.0.0.46/31         up/up         ARISTA08T0      10.0.0.47
Ethernet96             10.0.0.48/31         up/up         ARISTA09T0      10.0.0.49
Ethernet100            10.0.0.50/31         up/up         ARISTA10T0      10.0.0.51
Ethernet104            10.0.0.52/31         up/up         ARISTA11T0      10.0.0.53
Ethernet108            10.0.0.54/31         up/up         ARISTA12T0      10.0.0.55
Ethernet112            10.0.0.56/31         up/up         ARISTA13T0      10.0.0.57
Ethernet116            10.0.0.58/31         up/up         ARISTA14T0      10.0.0.59
Ethernet120            10.0.0.60/31         up/up         ARISTA15T0      10.0.0.61
Ethernet124            10.0.0.62/31         up/up         ARISTA16T0      10.0.0.63
Loopback0              10.1.0.32/32         up/up         N/A             N/A
docker0                240.127.1.1/24       up/down       N/A             N/A
eth0                   10.210.25.3/22       up/up         N/A             N/A
lo                     127.0.0.1/8          up/up         N/A             N/A

root@sonic:/home/admin# show ipv6 interfaces
Interface    Master    IPv6 address/mask                         Admin/Oper    BGP Neighbor    Neighbor IP
-----------  --------  ----------------------------------------  ------------  --------------  -------------
Bridge                 fe80::98f6:97ff:fe5e:b713%Bridge/64       up/down       N/A             N/A
Ethernet0              fc00::1/126                               up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet0/64
Ethernet4              fc00::5/126                               up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet4/64
Ethernet8              fc00::9/126                               up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet8/64
Ethernet12             fc00::d/126                               up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet12/64
Ethernet16             fc00::11/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet16/64
Ethernet20             fc00::15/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet20/64
Ethernet24             fc00::19/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet24/64
Ethernet28             fc00::1d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet28/64
Ethernet32             fc00::21/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet32/64
Ethernet36             fc00::25/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet36/64
Ethernet40             fc00::29/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet40/64
Ethernet44             fc00::2d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet44/64
Ethernet48             fc00::31/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet48/64
Ethernet52             fc00::35/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet52/64
Ethernet56             fc00::39/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet56/64
Ethernet60             fc00::3d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet60/64
Ethernet64             fc00::41/126                              up/down       N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet64/64
Ethernet68             fc00::45/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet68/64
Ethernet72             fc00::49/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet72/64
Ethernet76             fc00::4d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet76/64
Ethernet80             fc00::51/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet80/64
Ethernet84             fc00::55/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet84/64
Ethernet88             fc00::59/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet88/64
Ethernet92             fc00::5d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet92/64
Ethernet96             fc00::61/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet96/64
Ethernet100            fc00::65/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet100/64
Ethernet104            fc00::69/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet104/64
Ethernet108            fc00::6d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet108/64
Ethernet112            fc00::71/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet112/64
Ethernet116            fc00::75/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet116/64
Ethernet120            fc00::79/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet120/64
Ethernet124            fc00::7d/126                              up/up         N/A             N/A
                       fe80::ba59:9fff:fea6:2800%Ethernet124/64
Loopback0              fc00:1::32/128                            up/up         N/A             N/A
                       fe80::f8d1:56ff:fee5:5822%Loopback0/64
docker0                fd00::1/80                                up/down       N/A             N/A
                       fe80::1%docker0/64
eth0                   fe80::9a03:9bff:fe98:cf46%eth0/64         up/up         N/A             N/A
lo                     ::1/128                                   up/up         N/A             N/A
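The kernel's connected routes follow from these interface addresses. Ethernet64 has 10.0.0.32/31 and fc00::41/126. Clearing the host bits gives 10.0.0.32/31 and fc00::40/126, the two prefixes that disappear from ROUTE_TABLE when Ethernet64 goes down. The sketch below uses POSIX `inet_pton`/`inet_ntop`. `subnetOf` is a hypothetical helper.

```cpp
#include <arpa/inet.h>    // inet_pton, inet_ntop
#include <netinet/in.h>   // INET6_ADDRSTRLEN
#include <sys/socket.h>   // AF_INET, AF_INET6
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Derive the connected-subnet prefix from an interface address and prefix
// length by clearing all host bits. Returns "" if the address does not parse.
std::string subnetOf(int family, const std::string& addr, int prefixLen) {
    std::array<uint8_t, 16> bytes{};
    if (inet_pton(family, addr.c_str(), bytes.data()) != 1) {
        return "";
    }
    const int totalBits = (family == AF_INET) ? 32 : 128;
    for (int bit = prefixLen; bit < totalBits; ++bit) {
        bytes[bit / 8] &= static_cast<uint8_t>(~(0x80u >> (bit % 8)));
    }
    char buf[INET6_ADDRSTRLEN];
    if (inet_ntop(family, bytes.data(), buf, sizeof buf) == nullptr) {
        return "";
    }
    return std::string(buf) + "/" + std::to_string(prefixLen);
}
```

The same calculation maps any row of the tables above to its connected route, e.g. Ethernet0's fc00::1/126 to fc00::/126.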

Netlink reference:

Output of show version:

root@sonic:/home/admin# show version

SONiC Software Version: SONiC.201911.255-fd05c258
Distribution: Debian 9.13
Kernel: 4.9.0-11-2-amd64
Build commit: fd05c258
Build date: Sun Dec  6 05:08:15 UTC 2020
Built by: johnar@jenkins-worker-4

Platform: x86_64-mlnx_msn3700c-r0
HwSKU: ACS-MSN3700C
ASIC: mellanox
Serial Number: MT1852X03894
Uptime: 23:13:31 up  4:57,  3 users,  load average: 0.84, 1.47, 1.51

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-mlnx             201911.255-fd05c258   0f2a3267765c        398MB
docker-syncd-mlnx             latest                0f2a3267765c        398MB
docker-router-advertiser      201911.255-fd05c258   5f1d93572e3f        290MB
docker-router-advertiser      latest                5f1d93572e3f        290MB
docker-platform-monitor       201911.255-fd05c258   a0297760787a        665MB
docker-platform-monitor       latest                a0297760787a        665MB
docker-fpm-frr                201911.255-fd05c258   cda17ae636a2        335MB
docker-fpm-frr                latest                cda17ae636a2        335MB
docker-lldp-sv2               201911.255-fd05c258   21e6e1d68f8e        312MB
docker-lldp-sv2               latest                21e6e1d68f8e        312MB
docker-dhcp-relay             201911.255-fd05c258   84193dd75fd8        300MB
docker-dhcp-relay             latest                84193dd75fd8        300MB
docker-database               201911.255-fd05c258   7a3d9eca7d60        290MB
docker-database               latest                7a3d9eca7d60        290MB
docker-teamd                  201911.255-fd05c258   3b636d48ac2a        315MB
docker-teamd                  latest                3b636d48ac2a        315MB
docker-sonic-mgmt-framework   201911.255-fd05c258   909192797dca        428MB
docker-sonic-mgmt-framework   latest                909192797dca        428MB
docker-snmp-sv2               201911.255-fd05c258   c019e351b0a2        348MB
docker-snmp-sv2               latest                c019e351b0a2        348MB
docker-orchagent              201911.255-fd05c258   78ca40f6ff6e        333MB
docker-orchagent              latest                78ca40f6ff6e        333MB
docker-sflow                  201911.255-fd05c258   c74daf6231e3        315MB
docker-sflow                  latest                c74daf6231e3        315MB
docker-nat                    201911.255-fd05c258   020c56b1229d        316MB
docker-nat                    latest                020c56b1229d        316MB
docker-sonic-telemetry        201911.255-fd05c258   37269fabc27e        353MB
docker-sonic-telemetry        latest                37269fabc27e        353MB

nazariig commented 3 years ago

@prsunny please have a look

prsunny commented 3 years ago

Based on my analysis of the logs, this subnet route deletion is triggered by FRR zebra. I verified this on both 201911 and 201811; the observations are below:

  1. [201911 - FRR] fpmsyncd receives route messages from zebra even though the kernel is not generating them.
  2. [201811 - Quagga] fpmsyncd is not getting the route msgs from zebra during oper status change. Subnet handling also differs between 201811 and 201911, but that is a separate matter.
Dec 12 00:27:17.557194 str-sn3800-01 NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet72 oper state set from up to down
Dec 12 00:27:17.557410 str-sn3800-01 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet72
Dec 12 00:27:17.561697 str-sn3800-01 DEBUG bgp#fpmsyncd: :- onRouteMsg: Receive new route message dest ip prefix: fe80::/64
Dec 12 00:27:17.561697 str-sn3800-01 DEBUG bgp#fpmsyncd: :- onRouteMsg: RouteTable set msg: fe80::/64 :: eth0
Dec 12 00:27:17.561697 str-sn3800-01 DEBUG bgp#fpmsyncd: :- onRouteMsg: Receive new route message dest ip prefix: 72.1.1.0/24
nazariig commented 3 years ago

> 2. [201811 - Quagga] fpmsyncd is _not_ getting the route msgs from zebra during oper status change.

@prsunny, so what's the plan? Do we need to file a bug against FRR?

nazariig commented 3 years ago

Potential fix: https://github.com/FRRouting/frr/pull/7745