Hi @mannytaheri, can you please apply these 2 PRs and check whether the ports come up faster? https://github.com/sonic-net/sonic-host-services/pull/135 https://github.com/sonic-net/sonic-buildimage/pull/19482
@mannytaheri, please let me know the test results with the patches; if they don't help, I'll look into it.
The faster PMON bring-up does not seem to help with this issue. @wenyiz2021, could you help follow up with Arvind/BRCM and try out the BRCM fix?
Related issue raised earlier?: https://github.com/sonic-net/sonic-buildimage/issues/17180
To debug further: check with the sonic-common-infra subgroup whether the root cause is already known.
The FIB suppress-pending feature was merged recently. Can we check again with the latest master.202405 build? https://github.com/sonic-net/sonic-buildimage/pull/19736 @mannytaheri
Cannot reproduce the issue on an Arista chassis with the latest master image using SAI 11.2, taken from https://github.com/sonic-net/sonic-buildimage/pull/19854
@mannytaheri, this does not seem to be a general issue on all platforms? Can you try the master image above with SAI 11?
The above is the current understanding.
Ack, I will simulate this case and check whether the issue is related to the selectable priority in sonic-swss-common.
Here is an update: today I created a test case to simulate the scenario. Here is my summary:
Orchagent supports 2 kinds of consumers:
- SubscriberStateTable: CONFIG_DB, STATE_DB, CHASSIS_APP_DB
- ConsumerStateTable: all other databases
Because I don't know which table class the BGP route and port state tables use on the chassis, I created a swss-common test case to simulate the issue and tested both table classes.
Here is how my test case works:
Step 1: Create a port consumer with priority 45.
Step 2: Create a route consumer with priority 5.
Step 3: Set DEFAULT_POP_BATCH_SIZE to 128.
Step 4: Create 1 port event.
Step 5: Create 10000 route events.
Step 6: Start popping port and route events; in the middle of handling the route events, create a new port event and check whether it is popped immediately.
ConsumerStateTable: issue not reproduced. Every pop returns 128 routes, and if there is a new port event, it is popped first.

SubscriberStateTable: issue reproduced. The batch size parameter of SubscriberStateTable does not take effect:
```cpp
SubscriberStateTable route_consumer(&consumer_db, routeTableName, DEFAULT_POP_BATCH_SIZE, 5);
```
SubscriberStateTable always pops all pending route data, which means orchagent will not handle a newly arrived port event until it has finished processing all route events.

I'm not sure whether the performance issue is caused by this; I'm checking which database and table the port events and route events use.
@liuh-80, in the chassis, the following is the code path, and it seems to be using ConsumerStateTable:

```cpp
// Route tables are registered on m_applDb
const int routeorch_pri = 5;
vector<...> route_tables = {
    { APP_ROUTE_TABLE_NAME,       routeorch_pri },
    { APP_LABEL_ROUTE_TABLE_NAME, routeorch_pri }
};
gRouteOrch = new RouteOrch(m_applDb, route_tables, gSwitchOrch, gNeighOrch, gIntfsOrch, vrf_orch, gFgNhgOrch, gSrv6Orch);

RouteOrch::RouteOrch(DBConnector *db, vector<...> &tableNames,
                     SwitchOrch *switchOrch, NeighOrch *neighOrch, IntfsOrch *intfsOrch,
                     VRFOrch *vrfOrch, FgNhgOrch *fgNhgOrch, Srv6Orch *srv6Orch) :
        gRouteBulker(sai_route_api, gMaxBulkSize),
        gLabelRouteBulker(sai_mpls_api, gMaxBulkSize),
        gNextHopGroupMemberBulker(sai_next_hop_group_api, gSwitchId, gMaxBulkSize),
        Orch(db, tableNames),
        // ...
{
    // ...
}

Orch::Orch(DBConnector *db, const vector<...> &tableNames_with_pri)
{
    for (const auto& it : tableNames_with_pri)
    {
        addConsumer(db, it.first, it.second);
    }
}

void Orch::addConsumer(DBConnector *db, string tableName, int pri)
{
    if (db->getDbId() == CONFIG_DB || db->getDbId() == STATE_DB || db->getDbId() == CHASSIS_APP_DB)
    {
        addExecutor(new Consumer(new SubscriberStateTable(db, tableName, TableConsumable::DEFAULT_POP_BATCH_SIZE, pri), this, tableName));
    }
    else
    {
        // APPL_DB tables (including the route tables) take this branch
        addExecutor(new Consumer(new ConsumerStateTable(db, tableName, gBatchSize, pri), this, tableName));
    }
}
```
@saksarav-nokia, thanks. The issue needs more investigation; I will try to reproduce it first.
@liuh-80, we can easily reproduce this in our setup. Let me know if you want us to collect any info or logs.
@saksarav-nokia, can you share the reproduction steps, OS version, and hardware SKU?
```
admin@ixre-egl-board211:~$ show version

SONiC Software Version: SONiC.HEAD.798897-202405-3192720893
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-11-2-amd64
Build commit: 3192720893
Build date: Thu Aug 15 09:35:12 UTC 2024
Built by: gitlab-runner@wfrv-sonicbld05

Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x400G
ASIC: broadcom
ASIC Count: 2
```
What commands do I need to run to create the BGP routes and the port-up event? Also, what is the signal that the port-up event is being blocked by BGP routes; do I need to check syslog?
@liuh-80, we have 36 eBGP neighbors and 6 iBGP neighbors, with about 34K routes from each eBGP neighbor. We just reboot this linecard to see the issue.

```
admin@ixre-egl-board211:~$ show ip bgp summary -d all

IPv4 Unicast Summary:
asic0: BGP router identifier 8.0.0.24, local AS number 65100 vrf-id 0
BGP table version 501903
asic1: BGP router identifier 8.0.0.26, local AS number 65100 vrf-id 0
BGP table version 1594975
RIB entries 205222, using 39402624 bytes of memory
Peers 32, using 23743232 KiB of memory
Peer groups 8, using 512 bytes of memory

Neighbhor  V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  NeighborName
3.3.3.24   4  65100  16567    16535    0       0    0     04:32:04  442664        ASIC0
3.3.3.26   4  65100  8943     8947     0       0    0     04:32:04  442659        ASIC1
3.3.3.36   4  65100  5880     8946     0       0    0     04:32:03  3098          ixre-egl-board212-ASIC0
3.3.3.36   4  65100  6663     16515    0       0    0     04:35:35  3098          ixre-egl-board212-ASIC0
3.3.3.38   4  65100  6072     8946     0       0    0     04:32:03  3098          ixre-egl-board212-ASIC1
3.3.3.38   4  65100  6855     16515    0       0    0     04:35:35  3098          ixre-egl-board212-ASIC1
10.0.0.1   4  65200  5711     5908     0       0    0     04:31:58  34050         ARISTA01T3
10.0.0.5   4  65200  5709     5903     0       0    0     04:31:51  34050         ARISTA03T3
10.0.0.9   4  65200  5709     5903     0       0    0     04:31:51  34050         ARISTA05T3
10.0.0.13  4  65200  5707     5898     0       0    0     04:31:45  34050         ARISTA07T3
10.0.0.17  4  65200  5707     5898     0       0    0     04:31:45  34050         ARISTA09T3
10.0.0.21  4  65200  5712     5909     0       0    0     04:32:02  34050         ARISTA11T3
10.0.0.23  4  65200  5711     5908     0       0    0     04:32:00  34050         ARISTA12T3
10.0.0.25  4  65200  5713     5914     0       0    0     04:32:05  34050         ARISTA13T3
10.0.0.27  4  65200  5713     5914     0       0    0     04:32:05  34049         ARISTA14T3
10.0.0.29  4  65200  5712     5909     0       0    0     04:32:01  34050         ARISTA15T3
10.0.0.31  4  65200  5711     5908     0       0    0     04:32:00  34050         ARISTA16T3
10.0.0.33  4  65200  5713     5914     0       0    0     04:32:05  34050         ARISTA17T3
10.0.0.35  4  65200  5713     5914     0       0    0     04:32:05  34050         ARISTA18T3
10.0.0.37  4  65200  6200     6612     0       0    0     04:56:25  34050         ARISTA19T3
10.0.0.41  4  65200  6206     6618     0       0    0     04:56:44  34050         ARISTA21T3
10.0.0.45  4  65200  6200     6612     0       0    0     04:56:25  34050         ARISTA23T3
10.0.0.49  4  65200  6202     6613     0       0    0     04:56:30  34050         ARISTA25T3
10.0.0.53  4  65200  6200     6612     0       0    0     04:56:25  34050         ARISTA27T3
10.0.0.57  4  65200  6206     6618     0       0    0     04:56:43  34049         ARISTA29T3
10.0.0.59  4  65200  6267     6680     0       0    0     04:59:48  34050         ARISTA30T3
10.0.0.61  4  65200  6206     6618     0       0    0     04:56:43  34049         ARISTA31T3
10.0.0.63  4  65200  6204     6616     0       0    0     04:56:38  34049         ARISTA32T3
10.0.0.65  4  65200  6204     6616     0       0    0     04:56:39  34049         ARISTA33T3
10.0.0.67  4  65200  6204     6616     0       0    0     04:56:37  34049         ARISTA34T3
10.0.0.69  4  65200  6247     6659     0       0    0     04:58:46  34050         ARISTA35T3
10.0.0.71  4  65200  6283     6695     0       0    0     05:00:35  34049         ARISTA36T3

Total number of neighbors 32
```
```
admin@ixre-egl-board211:~$ show interface status -d all
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
Ethernet0 72,73,74,75,76,77,78,79 400G 9100 N/A Ethernet1/1 PortChannel102 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet8 80,81,82,83,84,85,86,87 400G 9100 N/A Ethernet2/1 PortChannel102 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet16 88,89,90,91,92,93,94,95 400G 9100 N/A Ethernet3/1 PortChannel104 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet24 96,97,98,99,100,101,102,103 400G 9100 N/A Ethernet4/1 PortChannel104 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet32 104,105,106,107,108,109,110,111 400G 9100 N/A Ethernet5/1 PortChannel106 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet40 112,113,114,115,116,117,118,119 400G 9100 N/A Ethernet6/1 PortChannel106 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet48 120,121,122,123,124,125,126,127 400G 9100 N/A Ethernet7/1 PortChannel108 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet56 128,129,130,131,132,133,134,135 400G 9100 N/A Ethernet8/1 PortChannel108 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet64 136,137,138,139,140,141,142,143 400G 9100 N/A Ethernet9/1 PortChannel1010 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet72 64,65,66,67,68,69,70,71 400G 9100 N/A Ethernet10/1 PortChannel1010 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet80 56,57,58,59,60,61,62,63 400G 9100 N/A Ethernet11/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet88 48,49,50,51,52,53,54,55 400G 9100 N/A Ethernet12/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet96 40,41,42,43,44,45,46,47 400G 9100 N/A Ethernet13/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet104 32,33,34,35,36,37,38,39 400G 9100 N/A Ethernet14/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet112 24,25,26,27,28,29,30,31 400G 9100 N/A Ethernet15/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet120 16,17,18,19,20,21,22,23 400G 9100 N/A Ethernet16/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet128 8,9,10,11,12,13,14,15 400G 9100 N/A Ethernet17/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet136 0,1,2,3,4,5,6,7 400G 9100 N/A Ethernet18/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet144 72,73,74,75,76,77,78,79 400G 9100 N/A Ethernet19/1 PortChannel1028 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet152 80,81,82,83,84,85,86,87 400G 9100 N/A Ethernet20/1 PortChannel1028 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet160 88,89,90,91,92,93,94,95 400G 9100 N/A Ethernet21/1 PortChannel1030 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet168 96,97,98,99,100,101,102,103 400G 9100 N/A Ethernet22/1 PortChannel1030 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet176 104,105,106,107,108,109,110,111 400G 9100 N/A Ethernet23/1 PortChannel1032 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet184 112,113,114,115,116,117,118,119 400G 9100 N/A Ethernet24/1 PortChannel1032 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet192 120,121,122,123,124,125,126,127 400G 9100 N/A Ethernet25/1 PortChannel1034 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet200 128,129,130,131,132,133,134,135 400G 9100 N/A Ethernet26/1 PortChannel1034 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet208 136,137,138,139,140,141,142,143 400G 9100 N/A Ethernet27/1 PortChannel1036 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet216 64,65,66,67,68,69,70,71 400G 9100 N/A Ethernet28/1 PortChannel1036 up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet224 56,57,58,59,60,61,62,63 400G 9100 N/A Ethernet29/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet232 48,49,50,51,52,53,54,55 400G 9100 N/A Ethernet30/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet240 40,41,42,43,44,45,46,47 400G 9100 N/A Ethernet31/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet248 32,33,34,35,36,37,38,39 400G 9100 N/A Ethernet32/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet256 24,25,26,27,28,29,30,31 400G 9100 N/A Ethernet33/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet264 16,17,18,19,20,21,22,23 400G 9100 N/A Ethernet34/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet272 8,9,10,11,12,13,14,15 400G 9100 N/A Ethernet35/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet280 0,1,2,3,4,5,6,7 400G 9100 N/A Ethernet36/1 routed up up QSFP-DD Double Density 8X Pluggable Transceiver off
Ethernet-IB0 219 10G 9100 N/A Recirc0/0 routed up up N/A off
Ethernet-IB1 219 10G 9100 N/A Recirc1/0 routed up up N/A off
Ethernet-Rec0 220 10G 9100 N/A Recirc0/1 routed up up N/A off
Ethernet-Rec1 220 10G 9100 N/A Recirc1/1 routed up up N/A off
PortChannel102 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel104 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel106 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel108 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1010 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1028 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1030 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1032 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1034 N/A 800G 9100 N/A N/A routed up up N/A N/A
PortChannel1036 N/A 800G 9100 N/A N/A routed up up N/A N/A
```
@liuh-80, we have 36 front-panel ports on the ASICs, connected to Arista VMs. We enabled BGP on both our chassis and the Arista VMs, which established the BGP neighbors, and the routes are injected from the Arista VMs.
@mlok-nokia @fountzou for visibility
Update: I found something, but I need to reproduce it to confirm.

The issue seems to be caused by the following code:
```cpp
void Consumer::execute()
{
    // ConsumerBase::execute_impl
    size_t update_size = 0;
    auto table = static_cast<swss::ConsumerTableBase *>(getSelectable());
    do
    {
        std::deque<KeyOpFieldsValuesTuple> entries;
        table->pops(entries);               // pops at most one batch
        update_size = addToSync(entries);
    } while (update_size != 0);             // loops until a pop adds nothing new
    drain();
}
```
Here is my theory:
When there are 10000+ incoming routes, the route consumer is selected in the following code:

```cpp
void OrchDaemon::start()
{
    // ......
    while (true)
    {
        Selectable *s;
        int ret;

        ret = m_select->select(&s, SELECT_TIMEOUT);  // <== route consumer is selected here

        // ......

        auto *c = (Executor *)s;
        c->execute();

        /* After each iteration, periodically check all m_toSync map to
         * execute all the remaining tasks that need to be retried. */

        /* TODO: Abstract Orch class to have a specific todo list */
        for (Orch *o : m_orchList)
            o->doTask();
        // ......
```
The route consumer then runs its execute() method.

Inside execute() there is a loop: pops() pops 128 entries, then addToSync() returns a non-zero value, which causes the table to be popped again:
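```cpp
auto table = static_cast<swss::ConsumerTableBase *>(getSelectable());
do
{
    std::deque<KeyOpFieldsValuesTuple> entries;
    table->pops(entries);
    update_size = addToSync(entries);
} while (update_size != 0);
```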
Because there are 10000+ routes in the table, the code effectively blocks here, and the port notification is not selected until all routes have been processed.
I modified the test case to simulate this scenario, and it seems the while loop does cause the issue:

```cpp
#include <deque>
#include <iostream>
#include <string>
#include <vector>

#include <gtest/gtest.h>

#include "common/consumerstatetable.h"
#include "common/dbconnector.h"
#include "common/producerstatetable.h"
#include "common/select.h"
#include "common/table.h"

using namespace std;
using namespace swss;

// Writes one entry for `key`. The helper body was truncated in the original
// snippet; the field name and value here are placeholders.
void ProducerStateTableSet(ProducerStateTable &table, string key)
{
    vector<FieldValueTuple> fields = { { "field", "value" } };
    table.set(key, fields);
}

TEST(Priority, massive_route_block_portstatus)
{
    std::string routeTableName = "route_table";
    std::string portTableName = "port_table";

    DBConnector producer_db("TEST_DB", 0, true);
    DBConnector consumer_db("TEST_DB", 0, true);
    ProducerStateTable route_producer(&producer_db, routeTableName);
    ProducerStateTable port_producer(&producer_db, portTableName);
    // The port consumer has the higher priority (40 vs 5)
    ConsumerStateTable route_consumer(&consumer_db, routeTableName, TableConsumable::DEFAULT_POP_BATCH_SIZE, 5);
    ConsumerStateTable port_consumer(&consumer_db, portTableName, TableConsumable::DEFAULT_POP_BATCH_SIZE, 40);

    Select selector;
    Selectable *selected;
    selector.addSelectable(&route_consumer);
    selector.addSelectable(&port_consumer);

    // create 1 port table event
    ProducerStateTableSet(port_producer, "port_up_01");
    int ROUTE_COUNT = 1000;
    for (int route_idx = 0; route_idx < ROUTE_COUNT; route_idx++)
    {
        ProducerStateTableSet(route_producer, "bgp_route_" + to_string(route_idx));
    }

    // simulate: the higher-priority port consumer is selected first
    selector.select(&selected);
    EXPECT_EQ(selected, &port_consumer);
    {
        std::deque<KeyOpFieldsValuesTuple> ports;
        port_consumer.pops(ports);
        EXPECT_EQ(ports.size(), 1u);
        while (!ports.empty())
        {
            KeyOpFieldsValuesTuple port = ports.front();
            auto key = kfvKey(port);
            cout << key << endl;
            ports.pop_front();
        }
    }

    int poped_entry = 0;
    bool send_port_02 = false;
    while (poped_entry < ROUTE_COUNT + 1)
    {
        selector.select(&selected);
        cout << "seletcted" << endl;
        if (selected == &route_consumer)
        {
            // Same do/while shape as Consumer::execute(): pop until empty
            int routes_count = 0;
            do
            {
                std::deque<KeyOpFieldsValuesTuple> routes;
                route_consumer.pops(routes);
                poped_entry += (int)(routes.size());
                routes_count = (int)(routes.size());
                cout << "poped " << routes.size() << " routes" << endl;
                while (!routes.empty())
                {
                    KeyOpFieldsValuesTuple route = routes.front();
                    auto key = kfvKey(route);
                    (void)key;  // printing disabled to keep the output short
                    routes.pop_front();
                }
                // Create a second port event halfway through the routes
                if (!send_port_02 && poped_entry >= 500)
                {
                    cout << "create new port status" << endl;
                    ProducerStateTableSet(port_producer, "port_up_02");
                    send_port_02 = true;
                }
            }
            while (routes_count > 0);
        }
        else if (selected == &port_consumer)
        {
            std::deque<KeyOpFieldsValuesTuple> ports;
            port_consumer.pops(ports);
            cout << "poped " << ports.size() << " ports" << endl;
            poped_entry += (int)(ports.size());
            while (!ports.empty())
            {
                KeyOpFieldsValuesTuple port = ports.front();
                auto key = kfvKey(port);
                cout << key << endl;
                ports.pop_front();
            }
        }
    }
}
```
Test result:

```
[ RUN ] Priority.massive_route_block_portstatus
port_up_01
seletcted
poped 128 routes
poped 128 routes
poped 128 routes
poped 128 routes
create new port status
poped 128 routes
poped 128 routes
poped 128 routes
poped 104 routes
poped 0 routes
seletcted
poped 1 ports
port_up_02
[ OK ] Priority.massive_route_block_portstatus (139 ms)
```
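The pop sizes in this log follow directly from the test parameters (1000 routes, batch size 128, second port event created once at least 500 routes have been popped):

```math
1000 = 7 \times 128 + 104, \qquad 4 \times 128 = 512 \ge 500
```

So port_up_02 was created right after the fourth route pop, yet it was only popped after the remaining 1000 - 512 = 488 routes had been drained by the inner do/while loop.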
@liuh-80, thanks so much for the investigation. Just curious: in this theory, does the number of BGP neighbors matter? I ask because I was not able to reproduce this with 4 BGP neighbors, each advertising 32k routes. The links are 400G.
I don't know how BGP neighbors are handled by orchagent, so I'm not sure whether the number of BGP neighbors is related to this issue. We need to test on hardware to confirm this is the root cause; I will prepare an image to verify the fix.
Found that the issue may be caused by this change: https://github.com/sonic-net/sonic-swss/commit/92589789aa79bf1e70937a35cb06eff8a358ab6b#diff-96451cb89f907afccbd39ddadb6d30aa21fe6fbd01b1cbaf6362078b926f1f08

Created a draft fix to verify that this change is the root cause: https://github.com/sonic-net/sonic-swss/pull/3269

Being actively looked into within MSFT.
@liuh-80, I built an image (latest master) with this change and tested it. On the first boot after installation on a single linecard, all ports came up within 8 minutes and all 34k routes were installed. On a subsequent reboot of a single linecard, it took about 7 minutes for all links to come up and all 34k routes to be installed. It seems this change addresses the issue, but we need to do more testing to verify that, including OC testing.

Since the change has been verified to fix the issue, I published it for review and comments:
I am looking at the orchagent crashes seen with this fix.
Closing because the fix PR has been merged.
Issue Description
The ports take more than 20 minutes to come up after a reload/reboot in the T2 topology, because orchagent is slow to process the port-up notifications.
Results you see
The port-up notifications are queued behind a large number of BGP route updates (34000 routes) and take a long time to be processed. This occurs after a config reload or a reboot.
Results you expected to see
BGP route updates should be handled correctly, and ports should come up in a reasonable time.
Is it platform specific
generic
Relevant log output
No response
Output of show version
Attach files (if any)
No response