Closed prgeor closed 2 years ago
@qiluo-msft, @hui-ma please check this issue
Is this a regression?
ACT, will start investigate this issue.
Can't reproduce this issue with latest master branch image because pmon docker failed with following error:
Jul 25 05:01:20.160934 vlab-01 ERR pmon#chassis_db_init: Failed to load chassis due to ModuleNotFoundError("No module named 'sonic_platform'")
Xcvrd can't start on latest master branch vs image:
root@vlab-01:/# xcvrd Failed to load sfputil: Failed to load platform module 'sfputil': [Errno 2] No such file or directory: '/usr/share/sonic/platform/plugins/sfputil.py'
The issue can be reproduce with following code on multi-asic KVM:
from sonic_py_common import daemon_base from sonic_py_common import multi_asic from sonic_py_common.interface import backplane_prefix, inband_prefix, recirc_prefix from swsscommon import swsscommon
SELECT_TIMEOUT_MSECS = 1000
swsscommon.SonicDBConfig.initializeGlobalConfig() db = daemon_base.db_connect('APPL_DB', namespace='asic0') port_tbl = swsscommon.SubscriberStateTable(db, swsscommon.APP_PORT_TABLE_NAME) sel = swsscommon.Select() sel.addSelectable(port_tbl)
while True: (state, _) = sel.select(SELECT_TIMEOUT_MSECS) if state == swsscommon.Select.TIMEOUT: continue if state != swsscommon.Select.OBJECT: continue while True: (key, op, fvp) = port_tbl.pop() if not key: break print("dev event: key-{} op={} fvp={}".format(key, op, fvp))
if not key:
break
if not validate_port(key):
continue
fvp = dict(fvp) if fvp is not None else {}
if 'index' not in fvp:
fvp['index'] = '-1'
port_index = int(fvp['index'])
port_change_event = None
if op == swsscommon.SET_COMMAND:
print("set event: {}".format(fvp))
elif op == swsscommon.DEL_COMMAND:
print("dev event: {}".format(fvp))
And change config_db with: sonic-db-cli -n asic0 CONFIG_DB HSET "PORT|Ethernet0" "key" "field"
There will be multiple events:
set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'} set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'}
However, when change APPL_DB with following code: admin@vlab-08:~$ sonic-db-cli -n asic0 APPL_DB HSET "PORT_TABLE:Ethernet0" "key" "field2"
There will only 1 event:
dev event: Ethernet0 SET (('admin_status', 'up'), ('alias', 'Ethernet1/1'), ('asic_port_name', 'Eth0-ASIC0'), ('description', 'ARISTA01T2:Ethernet1'), ('index', '0'), ('lanes', '1,2,3,4'), ('mtu', '9100'), ('pfc_asym', 'off'), ('role', 'Ext'), ('speed', '40000'), ('tpid', '0x8100'), ('oper_status', 'up'), ('key', 'field')) dev event: Ethernet0 SET (('admin_status', 'up'), ('alias', 'Ethernet1/1'), ('asic_port_name', 'Eth0-ASIC0'), ('description', 'ARISTA01T2:Ethernet1'), ('index', '0'), ('lanes', '1,2,3,4'), ('mtu', '9100'), ('pfc_asym', 'off'), ('role', 'Ext'), ('speed', '40000'), ('tpid', '0x8100'), ('oper_status', 'up'), ('key', 'field2')) <-- only receive 1 event.
So because the demo code only listen to APPL_DB, but the command are modify CONFIG_DB, so seems other application modify APPL_DB multiple times and cause this issue.
swss-common works OK because only receive 1 event for APPL_DB change.
Will continue check where is the code modify CONFIG_DB multiple times.
Seems it's done by https://github.com/Azure/sonic-swss/blob/7126857c1e387784bd04b9f7502018b2a4973d85/cfgmgr/portmgr.cpp
This issue seems a design issue:
port manager will monitor config_db port table here: https://github.com/Azure/sonic-swss/blob/92589789aa79bf1e70937a35cb06eff8a358ab6b/cfgmgr/portmgrd.cpp#L55
when config_db port table changed, portmgrd will write same entry multiple times: https://github.com/Azure/sonic-swss/blob/7126857c1e387784bd04b9f7502018b2a4973d85/cfgmgr/portmgr.cpp#L127
for (auto &entry : field_values)
{
**writeConfigToAppDb(alias, fvField(entry), fvValue(entry));**
SWSS_LOG_NOTICE("Configure %s %s to %s", alias.c_str(), fvField(entry).c_str(), fvValue(entry).c_str());
}
if (!portOk)
{
SWSS_LOG_INFO("Port %s is not ready, pending...", alias.c_str());
**writeConfigToAppDb(alias, "mtu", mtu);
writeConfigToAppDb(alias, "admin_status", admin_status);**
field_values.clear();
field_values.emplace_back("mtu", mtu);
field_values.emplace_back("admin_status", admin_status);
it->second = KeyOpFieldsValuesTuple{alias, SET_COMMAND, field_values};
it++;
continue;
}
In my case, the entry contains 11 fields, so the code write same entry 11 times, then xvcrd receive 11 events: set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'}
Also here are syslog show portmgrd update same entry multiples:
for (auto &entry : field_values)
{
writeConfigToAppDb(alias, fvField(entry), fvValue(entry));
SWSS_LOG_NOTICE("Configure %s %s to %s", alias.c_str(), fvField(entry).c_str(), fvValue(entry).c_str()); <== here is the log code.
}
Jul 26 08:40:22.122569 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 alias to Ethernet1/4 Jul 26 08:40:22.128114 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 asic_port_name to Eth3-ASIC0 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 description to Ethernet1/4 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 index to 3 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 lanes to 13,14,15,16 Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 pfc_asym to off Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 role to Ext Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 speed to 40000 Jul 26 08:40:22.135860 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 tpid to 0x8100
@liuh-80 let me verify and get back with the sonic version
Seems it's done by https://github.com/Azure/sonic-swss/blob/7126857c1e387784bd04b9f7502018b2a4973d85/cfgmgr/portmgr.cpp
This issue seems a design issue:
- port manager will monitor config_db port table here: https://github.com/Azure/sonic-swss/blob/92589789aa79bf1e70937a35cb06eff8a358ab6b/cfgmgr/portmgrd.cpp#L55
when config_db port table changed, portmgrd will write same entry multiple times: https://github.com/Azure/sonic-swss/blob/7126857c1e387784bd04b9f7502018b2a4973d85/cfgmgr/portmgr.cpp#L127
for (auto &entry : field_values) { **writeConfigToAppDb(alias, fvField(entry), fvValue(entry));** SWSS_LOG_NOTICE("Configure %s %s to %s", alias.c_str(), fvField(entry).c_str(), fvValue(entry).c_str()); } if (!portOk) { SWSS_LOG_INFO("Port %s is not ready, pending...", alias.c_str()); **writeConfigToAppDb(alias, "mtu", mtu); writeConfigToAppDb(alias, "admin_status", admin_status);** field_values.clear(); field_values.emplace_back("mtu", mtu); field_values.emplace_back("admin_status", admin_status); it->second = KeyOpFieldsValuesTuple{alias, SET_COMMAND, field_values}; it++; continue; }
In my case, the entry contains 11 fields, so the code write same entry 11 times, then xvcrd receive 11 events: set event: {'admin_status': 'up', 'alias': 'Ethernet1/1', 'asic_port_name': 'Eth0-ASIC0', 'description': 'ARISTA01T2:Ethernet1', 'index': '0', 'lanes': '1,2,3,4', 'mtu': '9100', 'pfc_asym': 'off', 'role': 'Ext', 'speed': '40000', 'tpid': '0x8100', 'oper_status': 'up', 'key': 'field'}
Also here are syslog show portmgrd update same entry multiples:
for (auto &entry : field_values) { writeConfigToAppDb(alias, fvField(entry), fvValue(entry)); SWSS_LOG_NOTICE("Configure %s %s to %s", alias.c_str(), fvField(entry).c_str(), fvValue(entry).c_str()); <== here is the log code. }
Jul 26 08:40:22.122569 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 alias to Ethernet1/4 Jul 26 08:40:22.128114 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 asic_port_name to Eth3-ASIC0 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 description to Ethernet1/4 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 index to 3 Jul 26 08:40:22.135013 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 lanes to 13,14,15,16 Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 pfc_asym to off Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 role to Ext Jul 26 08:40:22.135724 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 speed to 40000 Jul 26 08:40:22.135860 vlab-08 NOTICE swss0#portmgrd: :- doTask: Configure Ethernet12 tpid to 0x8100
We did look at it earlier. The question is why multiple updates are coming for a single change in config_db. portmgrd is writing to app_db. I don't think xcvrd is complaining about app_db update. Related to portsyncd, it is only during the first time init, portsyncd reads from config_db and write to app_db.
@prsunny , regarding to your question: The question is why multiple updates are coming for a single change in config_db
In the description of this issue, actually the print been added to this method: def handle_port_update_event(sel, asic_context, stop_event, logger, port_change_event_handler): Add a print here to output the 'fvp' values
So, the multiple update event is from appl_db, not from config_db, following code subscript the event:
def subscribe_port_update_event(namespaces):
port_tbl_map = [
{'APPL_DB': swsscommon.APP_PORT_TABLE_NAME},
{'STATE_DB': 'TRANSCEIVER_INFO'},
{'STATE_DB': 'PORT_TABLE'},
]
......
Also code here show the evenr is from appl_db: https://github.com/sonic-net/sonic-platform-daemons/blob/e3b03d4dda09c20cf96692a30b569986429ed2d4/sonic-xcvrd/xcvrd/xcvrd.py#L1235
sel, asic_context = port_mapping.subscribe_port_update_event(self.namespaces)
while not self.task_stopping_event.is_set():
# Handle port change event from main thread
port_mapping.handle_port_update_event(sel,
asic_context,
self.task_stopping_event,
helper_logger,
self.on_port_update_event)
Here is why single config_db change will trigger multiple appl_db event:
Following command will change config_db port table: redis-cli -n 4 hset "PORT|Ethernet0" "tx_power" "-13" This command is the single change.
Then the portmgrd are listening config_db port table set operation, and sync the change to appl_db: https://github.com/Azure/sonic-swss/blob/7126857c1e387784bd04b9f7502018b2a4973d85/cfgmgr/portmgr.cpp#L82
However, because the code issue in this method, a single change 1 step1 will trigger multiple change to appl_db:
for (auto &entry : field_values)
{
writeConfigToAppDb(alias, fvField(entry), fvValue(entry)); <== multiple change to application DB
SWSS_LOG_NOTICE("Configure %s %s to %s", alias.c_str(), fvField(entry).c_str(), fvValue(entry).c_str());
}
Then xvcrd are subscribe appl_db change here: https://github.com/sonic-net/sonic-platform-daemons/blob/e3b03d4dda09c20cf96692a30b569986429ed2d4/sonic-xcvrd/xcvrd/xcvrd_utilities/port_mapping.py#L108
def subscribe_port_update_event(namespaces):
port_tbl_map = [
{'APPL_DB': swsscommon.APP_PORT_TABLE_NAME},
{'STATE_DB': 'TRANSCEIVER_INFO'},
{'STATE_DB': 'PORT_TABLE'},
]
So xvcrd will get multiple change from appl_db.
thanks @liuh-80 for the explanation. Original issue was reported as multiple notifications for single update on config_db. Regarding app_db, it seems to be expected and @prgeor , i think this issue can be closed and not applicable to swss-common.
Close this issue because not a swss-common issue.
Xcvrd listens to configuration changes here. Steps to reproduce:
Set any field in PORT table of CONFIG_DB as:- redis-cli -n 4 hset "PORT|Ethernet0" "tx_power" "-13"
Add a print here to output the 'fvp' values and you find more than 6 events are received by Xcvrd
Clearly there are duplicates events pushed to Xcvrd which can impact performance of datapath initliazation especially during 400G link bring up sequence.