Open HaiHuang008 opened 6 months ago
@prgeor , can you please take a look?
Hi, I reproduced this issue on the VS environment.
The issue was that xcvrd's CmitManagerTask
was not taking care of port addition and removal events, that occur from DPB configuration change.
https://github.com/sonic-net/sonic-platform-daemons/pull/500 fixes the issue on my environment.
Since I don't have access to the real hardware that supports DPB and CMIS transceivers, I enhanced the VS environment to support DPB and run xcvrd
for reproduction and testing.
@HaiHuang008 It would be great if you could check whether this fix resolves the issue on your platform.
@ishidawataru Thanks for your fix, this error no longer appears. But another error appears, causing xcvrd to crash.
@HaiHuang008 Thanks for testing and I'm sorry that the original fix didn't work. I think I could fix the problem of the original fix. I updated the PR Could you try again?
@ishidawataru Thanks again. This issue is resolved. No xcvrd crash when port breakout. Check_port_breakout.txt
@HaiHuang008 Thanks for testing. I'll try to get the PR merged.
Description
The ports of the device are all connected to OSFP cables. The first time a port is breakout, xcvrd crashes out and then it initializes all ports again.
xcvrd crashes:
Due to xcvrd crashes, xcvrd is initialized again, and TRANSCEIVER INFO will be traversed and filled again. As a result, the Media type display of the current port is very slow (about two minutes) in show interface status, and it will also affect other ports.
Steps to reproduce the issue:
admin@sonic:~$ show interfaces breakout current-mode +-------------+-------------------------+ | Interface | Current Breakout Mode | +=============+=========================+ | Ethernet0 | 8x100G | +-------------+-------------------------+ | Ethernet8 | 2x400G | +-------------+-------------------------+ | Ethernet16 | 8x100G | +-------------+-------------------------+ | Ethernet24 | 2x400G | +-------------+-------------------------+ | Ethernet32 | 2x400G | +-------------+-------------------------+
May 7 18:48:08.371367 sonic WARNING pmon#xcvrd: $$$ Ethernet0 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/1', 'index': '1', 'lanes': '9', 'speed': '100000'} May 7 18:48:08.371500 sonic WARNING pmon#xcvrd: Ethernet0CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/1', 'index': '1', 'lanes': '9', 'speed': '100000', 'key': 'Ethernet0', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.371322 sonic INFO ConfigMgmt: Writing in Config DB May 7 18:48:08.371505 sonic INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet0': {'alias': 'Eth1/1', 'lanes': '9', 'speed': '100000', 'index': '1'}, 'Ethernet1': {'alias': 'Eth1/2', 'lanes': '10', 'speed': '100000', 'index': '1'}, 'Ethernet2': {'alias': 'Eth1/3', 'lanes': '11', 'speed': '100000', 'index': '1'}, 'Ethernet3': {'alias': 'Eth1/4', 'lanes': '12', 'speed': '100000', 'index': '1'}, 'Ethernet4': {'alias': 'Eth1/5', 'lanes': '13', 'speed': '100000', 'index': '1'}, 'Ethernet5': {'alias': 'Eth1/6', 'lanes': '14', 'speed': '100000', 'index': '1'}, 'Ethernet6': {'alias': 'Eth1/7', 'lanes': '15', 'speed': '100000', 'index': '1'}, 'Ethernet7': {'alias': 'Eth1/8', 'lanes': '16', 'speed': '100000', 'index': '1'}}} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet1 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/2', 'index': '1', 'lanes': '10', 'speed': '100000'} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet2 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/3', 'index': '1', 'lanes': '11', 'speed': '100000'} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet3 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/4', 'index': '1', 'lanes': '12', 'speed': '100000'} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet4 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/5', 'index': '1', 'lanes': '13', 'speed': '100000'} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet5 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/6', 'index': '1', 'lanes': '14', 'speed': '100000'} May 7 18:48:08.395578 sonic WARNING pmon#xcvrd: $$$ Ethernet6 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/7', 'index': '1', 'lanes': '15', 'speed': '100000'} May 7 18:48:08.395630 sonic WARNING pmon#xcvrd: $$$ Ethernet7 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'alias': 'Eth1/8', 'index': '1', 'lanes': '16', 'speed': '100000'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet1CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/2', 'index': '1', 'lanes': '10', 'speed': '100000', 'key': 'Ethernet1', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet2CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/3', 'index': '1', 'lanes': '11', 'speed': '100000', 'key': 'Ethernet2', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet3CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/4', 'index': '1', 'lanes': '12', 'speed': '100000', 'key': 'Ethernet3', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet4CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/5', 'index': '1', 'lanes': '13', 'speed': '100000', 'key': 'Ethernet4', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet5CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/6', 'index': '1', 'lanes': '14', 'speed': '100000', 'key': 'Ethernet5', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet6CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/7', 'index': '1', 'lanes': '15', 'speed': '100000', 'key': 'Ethernet6', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.395892 sonic WARNING pmon#xcvrd: Ethernet7CONFIG_DBPORT handle_port_update_event() fvp {'alias': 'Eth1/8', 'index': '1', 'lanes': '16', 'speed': '100000', 'key': 'Ethernet7', 'asic_id': 0, 'op': 'SET'} May 7 18:48:08.403956 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to KeyError(None) May 7 18:48:08.404851 sonic ERR pmon#xcvrd: Traceback (most recent call last): May 7 18:48:08.404851 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1718, in run May 7 18:48:08.404851 sonic ERR pmon#xcvrd: self.task_worker() May 7 18:48:08.404851 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1428, in task_worker May 7 18:48:08.404888 sonic ERR pmon#xcvrd: self.port_dict[lport]['host_tx_ready'] = self.get_host_tx_status(lport) May 7 18:48:08.404888 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1300, in get_host_tx_status May 7 18:48:08.405015 sonic ERR pmon#xcvrd: state_port_tbl = self.xcvr_table_helper.get_state_port_tbl(asic_index) May 7 18:48:08.405015 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2632, in get_state_port_tbl May 7 18:48:08.405015 sonic ERR pmon#xcvrd: return self.state_port_tbl[asic_id] May 7 18:48:08.405015 sonic ERR pmon#xcvrd: KeyError: None May 7 18:48:08.405015 sonic ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None) May 7 18:48:08.405033 sonic ERR pmon#xcvrd: Exiting main loop as child thread raised exception! May 7 18:48:08.414110 sonic INFO pmon#supervisord 2024-05-07 18:48:08,413 INFO exited: xcvrd (terminated by SIGKILL; not expected)
May 7 18:48:09.540293 sonic NOTICE pmon#xcvrd[4227]: Starting up... May 7 18:48:09.540293 sonic NOTICE pmon#xcvrd[4227]: XCVRD INIT: Start daemon init... May 7 18:48:09.670233 sonic NOTICE pmon#xcvrd[4227]: XCVRD INIT: Wait for port config is done May 7 18:48:09.673361 sonic NOTICE pmon#xcvrd[4227]: XCVRD INIT: After port config is done May 7 18:48:09.679986 sonic NOTICE pmon#xcvrd[4227]: Start daemon main loop with thread count 3 May 7 18:48:09.680067 sonic NOTICE pmon#xcvrd[4227]: Started thread CmisManagerTask May 7 18:48:09.680067 sonic NOTICE pmon#xcvrd[4227]: Started thread DomInfoUpdateTask May 7 18:48:09.680077 sonic NOTICE pmon#xcvrd[4227]: Started thread SfpStateUpdateTask
admin@sonic:~$ show version
SONiC Software Version: SONiC.DS4101_V2.0.0_1 SONiC OS Version: 11 Distribution: Debian 11.7 Kernel: 5.10.0-23-2-amd64 Build commit: 67ede2e82 Build date: Wed Apr 10 07:05:46 UTC 2024 Built by: autobuild@AZUHPSP10
Platform: x86_64-cel_ds4101-r0 HwSKU: cel_ds4101 ASIC: broadcom ASIC Count: 1 Serial Number: R4030F2B063A25GD200026 Model Number: R4030-F9001-02 Hardware Revision: 6 Uptime: 19:26:13 up 1:38, 2 users, load average: 1.45, 1.52, 1.45 Date: Tue 07 May 2024 19:26:13
Docker images: REPOSITORY TAG IMAGE ID SIZE docker-dhcp-relay latest 43f5d92d0d5f 309MB docker-macsec latest 063cb43bd118 328MB docker-orchagent DS4101_V2.0.0_1 88f1d11531ea 338MB docker-orchagent latest 88f1d11531ea 338MB docker-fpm-frr DS4101_V2.0.0_1 25fef5a4a745 357MB docker-fpm-frr latest 25fef5a4a745 357MB docker-sflow DS4101_V2.0.0_1 7f2b88118acd 327MB docker-sflow latest 7f2b88118acd 327MB docker-nat DS4101_V2.0.0_1 da64867dabff 329MB docker-nat latest da64867dabff 329MB docker-eventd DS4101_V2.0.0_1 1a953e81a85b 300MB docker-eventd latest 1a953e81a85b 300MB docker-snmp DS4101_V2.0.0_1 9d41791169a6 340MB docker-snmp latest 9d41791169a6 340MB docker-platform-monitor DS4101_V2.0.0_1 b1fc016974ff 421MB docker-platform-monitor latest b1fc016974ff 421MB docker-teamd DS4101_V2.0.0_1 d97ecda1ae11 326MB docker-teamd latest d97ecda1ae11 326MB docker-lldp DS4101_V2.0.0_1 0e05f15eea2b 343MB docker-lldp latest 0e05f15eea2b 343MB docker-mux DS4101_V2.0.0_1 252b2ba2def7 349MB docker-mux latest 252b2ba2def7 349MB docker-sonic-gnmi DS4101_V2.0.0_1 2a40110cc077 388MB docker-sonic-gnmi latest 2a40110cc077 388MB docker-database DS4101_V2.0.0_1 79d18c8282dd 301MB docker-database latest 79d18c8282dd 301MB docker-gbsyncd-broncos DS4101_V2.0.0_1 5cea29cd8028 351MB docker-gbsyncd-broncos latest 5cea29cd8028 351MB docker-syncd-brcm DS4101_V2.0.0_1 7e5381b11d8f 675MB docker-syncd-brcm latest 7e5381b11d8f 675MB docker-gbsyncd-credo DS4101_V2.0.0_1 c27a150fd640 323MB docker-gbsyncd-credo latest c27a150fd640 323MB docker-router-advertiser DS4101_V2.0.0_1 28bbd623ce8a 301MB docker-router-advertiser latest 28bbd623ce8a 301MB docker-sonic-mgmt-framework DS4101_V2.0.0_1 303355e5ea69 417MB docker-sonic-mgmt-framework latest 303355e5ea69 417MB
admin@sonic:~$
(paste your output here or download and attach the file here )