sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[xcvrd] CmisManagerTask does not update port mapping #19588

Open Junchao-Mellanox opened 1 month ago

Junchao-Mellanox commented 1 month ago

Description

CmisManagerTask has a data member, port_mapping, which stores the mapping from each logical port to its (physical index, ASIC ID) pair. However, CmisManagerTask does not update port_mapping when the port configuration changes. This can cause a crash like the following:

Jul 15 23:02:04.130203 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to KeyError(None)
Jul 15 23:02:04.132404 sonic ERR pmon#xcvrd: Traceback (most recent call last):
Jul 15 23:02:04.132404 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1627, in run
Jul 15 23:02:04.132404 sonic ERR pmon#xcvrd:     self.task_worker()
Jul 15 23:02:04.132792 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1292, in task_worker
Jul 15 23:02:04.132792 sonic ERR pmon#xcvrd:     port_mapping.handle_port_update_event(sel,
Jul 15 23:02:04.132792 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd_utilities/port_mapping.py", line 211, in handle_port_update_event
Jul 15 23:02:04.132825 sonic ERR pmon#xcvrd:     port_change_event_handler(port_change_event)
Jul 15 23:02:04.132912 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 893, in on_port_update_event
Jul 15 23:02:04.132912 sonic ERR pmon#xcvrd:     self.force_cmis_reinit(lport, 0)
Jul 15 23:02:04.132912 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1053, in force_cmis_reinit
Jul 15 23:02:04.133027 sonic ERR pmon#xcvrd:     self.update_port_transceiver_status_table_sw_cmis_state(lport, CMIS_STATE_INSERTED)
Jul 15 23:02:04.133027 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 835, in update_port_transceiver_status_table_sw_cmis_state
Jul 15 23:02:04.133027 sonic ERR pmon#xcvrd:     status_table = self.xcvr_table_helper.get_status_tbl(asic_index)
Jul 15 23:02:04.133056 sonic ERR pmon#xcvrd:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2587, in get_status_tbl
Jul 15 23:02:04.133056 sonic ERR pmon#xcvrd:     return self.status_tbl[asic_id]
Jul 15 23:02:04.133083 sonic ERR pmon#xcvrd: KeyError: None
Jul 15 23:02:04.133098 sonic ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None)
Jul 15 23:02:04.133155 sonic ERR pmon#xcvrd: Exiting main loop as child thread raised exception!
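
The traceback ends in `self.status_tbl[asic_id]` with `KeyError: None`, which suggests the ASIC ID lookup returned `None` for a logical port the stale mapping did not know about. A minimal sketch of that failure mode follows. The class and function names here are hypothetical stand-ins, not the actual xcvrd code, and it assumes the lookup returns `None` for unknown ports rather than raising:

```python
# Hypothetical stand-ins for illustration; not the actual xcvrd classes.
class StalePortMapping:
    def __init__(self, logical_to_asic):
        # Snapshot taken once at startup and never refreshed.
        self.logical_to_asic = dict(logical_to_asic)

    def get_asic_id_for_logical_port(self, lport):
        # Assumption: unknown ports yield None instead of raising.
        return self.logical_to_asic.get(lport)


# One status table per ASIC, keyed by integer ASIC ID.
status_tbl = {0: "status table for asic 0"}
mapping = StalePortMapping({"Ethernet0": 0})


def get_status_tbl(lport):
    asic_id = mapping.get_asic_id_for_logical_port(lport)
    # With asic_id == None this lookup raises KeyError(None).
    return status_tbl[asic_id]


# "Ethernet2" stands for a port created by a breakout after startup.
try:
    get_status_tbl("Ethernet2")
except KeyError as exc:
    error = repr(exc)
    print(error)
```

A port that was present at startup (`Ethernet0` in this sketch) still resolves correctly; only ports added afterwards hit the `None` key.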

Steps to reproduce the issue:

  1. Start xcvrd with CmisManagerTask enabled
  2. Wait until all ports are up
  3. Use the dynamic port breakout feature to break out an existing port

Describe the results you received:

xcvrd crashed

Describe the results you expected:

xcvrd should update the port mapping and not crash
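
One way to get the expected behavior is for the task to apply every port add/remove event to its own mapping before acting on the port. The sketch below is illustrative only: `PortChangeEvent`, `PortMapping`, `CmisTaskSketch`, and the event constants are hypothetical names defined here for the example, and this is not necessarily how the actual fix is implemented.

```python
from dataclasses import dataclass

# Hypothetical event types for this sketch.
PORT_ADD = "add"
PORT_REMOVE = "remove"


@dataclass
class PortChangeEvent:
    port_name: str
    event_type: str
    port_index: int = -1
    asic_id: int = -1


class PortMapping:
    def __init__(self):
        self.logical_to_physical = {}
        self.logical_to_asic = {}

    def handle_port_change_event(self, event):
        if event.event_type == PORT_ADD:
            self.logical_to_physical[event.port_name] = event.port_index
            self.logical_to_asic[event.port_name] = event.asic_id
        elif event.event_type == PORT_REMOVE:
            self.logical_to_physical.pop(event.port_name, None)
            self.logical_to_asic.pop(event.port_name, None)


class CmisTaskSketch:
    def __init__(self):
        self.port_mapping = PortMapping()
        self.reinit_requests = []  # (logical port, asic id) pairs

    def on_port_update_event(self, event):
        # Refresh the mapping first so later lookups see the new port.
        self.port_mapping.handle_port_change_event(event)
        if event.event_type == PORT_ADD:
            asic_id = self.port_mapping.logical_to_asic[event.port_name]
            self.reinit_requests.append((event.port_name, asic_id))


# Simulate breaking out Ethernet0 into Ethernet0 and Ethernet2.
task = CmisTaskSketch()
task.on_port_update_event(PortChangeEvent("Ethernet0", PORT_ADD, 1, 0))
task.on_port_update_event(PortChangeEvent("Ethernet0", PORT_REMOVE))
task.on_port_update_event(PortChangeEvent("Ethernet0", PORT_ADD, 1, 0))
task.on_port_update_event(PortChangeEvent("Ethernet2", PORT_ADD, 1, 0))
```

Because the mapping is updated before the reinit request is recorded, the newly created port resolves to a valid ASIC ID instead of `None`.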

Junchao-Mellanox commented 1 month ago

@prgeor @mihirpat1 FYI.

vmittal-msft commented 1 month ago

@Junchao-Mellanox, please share the SW version on which the issue is seen, as well as the platform info. @prgeor, did you get a chance to check this?

Junchao-Mellanox commented 1 month ago

Hi @vmittal-msft, it is based on 202311; the hash is 156b067c875967618232c02cb51b163e5e287e45. I think the issue is common to all platforms.

ishidawataru commented 1 month ago

Is this a duplicate of https://github.com/sonic-net/sonic-buildimage/issues/18893?

If so, https://github.com/sonic-net/sonic-platform-daemons/pull/500 should fix the issue.