ossobv / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

runtime: (minor) Sometimes "Type" in interfaces status (from eeprom) is not populated #45

Open wdoekes opened 1 day ago


Description

root@spine2:0:~# show interface status Ethernet0-8

  Interface                    Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  -----------------------  -------  -----  -----  -----------  ------  ------  -------  ------  ----------
  Ethernet0  73,74,75,76,77,78,79,80     400G   9100     rs  Eth1(Port1)  routed      up       up     N/A         N/A
  Ethernet8  65,66,67,68,69,70,71,72     400G   9100     rs  Eth2(Port2)  routed      up       up     N/A         N/A

vs. the expected output, with Type populated:

  Interface                            Lanes    Speed    MTU    FEC              Alias    Vlan    Oper    Admin                                             Type    Asym PFC
-----------  -------------------------------  -------  -----  -----  -----------------  ------  ------  -------  -----------------------------------------------  ----------
  Ethernet0          73,74,75,76,77,78,79,80     400G   9100     rs          Ethernet0  routed      up       up  QSFP-DD Double Density 8X Pluggable Transceiver         N/A
  Ethernet8          65,66,67,68,69,70,71,72     400G   9100     rs   fourHundredGigE2  routed      up       up  QSFP-DD Double Density 8X Pluggable Transceiver         N/A
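
To spot affected ports without eyeballing the table, the output can be parsed. The following is a minimal sketch, assuming the table is captured as text, that columns are separated by two or more spaces, and that no cell is empty. The helper names `parse_status` and `ports_missing_type` are made up for illustration and are not part of SONiC.

```python
import re

# Sample copied from the first (broken) output above.
SAMPLE = """\
  Interface                    Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  -----------------------  -------  -----  -----  -----------  ------  ------  -------  ------  ----------
  Ethernet0  73,74,75,76,77,78,79,80     400G   9100     rs  Eth1(Port1)  routed      up       up     N/A         N/A
  Ethernet8  65,66,67,68,69,70,71,72     400G   9100     rs  Eth2(Port2)  routed      up       up     N/A         N/A
"""


def parse_status(text):
    """Split the table into one dict per row, keyed by header name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the header row.
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= {"-", " "}:
            continue  # skip the dashed separator line
        cells = re.split(r"\s{2,}", ln.strip())
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def ports_missing_type(text):
    """Return interface names whose Type column reads N/A."""
    return [r["Interface"] for r in parse_status(text) if r["Type"] == "N/A"]


if __name__ == "__main__":
    print(ports_missing_type(SAMPLE))
```

Splitting on runs of two or more spaces keeps multi-word cells such as "Asym PFC" or the long transceiver description in one piece, but it would misalign rows that contain an empty cell, which is why such rows are dropped rather than guessed at.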

Yet the values can be read straight from the module EEPROM:

root@spine2:0:~# hd /sys/bus/i2c/devices/25-0050/eeprom  | grep QDD
00000090  20 64 9d 99 51 44 44 2d  34 30 30 47 2d 50 43 30  | d..QDD-400G-PC0|
000000e0  31 31 30 35 34 33 46 53  51 44 44 2d 34 30 30 47  |110543FSQDD-400G|
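
The same check can be done without `hd` and `grep`. This is a rough sketch that pulls printable ASCII runs out of the raw EEPROM bytes. The sysfs path is the one used above; the function names and the minimum run length of 4 are arbitrary choices for illustration.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def eeprom_strings(path="/sys/bus/i2c/devices/25-0050/eeprom"):
    """Read the EEPROM file exposed via sysfs and list its ASCII strings."""
    with open(path, "rb") as f:
        return printable_strings(f.read())


if __name__ == "__main__":
    # The 16 bytes at offset 0x90 from the hexdump above.
    sample = bytes.fromhex("20649d995144442d343030472d504330")
    print(printable_strings(sample))
```

On the device, `eeprom_strings()` would need to run as root, like the `hd` command above.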

But show interfaces transceiver eeprom does not detect it:

root@spine2:0:~# show interfaces transceiver eeprom Ethernet0
Ethernet0: SFP EEPROM Not detected

This might be related to the following xcvrd exception:

2024 Nov 21 13:23:37.946372 spine2 ERR pmon#xcvrd[29]: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None)
2024 Nov 21 13:23:37.946417 spine2 ERR pmon#xcvrd[29]: Exiting main loop as child thread raised exception!
2024 Nov 21 13:23:39.332537 spine2 ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to KeyError(None)
2024 Nov 21 13:23:39.334767 spine2 ERR pmon#xcvrd: Traceback (most recent call last):
2024 Nov 21 13:23:39.334807 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1509, in run
2024 Nov 21 13:23:39.334807 spine2 ERR pmon#xcvrd:     self.task_worker()
2024 Nov 21 13:23:39.334855 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1167, in task_worker
2024 Nov 21 13:23:39.334855 spine2 ERR pmon#xcvrd:     port_change_observer.handle_port_update_event()
2024 Nov 21 13:23:39.334855 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd_utilities/port_event_helper.py", line 200, in handle_port_update_event
2024 Nov 21 13:23:39.334915 spine2 ERR pmon#xcvrd:     self.port_change_event_handler(port_change_event)
2024 Nov 21 13:23:39.334915 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 741, in on_port_update_event
2024 Nov 21 13:23:39.334915 spine2 ERR pmon#xcvrd:     self.force_cmis_reinit(lport, 0)
2024 Nov 21 13:23:39.334915 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 901, in force_cmis_reinit
2024 Nov 21 13:23:39.334989 spine2 ERR pmon#xcvrd:     self.update_port_transceiver_status_table_sw_cmis_state(lport, CMIS_STATE_INSERTED)
2024 Nov 21 13:23:39.334989 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 683, in update_port_transceiver_status_table_sw_cmis_state
2024 Nov 21 13:23:39.334989 spine2 ERR pmon#xcvrd:     status_table = self.xcvr_table_helper.get_status_tbl(asic_index)
2024 Nov 21 13:23:39.334989 spine2 ERR pmon#xcvrd:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024 Nov 21 13:23:39.335013 spine2 ERR pmon#xcvrd:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd_utilities/xcvr_table_helper.py", line 53, in get_status_tbl
2024 Nov 21 13:23:39.335013 spine2 ERR pmon#xcvrd:     return self.status_tbl[asic_id]
2024 Nov 21 13:23:39.335067 spine2 ERR pmon#xcvrd:            ~~~~~~~~~~~~~~~^^^^^^^^^
2024 Nov 21 13:23:39.335067 spine2 ERR pmon#xcvrd: KeyError: None
2024 Nov 21 13:23:39.335177 spine2 ERR pmon#xcvrd[54]: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None)
2024 Nov 21 13:23:39.335255 spine2 ERR pmon#xcvrd[54]: Exiting main loop as child thread raised exception!
2024 Nov 21 13:23:40.018515 spine2 ERR snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.11/dist-packages/ax_interface/mib.py", line 48, in start#012    self.update_data()#012  File "/usr/local/lib/python3.11/dist-packages/sonic_ax_impl/mibs/vendor/cisco/ciscoSwitchQosMIB.py", line 105, in update_data#012    namespace = self.port_index_namespace[int(port_index)]#012                ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^#012KeyError: 251
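
For readers unfamiliar with the message: KeyError(None) means a dict was indexed with None as the key. The sketch below only illustrates that failure mode. `TableHelper` and `lookup_asic_index` are hypothetical stand-ins, not the xcvrd code, and the idea that the ASIC index came back as None because the port was missing from some mapping is an assumption, not something the traceback proves.

```python
class TableHelper:
    """Hypothetical stand-in for a helper holding one table per ASIC id."""

    def __init__(self, asic_ids):
        self.status_tbl = {asic_id: f"STATUS_TABLE[{asic_id}]" for asic_id in asic_ids}

    def get_status_tbl(self, asic_id):
        # Plain dict indexing: raises KeyError(asic_id) if the id is unknown.
        return self.status_tbl[asic_id]


def lookup_asic_index(port_to_asic, lport):
    """Hypothetical lookup; returns None when the port is not in the mapping."""
    return port_to_asic.get(lport)


if __name__ == "__main__":
    helper = TableHelper([0])           # single-ASIC system, as reported below
    port_to_asic = {"Ethernet0": 0}     # made-up mapping for illustration
    asic_index = lookup_asic_index(port_to_asic, "Ethernet4")  # unknown port
    try:
        helper.get_status_tbl(asic_index)
    except KeyError as exc:
        print(repr(exc))
```

The KeyError: 251 in the snmp-subagent line has the same shape: a port index is looked up in a dict that has no entry for it.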

Could this be caused by port breakout on the switch?

Restarting pmon did not help at first; xcvrd kept getting killed until supervisord gave up:

2024-11-21 13:34:27,983 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:29,007 INFO spawned: 'xcvrd' with pid 54
2024-11-21 13:34:29,374 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:31,419 INFO spawned: 'xcvrd' with pid 62
2024-11-21 13:34:31,780 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:34,812 INFO spawned: 'xcvrd' with pid 70
2024-11-21 13:34:35,166 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:35,166 INFO gave up: xcvrd entered FATAL state, too many start retries too quickly
2024-11-21 13:34:38,239 INFO success: psud entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2024-11-21 13:34:38,239 INFO success: syseepromd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
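
When this happens repeatedly, it helps to summarize the supervisord log instead of reading it line by line. A small sketch, assuming the log lines have the format shown above; the `summarize` function and its regular expressions are written for this example only.

```python
import re
from collections import Counter

# Excerpt of the supervisord log lines quoted above.
LOG = """\
2024-11-21 13:34:27,983 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:29,007 INFO spawned: 'xcvrd' with pid 54
2024-11-21 13:34:29,374 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:31,419 INFO spawned: 'xcvrd' with pid 62
2024-11-21 13:34:31,780 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:34,812 INFO spawned: 'xcvrd' with pid 70
2024-11-21 13:34:35,166 WARN exited: xcvrd (terminated by SIGKILL; not expected)
2024-11-21 13:34:35,166 INFO gave up: xcvrd entered FATAL state, too many start retries too quickly
"""

EXIT_RE = re.compile(r"WARN exited: (\S+) \(terminated by (\w+); not expected\)")
FATAL_RE = re.compile(r"INFO gave up: (\S+) entered FATAL state")


def summarize(log_text):
    """Count unexpected exits per (process, signal) and collect FATAL processes."""
    exits = Counter()
    fatal = set()
    for line in log_text.splitlines():
        m = EXIT_RE.search(line)
        if m:
            exits[(m.group(1), m.group(2))] += 1
        m = FATAL_RE.search(line)
        if m:
            fatal.add(m.group(1))
    return exits, fatal


if __name__ == "__main__":
    exits, fatal = summarize(LOG)
    print(dict(exits), fatal)
```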

A more careful restart of pmon (stop, wait, then start) did help in the end:

root@spine2:130:~# pmon.sh stop

(wait a bit)

root@spine2:0:~# pmon.sh start
Starting existing pmon container with HWSKU Accton-AS9716-32D

root@spine2:0:~# docker logs pmon -f
...
2024-11-21 13:37:23,996 INFO success: xcvrd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
...

root@spine2:0:~# show interface transceiver eeprom Ethernet0
Ethernet0: SFP EEPROM detected
        Active Firmware: N/A
        Active application selected code assigned to host lane 1: N/A

Which build are we running (if any)

SONiC Software Version: SONiC.osso202405.0-439acd33c
SONiC OS Version: 12
Distribution: Debian 12.8
Kernel: 6.1.0-22-2-amd64
Build commit: 439acd33c
Build date: Wed Nov 20 22:41:18 UTC 2024
Built by: sonic-builder@dev.osso.nl

Platform: x86_64-accton_as9716_32d-r0
HwSKU: Accton-AS9716-32D
ASIC: broadcom
ASIC Count: 1

Upstream issues/PRs