sonic-net / sonic-platform-common

Python packages which provide a common interface to platform-specific hardware peripherals in SONiC

xcvrd crashes in SFP refactored code. #255

Closed aravindmani-1 closed 2 years ago

aravindmani-1 commented 2 years ago

In the latest master image, xcvrd crashes while trying to get transceiver info for a 400G DAC.

root@sonic:~# show ver

SONiC Software Version: SONiC.master.60307-dirty-20211220.184919
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: 0d327abe9
Build date: Mon Dec 20 18:56:46 UTC 2021
Built by: AzDevOps@sonic-build-workers-00101A

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-O32
ASIC: broadcom
ASIC Count: 1
Serial Number: 1QHXCW2
Model Number: 04CN21
Hardware Revision: A00
Uptime: 09:35:14 up 24 min, 1 user, load average: 0.68, 0.63, 0.54

root@sonic:~# python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sonic_platform.chassis import Chassis
>>> c=Chassis()
>>> c.get_sfp(1).get_transceiver_info()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/sfp_optoe_base.py", line 24, in get_transceiver_info
    return api.get_transceiver_info() if api is not None else None
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/api/public/cmis.py", line 121, in get_transceiver_info
    admin_info = self.xcvr_eeprom.read(consts.ADMIN_INFO_FIELD)
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/xcvr_eeprom.py", line 30, in read
    return field.decode(raw_data, **decoded_deps)
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 252, in decode
    result[field.name] = field.decode(raw_data[offset - start: offset + field.get_size() - start],
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 273, in decode
    date = super(DateField, self).decode(raw_data, **decoded_deps)
  File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 191, in decode
    return struct.unpack(self.format, raw_data)[0].decode(self.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: ordinal not in range(128)
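
For readers unfamiliar with this failure mode, the last frame of the traceback can be reproduced with the standard library alone. The sketch below is not the sonic_xcvr code: the 8-byte layout (six ASCII date digits followed by two unprogrammed 0xff bytes) is an assumption chosen to match "byte 0xff in position 6", and `decode_strict` / `decode_tolerant` are hypothetical helper names. The tolerant variant is only one possible mitigation, not necessarily the fix that was merged.

```python
import struct

# Assumed raw field contents: "YYMMDD" plus two unprogrammed 0xff bytes.
# The real EEPROM contents of the 400G DAC are not shown in the report.
raw = b"211220" + b"\xff\xff"


def decode_strict(data, fmt="8s", encoding="ascii"):
    # Same pattern as the failing frame: unpack the bytes, then decode strictly.
    return struct.unpack(fmt, data)[0].decode(encoding)


def decode_tolerant(data, fmt="8s", encoding="ascii"):
    # Hypothetical mitigation: substitute undecodable bytes instead of raising,
    # then drop the U+FFFD replacement characters and surrounding whitespace.
    text = struct.unpack(fmt, data)[0].decode(encoding, errors="replace")
    return text.replace("\ufffd", "").strip()


try:
    decode_strict(raw)
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = str(exc)

tolerant = decode_tolerant(raw)
```

With this input the strict path raises the same kind of `UnicodeDecodeError` as in the report, while the tolerant path returns only the decodable digits.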

aravindmani-1 commented 2 years ago

@prgeor Could you please check this issue?

prgeor commented 2 years ago

@andywongarista could you take a look if possible?

andywongarista commented 2 years ago

I haven't been able to reproduce this error exactly but I'm encountering a different error:

SONiC Software Version: SONiC.branch.master-ars.98914b08-buildimage.3855ce284-nightly-2021.12.23.18.34
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 9c0ebe7ae
Build date: Thu Dec 23 18:43:55 UTC 2021
Built by: jenkins@jenkins-arsonic-2-vb59s

Platform: x86_64-arista_7060dx4_32
HwSKU: Arista-7060DX4-C32
ASIC: broadcom
ASIC Count: 1
Serial Number: JPE21204635
Model Number: DCS-7060DX4-32
Hardware Revision: 04.00
Uptime: 23:13:28 up  3:47,  2 users,  load average: 0.23, 0.22, 0.27

>>> from sonic_platform.chassis import Chassis
>>> c=Chassis()
>>> c.get_sfp(1).get_transceiver_info()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/sfp_optoe_base.py", line 24, in get_transceiver_info
    return api.get_transceiver_info() if api is not None else None
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/api/public/cmis.py", line 121, in get_transceiver_info
    admin_info = self.xcvr_eeprom.read(consts.ADMIN_INFO_FIELD)
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/xcvr_eeprom.py", line 30, in read
    return field.decode(raw_data, **decoded_deps)
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 252, in decode
    result[field.name] = field.decode(raw_data[offset - start: offset + field.get_size() - start],
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 252, in decode
    result[field.name] = field.decode(raw_data[offset - start: offset + field.get_size() - start],
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/sonic_xcvr/fields/xcvr_field.py", line 203, in decode
    code = struct.unpack(self.format, raw_data)[0]
struct.error: unpack requires a buffer of 1 bytes

I believe the inclusion of page 01 fields in the ADMIN_INFO_FIELD group is causing this error for 400G DAC, e.g. https://github.com/Azure/sonic-platform-common/blob/77da9c8440dda385db745c070453ff343acc1eac/sonic_platform_base/sonic_xcvr/mem_maps/public/cmis.py#L64
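
If that hypothesis holds, the mechanism would look roughly like the sketch below. It assumes the DAC is a flat-memory module that exposes no page 01, so a read of a page 01 field comes back shorter than the field's `struct` format expects. The field table, the dict-based EEPROM model, and the names `read_page` and `decode_group` are all hypothetical and are not part of sonic_xcvr; the `skip_missing` guard is one possible way to tolerate absent pages, not a description of the actual fix.

```python
import struct

# Hypothetical field table: (name, page, offset within page, struct format).
FIELDS = [
    ("vendor_rev", 0x00, 0, "2s"),
    ("hw_major_rev", 0x01, 0, "B"),
]

# Assumed flat-memory module: only page 0x00 is present.
flat_dac = {0x00: b"A1"}


def read_page(eeprom, page, offset, size):
    # A missing page yields an empty buffer rather than `size` bytes.
    return eeprom.get(page, b"")[offset:offset + size]


def decode_group(eeprom, fields, skip_missing):
    result = {}
    for name, page, offset, fmt in fields:
        size = struct.calcsize(fmt)
        raw = read_page(eeprom, page, offset, size)
        if skip_missing and len(raw) < size:
            # Field lives on a page this module does not implement.
            result[name] = None
            continue
        # Raises struct.error when the buffer is shorter than the format.
        result[name] = struct.unpack(fmt, raw)[0]
    return result


try:
    decode_group(flat_dac, FIELDS, skip_missing=False)
    unguarded_failed = False
except struct.error:
    unguarded_failed = True

guarded = decode_group(flat_dac, FIELDS, skip_missing=True)
```

Without the guard, decoding the group fails with `struct.error` on the page 01 field; with it, the page 00 field is decoded and the unavailable field is reported as `None`.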

aravindmani-1 commented 2 years ago

Yes, I was able to reproduce the same issue as well.

prgeor commented 2 years ago

The issue is now fixed.