sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Orchagent exiting due to unsupported SAI_*_ATTR_SELECTIVE_COUNTER_LIST attr #20725

Open bofish-arista opened 1 day ago

bofish-arista commented 1 day ago

Description

sonic-buildimage PR #20540 pulled in the latest sairedis, which in turn brings in SAI header changes, including the new SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST attribute. This attribute does not appear to be supported by Broadcom yet.

As a result of this change, orchagent exits early in the startup process.

Relevant links:
https://github.com/sonic-net/sonic-buildimage/pull/20540
https://github.com/sonic-net/sonic-sairedis/pull/1431
https://github.com/opencomputeproject/SAI/pull/1941

Steps to reproduce the issue:

The issue is seen during initialization of a standalone device.

Describe the results you received:

Orchagent exited with a runtime error, logged as shown below:

2024 Nov 4 17:44:07.612509 up500 ERR syncd#syncd: :- run: Runtime error: :- discover: when query SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST (on SAI_OBJECT_TYPE_PORT RID oid:0x100000001) got value oid:0x7ffc4a7b45f0 objectTypeQuery returned NULL object type

Describe the results you expected:

Output of show version:

root@up322:~# show version

SONiC Software Version: SONiC.branch.master-ars.7cd2518e-buildimage.origin.master-nightly-slim-2024.10.31.20.12
SONiC OS Version: 12
Distribution: Debian 12.7
Kernel: 6.1.0-22-2-amd64
Build commit: 7f44814d7
Build date: Fri Nov 1 00:59:18 UTC 2024
Built by: jenkins@jenkins-arsonic-k8s-1-vfqtx

Platform: x86_64-arista_7060_cx32s
HwSKU: Arista-7060CX-32S-C32
ASIC: broadcom
ASIC Count: 1
Serial Number: SGD20254417
Model Number: DCS-7060CX-32S
Hardware Revision: 03.00
Uptime: 06:20:23 up 9 min, 2 users, load average: 0.26, 0.98, 0.75
Date: Sat 02 Nov 2024 06:20:23


Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

kcudnik commented 3 hours ago

Hi:

f23185d5 (Rajkumar-Marvell         2024-10-07 11:55:21 +0530 2627)     SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST,

That attribute was added recently, on 2024-10-07.

SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST (on SAI_OBJECT_TYPE_PORT RID oid:0x20100000000) got value oid:0x559ac8beda80 objectTypeQuery returned NULL object type

This attribute is a list of counters. The query returned an OID value, 0x559ac8beda80, for which the object type query returned NULL. It should be SAI_OBJECT_TYPE_COUNTER, as specified in the SAI header saiport.h. If it returns NULL, that's a vendor bug.

Currently syncd crashes on such an error: we got an OID but we don't know what type it is, which is invalid.

What should happen, when we have newer SAI headers in sairedis/syncd and are using an older vendor SAI, is that all unsupported attributes return not implemented or not supported instead of success. It seems that the vendor is returning success together with some invalid value for this attribute.
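To make the expected behaviour concrete, here is a minimal sketch of the decision described above. It is not the actual SaiDiscovery.cpp code: `Status`, `ObjectType`, `Discovery`, `queryObjectType` and `classify` are hypothetical stand-ins for the SAI status codes, object types and the vendor's object type query, and the "known" OID 0x1000 is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for SAI types; the real definitions live in the SAI headers.
enum class Status { Success, NotSupported, NotImplemented };
enum class ObjectType { Null, Port, Counter };

// Possible outcomes when discovery reads an OID-list attribute.
enum class Discovery { Skip, Accept, VendorBug };

// Hypothetical stand-in for the vendor's object type query.
// In this sketch only OID 0x1000 is known, and it is a counter.
ObjectType queryObjectType(std::uint64_t oid)
{
    return oid == 0x1000 ? ObjectType::Counter : ObjectType::Null;
}

// Decide what to do with the result of a "get" on an OID-list attribute.
Discovery classify(Status status,
                   const std::vector<std::uint64_t>& oids,
                   ObjectType expected)
{
    assert(expected != ObjectType::Null); // caller must name a real type

    // Older vendor SAI that does not know the attribute: just move on.
    if (status == Status::NotSupported || status == Status::NotImplemented)
        return Discovery::Skip;

    // Success was reported, so every returned OID must have the expected type.
    for (std::uint64_t oid : oids)
    {
        if (queryObjectType(oid) != expected)
            return Discovery::VendorBug;
    }
    return Discovery::Accept;
}
```

With these stand-ins, a "not supported" status is skipped quietly, while a success status carrying an OID like 0x559ac8beda80 that the type query does not recognise lands in the `VendorBug` branch, which corresponds to the case where syncd currently throws.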

error comes from here: https://github.com/sonic-net/sonic-sairedis/blob/master/syncd/SaiDiscovery.cpp#L237

The error could also be caused by DASH extensions (if the vendor supports DASH), since the extensions range changed and that change is not backward compatible: https://github.com/opencomputeproject/SAI/pull/2028

So what I expect is happening: the vendor has some custom/private attribute after SAI_PORT_ATTR_END in the older version of the SAI headers, and it has the same enum value as SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST. That makes syncd think SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST is implemented when it is actually a private internal attribute.

@bofish-arista can you confirm?
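The suspected collision can be illustrated with a small sketch. All numeric values and the names `LAST_PUBLIC_ATTR` and `VENDOR_PRIVATE_ATTR` are assumptions made up for this example; the real enum values come from saiport.h, and whether the vendor actually defines a private attribute there is exactly what needs confirming.

```cpp
#include <cstdint>

// Hypothetical values; the real enum values come from saiport.h.
namespace old_headers {
    enum PortAttr : std::uint32_t {
        LAST_PUBLIC_ATTR = 100,
        END,                       // 101: first value past the public range
    };
    // Assumed vendor-private attribute placed right at END of the old headers.
    constexpr std::uint32_t VENDOR_PRIVATE_ATTR = END;
}

namespace new_headers {
    enum PortAttr : std::uint32_t {
        LAST_PUBLIC_ATTR = 100,
        SELECTIVE_COUNTER_LIST,    // 101: newly added public attribute
        END,                       // 102
    };
}

// True when a vendor library built against the old headers would interpret
// the new attribute id as its own private attribute.
constexpr bool collides()
{
    return static_cast<std::uint32_t>(new_headers::SELECTIVE_COUNTER_LIST)
        == old_headers::VENDOR_PRIVATE_ATTR;
}

static_assert(collides(), "new public id overlaps the assumed private id");
```

In this scenario the vendor library answers the query with success, because the id is valid from its point of view, but the payload is whatever the private attribute holds rather than a list of counter OIDs.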