Closed ysmanman closed 1 week ago
@arlakshm @kenneth-arista
Looked at the BRCM SAI code, and it seems neither the 202205 nor the 202405 SAI supports setting SAI_NEIGHBOR_ENTRY_ATTR_IS_LOCAL for a neighbor entry. But we didn't see the failure in 202205 testing, so the SONiC behavior may differ between 202205 and 202405.
FYI, CSP CS00012298563 confirmed that BRCM SAI does not support setting SAI_NEIGHBOR_ENTRY_ATTR_IS_LOCAL in SAI 9.x (and possibly earlier versions too).
Thanks @ysmanman for reporting this issue. @saksarav-nokia, @mlok-nokia for visibility.
@ysmanman, I did a quick check on the SAI definition. This attribute supports create and set. Any reason why the SAI behavior was changed?
Hi @arlakshm, I don't have much context on why BRCM discontinued support for setting SAI_NEIGHBOR_ENTRY_ATTR_IS_LOCAL, at least starting from SAI 9.2. But based on the conversation in CSP CS00012298563, there was some discussion between MSFT and BRCM as well. Quoting the reply from BRCM:
update:
at the meeting with MSFT, they are asking if SAI9.x supports SAI_NEIGHBOR_ENTRY_ATTR_IS_LOCAL on brcm_sai_set_neighbor_entry_attribute().
Answer: it is not supported on SAI 9.x
Opened CSP CS00012360402 to track the issue.
The SONiC behavior changed between 202205 and 202405. Specifically, https://github.com/sonic-net/sonic-swss/pull/2577 fixed applying all neighbor attributes, which exposed this problem in the DNX SAI.
Hi @vmittal-msft, we have noticed this same failure in release 202305. Once fixed, do you know if it will be backported to the affected releases? Thanks.
Jul 17 15:32:56.950091 xx119 ERR syncd#syncd: [none] SAI_API_NEIGHBOR:brcm_sai_set_neighbor_entry_attribute:597 Error processing nbr entry attribute failed with error Unknown error (0xfffd0000).
Jul 17 15:32:56.950091 xx119 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_FAILURE
Jul 17 15:32:56.950091 xx119 ERR syncd#syncd: :- processQuadEvent: attr: SAI_NEIGHBOR_ENTRY_ATTR_NO_HOST_ROUTE: true
Jul 17 15:32:56.950262 xx119 ERR swss#orchagent: :- set: set status: SAI_STATUS_FAILURE
Jul 17 15:32:56.950286 xx119 ERR swss#orchagent: :- addNeighbor: Failed to update neighbor xx:xx:xx:xx:xx:xx on Ethernet49, attr.id=0x3, rv:-1
Jul 17 15:32:56.950286 xx119 ERR swss#orchagent: :- handleSaiSetStatus: Encountered failure in set operation, exiting orchagent, SAI API: SAI_API_NEIGHBOR, status: SAI_STATUS_FAILURE
Jul 17 15:32:56.950295 xx119 NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
Jul 17 15:32:56.950498 xx119 NOTICE syncd#syncd: :- processNotifySyncd: Invoking SAI failure dump
Jul 17 15:32:56.955937 xx119 NOTICE swss#orchagent: :- sai_redis_notify_syncd: invoked DUMP succeeded
Jul 17 15:32:57.717095 xx119 INFO swss#supervisord 2024-07-17 15:32:57,716 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Jul 17 15:32:58.721233 xx119 INFO swss#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
Jul 17 15:32:58.721364 xx119 NOTICE swss#supervisor-proc-exit-listener: :- publish: EVENT_PUBLISHED: {"sonic-events-host:process-exited-unexpectedly":{"ctr_name":"swss","process_name":"orchagent","timestamp":"2024-07-17T15:32:58.721202Z"}}
Jul 17 15:32:58.723282 xx119 INFO swss#supervisord 2024-07-17 15:32:58,722 WARN received SIGTERM indicating exit request
Jul 17 15:32:58.723282 xx119 INFO swss#supervisord 2024-07-17 15:32:58,722 INFO waiting for supervisor-proc-exit-listener, rsyslogd, portsyncd, coppmgrd, arp_update, ndppd, neighsyncd, vlanmgrd, intfmgrd, portmgrd, buffermgrd, vrfmgrd, nbrmgrd, vxlanmgrd, fdbsyncd, tunnelmgrd to die
Jul 17 15:32:58.723763 xx119 INFO swss#supervisord 2024-07-17 15:32:58,723 INFO stopped: tunnelmgrd (terminated by SIGTERM)
Jul 17 15:32:58.724977 xx119 INFO swss#supervisord 2024-07-17 15:32:58,724 INFO stopped: fdbsyncd (terminated by SIGTERM)
Jul 17 15:32:58.726371 xx119 INFO swss#supervisord 2024-07-17 15:32:58,725 INFO stopped: vxlanmgrd (terminated by SIGTERM)
Jul 17 15:32:58.727726 xx119 INFO swss#supervisord 2024-07-17 15:32:58,727 INFO stopped: nbrmgrd (terminated by SIGTERM)
Jul 17 15:32:58.729011 xx119 INFO swss#supervisord 2024-07-17 15:32:58,728 INFO stopped: vrfmgrd (terminated by SIGTERM)
Jul 17 15:32:59.731800 xx119 INFO swss#supervisord 2024-07-17 15:32:59,731 INFO stopped: buffermgrd (terminated by SIGTERM)
Jul 17 15:32:59.732826 xx119 INFO swss#supervisord 2024-07-17 15:32:59,732 INFO stopped: portmgrd (terminated by SIGTERM)
Jul 17 15:32:59.734000 xx119 INFO swss#supervisord 2024-07-17 15:32:59,733 INFO stopped: intfmgrd (terminated by SIGTERM)
Jul 17 15:32:59.735050 xx119 INFO swss#supervisord 2024-07-17 15:32:59,734 INFO stopped: vlanmgrd (terminated by SIGTERM)
Jul 17 15:33:00.737843 xx119 INFO swss#supervisord 2024-07-17 15:33:00,737 INFO stopped: neighsyncd (terminated by SIGTERM)
Jul 17 15:33:00.737843 xx119 INFO swss#supervisord: message repeated 10 times: [ orchagent ]
Jul 17 15:33:00.737843 xx119 INFO swss#supervisord: ndppd (error) Shutting down...
Jul 17 15:33:00.737884 xx119 INFO swss#supervisord: ndppd (notice) Bye
Jul 17 15:33:00.738500 xx119 INFO swss#supervisord 2024-07-17 15:33:00,738 INFO stopped: ndppd (exit status 0)
Jul 17 15:33:01.740548 xx119 INFO swss#supervisord 2024-07-17 15:33:01,739 INFO stopped: arp_update (terminated by SIGTERM)
Jul 17 15:33:01.740548 xx119 INFO swss#supervisord 2024-07-17 15:33:01,740 INFO waiting for supervisor-proc-exit-listener, rsyslogd, portsyncd, coppmgrd to die
Jul 17 15:33:01.741848 xx119 INFO swss#supervisord 2024-07-17 15:33:01,741 INFO stopped: coppmgrd (terminated by SIGTERM)
Jul 17 15:33:03.745768 xx119 INFO swss#supervisord 2024-07-17 15:33:03,745 INFO stopped: portsyncd (terminated by SIGTERM)
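For anyone triaging a similar crash, the relevant facts in a log like the one above are the attribute that syncd rejected and the API/status that made orchagent exit. A minimal sketch of pulling those out is below. The function name `summarize_sai_failure` and the regular expressions are illustrative assumptions based only on the line layout seen in this excerpt; they are not part of SONiC.

```python
import re

# Assumed syslog layout, inferred from the excerpt in this thread:
#   <Mon DD HH:MM:SS.us> <host> <SEVERITY> <container>#<process>[:] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+) (?P<host>\S+) (?P<sev>[A-Z]+) "
    r"(?P<ctr>[\w-]+)#(?P<proc>[\w-]+):? (?P<msg>.*)$"
)
# syncd logs the attribute of the failed operation in processQuadEvent.
ATTR_RE = re.compile(r"processQuadEvent: attr: (\w+): (\S+)")
# orchagent logs the API and status right before it aborts.
STATUS_RE = re.compile(r"handleSaiSetStatus: .*SAI API: (\w+), status: (\w+)")


def summarize_sai_failure(lines):
    """Collect the first failed attribute and the first API/status pair from ERR lines."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("sev") != "ERR":
            continue  # only ERR lines carry the failure details
        msg = m.group("msg")
        attr = ATTR_RE.search(msg)
        if attr and "attr" not in summary:
            summary["attr"], summary["value"] = attr.group(1), attr.group(2)
        status = STATUS_RE.search(msg)
        if status and "api" not in summary:
            summary["api"], summary["status"] = status.group(1), status.group(2)
    return summary


# Three lines copied from the log excerpt in this thread.
SAMPLE = [
    "Jul 17 15:32:56.950091 xx119 ERR syncd#syncd: :- processQuadEvent: attr: SAI_NEIGHBOR_ENTRY_ATTR_NO_HOST_ROUTE: true",
    "Jul 17 15:32:56.950286 xx119 ERR swss#orchagent: :- handleSaiSetStatus: Encountered failure in set operation, exiting orchagent, SAI API: SAI_API_NEIGHBOR, status: SAI_STATUS_FAILURE",
    "Jul 17 15:32:56.950295 xx119 NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP",
]

if __name__ == "__main__":
    print(summarize_sai_failure(SAMPLE))
```

Running this over a full syslog file (for example, `summarize_sai_failure(open(path))`, with `path` pointing at whatever log was collected) should give a one-line summary that can be pasted into a vendor case.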
$ bcmcmd "bsv"
bsv
BRCM SAI ver: [8.4.39.2], OCP SAI ver: [1.11.0], SDK ver: [sdk-6.5.27] CANCUN ver: [06.12.00]
drivshell>
$
The fix from Broadcom is available in DNX SAI 11.2.7.1.
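To check whether a given device predates that release, the `bsv` output shown earlier can be parsed and compared against 11.2.7.1. This is only a rough sketch: the helper names are hypothetical, and it assumes that plain numeric ordering of the BRCM SAI version reflects where the fix landed, which would not hold if Broadcom backports the fix to an older branch.

```python
import re

# Fixed release quoted in this thread.
FIXED_DNX_SAI = (11, 2, 7, 1)


def parse_brcm_sai_version(bsv_output):
    """Extract the 'BRCM SAI ver' field from `bcmcmd "bsv"` output as a tuple of ints."""
    m = re.search(r"BRCM SAI ver: \[([\d.]+)\]", bsv_output)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def predates_fix(bsv_output, fixed=FIXED_DNX_SAI):
    """True if the reported SAI version sorts before the fixed release.

    Assumption: simple tuple ordering is a good enough proxy for "has the fix".
    """
    version = parse_brcm_sai_version(bsv_output)
    if version is None:
        raise ValueError("no BRCM SAI version found in bsv output")
    return version < fixed


# Output line copied from the comment above.
BSV_SAMPLE = (
    "BRCM SAI ver: [8.4.39.2], OCP SAI ver: [1.11.0], "
    "SDK ver: [sdk-6.5.27] CANCUN ver: [06.12.00]"
)

if __name__ == "__main__":
    print(parse_brcm_sai_version(BSV_SAMPLE), predates_fix(BSV_SAMPLE))
```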
Description
We noticed the following orchagent failure in T2 testing with the 202405 image.
The failure was observed with arp/test_neighbor_mac_noptf.py and arp/test_arpall.py.
Steps to reproduce the issue:
1.
2.
3.
Describe the results you received:
Describe the results you expected:
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):