sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

SONiC swss and syncd Exited in brcm when create a subinterface #6167

Open mipxman opened 3 years ago

mipxman commented 3 years ago

Hello guys!

I tried to create a subinterface (sub port), but after creating it my switch crashed.

SONiC Software Version: SONiC.master.0-dirty-20201104.075431
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: 3a4435eb
ASIC: broadcom (tomahawk)

I used this command to create the subinterface: config interface ip add EthernetX.Y A.B.C.D/M

 #show subinterface status
  Sub port interface    Speed    MTU    Vlan    Admin                  Type
--------------------  -------  -----  ------  -------  --------------------
      Ethernet104.10      40G   9100      10       up  802.1q-encapsulation

You can see the output of cat syslog | grep Ethernet104 in the attached log.txt

prsunny commented 3 years ago

@wendani , could you comment?

mipxman commented 3 years ago

Also, when I run $ bcmcmd "Switch Control" | grep Subport it shows this result:

UnkownSubportPktTagToCpu    Feature unavilable
SubportPktTagEthertype      Feature unavilable
SubportPktTagToCpu      Feature unavilable
NonSubportPktTagToCpu       Feature unavilable
SubportCoEEtherType     Feature unavilable
SubportEgressWideTpid       Feature unavilable
SubportEgressWideTpid       Feature unavilable
SubportLowPriPfcSel     Feature unavilable
SubportHighPriPfcSel        Feature unavilable

anshuv-mfst commented 3 years ago

@wendani - Could you please look into this issue. Thanks!

wendani commented 3 years ago

log.txt is missing the lines that report the process exit. Can you attach a complete log?

mipxman commented 3 years ago

wendani wrote: "log.txt misses the line complaining process exit. Can you attach a complete one?"

Ethernet8.100 and Ethernet8.200 are the subinterfaces. Complete log: syslog.txt

wendani commented 3 years ago

Can you fix the following issue first? It is not related to sub interfaces.

Nov 29 07:25:18.882050 sonic ERR syncd#syncd: [0] SAI_API_SWITCH:brcm_sai_get_switch_attribute:3947 Error retreiving system mac.
Nov 29 07:25:18.882050 sonic ERR syncd#syncd: :- saiGetMacAddress: failed to get mac address: SAI_STATUS_ITEM_NOT_FOUND
Nov 29 07:25:18.882050 sonic NOTICE syncd#syncd: :- SaiSwitch: constructor took 2.113590 sec
Nov 29 07:25:18.883292 sonic NOTICE syncd#syncd: :- hardReinit: hard reinit took 6.193822 sec
Nov 29 07:25:18.887299 sonic NOTICE syncd#syncd: :- onSyncdStart: on syncd start took 6.320957 sec
Nov 29 07:25:18.887388 sonic ERR syncd#syncd: :- run: Runtime error during syncd init: :- saiGetMacAddress: failed to get mac address: SAI_STATUS_ITEM_NOT_FOUND
Nov 29 07:25:18.887447 sonic NOTICE syncd#syncd: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x0
Nov 29 07:25:18.888030 sonic NOTICE syncd#syncd: :- sendShutdownRequestAfterException: notification send successfull

mipxman commented 3 years ago

wendani wrote: "Can you fix the following issue first? This is not related to sub interface."

Nov 29 07:25:18.882050 sonic ERR syncd#syncd: [0] SAI_API_SWITCH:brcm_sai_get_switch_attribute:3947 Error retreiving system mac.
Nov 29 07:25:18.882050 sonic ERR syncd#syncd: :- saiGetMacAddress: failed to get mac address: SAI_STATUS_ITEM_NOT_FOUND
Nov 29 07:25:18.882050 sonic NOTICE syncd#syncd: :- SaiSwitch: constructor took 2.113590 sec
Nov 29 07:25:18.883292 sonic NOTICE syncd#syncd: :- hardReinit: hard reinit took 6.193822 sec
Nov 29 07:25:18.887299 sonic NOTICE syncd#syncd: :- onSyncdStart: on syncd start took 6.320957 sec
Nov 29 07:25:18.887388 sonic ERR syncd#syncd: :- run: Runtime error during syncd init: :- saiGetMacAddress: failed to get mac address: SAI_STATUS_ITEM_NOT_FOUND
Nov 29 07:25:18.887447 sonic NOTICE syncd#syncd: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x0
Nov 29 07:25:18.888030 sonic NOTICE syncd#syncd: :- sendShutdownRequestAfterException: notification send successfull

I will try to fix it and report the result here. Thanks.

TETA-Net commented 3 years ago

@wendani
I tried to fix the problem you mentioned above, but the subinterface problem is still not solved. Can you check the syslog? afterlog.txt

Mar 10 12:20:37.657659 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:0 addr:6c:ec:5a:0a:52:cf ifindex:490 master:0
Mar 10 12:20:37.657409 sonic INFO systemd-udevd[32686]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 10 12:20:37.657703 sonic INFO systemd-udevd[32686]: Using default interface naming scheme 'v240'.
Mar 10 12:20:37.659151 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet4(ok) to state db
Mar 10 12:20:37.659151 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4.100 admin:0 oper:0 addr:6c:ec:5a:0a:52:cf ifindex:522 master:0 type:vlan
Mar 10 12:20:37.659487 sonic NOTICE swss#portsyncd: :- onMsg: Cannot find Ethernet4.100 in port table
Mar 10 12:20:37.662735 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4.100 admin:1 oper:0 addr:6c:ec:5a:0a:52:cf ifindex:522 master:0 type:vlan
Mar 10 12:20:37.662735 sonic NOTICE swss#portsyncd: :- onMsg: Cannot find Ethernet4.100 in port table
Mar 10 12:20:37.665590 sonic INFO kernel: [26947.815073] IPv6: ADDRCONF(NETDEV_UP): Ethernet4.100: link is not ready
Mar 10 12:20:37.666669 sonic DEBUG bgp#bgpcfgd: Received message : '('Ethernet4.100', 'SET', (('vrf', ''),))'
Mar 10 12:20:37.667354 sonic NOTICE swss#orchagent: :- addSubPort: Sub interface Ethernet4.100 inherits mtu size 9100 from parent port Ethernet4
Mar 10 12:20:37.668551 sonic ERR syncd#syncd: [0] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:319 Error processing rtr intf attribute failed with error -196604.
Mar 10 12:20:37.669187 sonic WARNING syncd#syncd: :- sai_serialize_enum: enum value -196604 not found in enum sai_status_t
Mar 10 12:20:37.669187 sonic ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: -196604
Mar 10 12:20:37.669187 sonic WARNING syncd#syncd: :- sai_serialize_enum: enum value -196604 not found in enum sai_status_t
Mar 10 12:20:37.669187 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID: oid:0x3000000000022
Mar 10 12:20:37.669187 sonic WARNING swss#orchagent: :- sai_deserialize_enum: enum -196604 not found in enum sai_status_t
Mar 10 12:20:37.669187 sonic WARNING swss#orchagent: :- sai_serialize_enum: enum value -196604 not found in enum sai_status_t
Mar 10 12:20:37.669187 sonic ERR swss#orchagent: :- create: create status: -196604
Mar 10 12:20:37.669187 sonic ERR swss#orchagent: :- addRouterIntfs: Failed to create router interface Ethernet4.100, rv:-196604
Mar 10 12:20:37.669466 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS: 6C:EC:5A:0A:52:CF
Mar 10 12:20:37.670683 sonic INFO swss#/supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'
Mar 10 12:20:37.670683 sonic INFO swss#/supervisord: orchagent   what():  Failed to create router interface.
Mar 10 12:20:37.672572 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_TYPE: SAI_ROUTER_INTERFACE_TYPE_SUB_PORT
Mar 10 12:20:37.672787 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_PORT_ID: oid:0x1000000000003
Mar 10 12:20:37.672929 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID: 100
Mar 10 12:20:37.673080 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE: true
Mar 10 12:20:37.673255 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE: true
Mar 10 12:20:37.673255 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_MTU: 9100
Mar 10 12:20:37.673255 sonic ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID: 0
Mar 10 12:20:37.677882 sonic DEBUG bgp#bgpcfgd: Received message : '('Ethernet4.100|1.1.1.1/30', 'SET', (('state', 'ok'),))'
Mar 10 12:20:37.935603 sonic INFO swss#supervisord 2021-03-10 12:20:37,934 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Mar 10 12:20:38.950760 sonic INFO swss#/supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...
Mar 10 12:20:38.951743 sonic INFO swss#supervisord 2021-03-10 12:20:38,951 WARN received SIGTERM indicating exit request
Mar 10 12:20:38.952524 sonic INFO swss#supervisord 2021-03-10 12:20:38,951 INFO waiting for supervisor-proc-exit-listener, rsyslogd, portsyncd, coppmgrd, arp_update, ndppd, neighsyncd, vlanmgrd, intfmgrd, portmgrd, buffermgrd, vrfmgrd, nbrmgrd, vxlanmgrd, fdbsyncd, tunnelmgrd to die
Mar 10 12:20:38.953299 sonic INFO swss#supervisord 2021-03-10 12:20:38,952 INFO stopped: tunnelmgrd (terminated by SIGTERM)

CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS                        PORTS               NAMES
8edfb8ad6ca5        docker-syncd-brcm:latest             "/usr/local/bin/supe…"   45 minutes ago      Exited (0) 39 minutes ago                         syncd
99047bc4f892        docker-snmp:latest                   "/usr/local/bin/supe…"   3 days ago          Exited (137) 38 minutes ago                       snmp
82ff457452d9        docker-sflow:latest                  "/usr/local/bin/supe…"   8 days ago          Up 43 minutes                                     sflow
ab998f5f6b1a        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   3 weeks ago         Exited (0) 43 minutes ago                         telemetry
c17b1380bc5e        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   3 weeks ago         Up 8 hours                                        mgmt-framework
23762c2d148a        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   6 weeks ago         Exited (0) 39 minutes ago                         radv
077be1329468        docker-lldp:latest                   "/usr/bin/docker-lld…"   6 weeks ago         Up 45 minutes                                     lldp
47f6c7d9f25f        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   6 weeks ago         Exited (0) 39 minutes ago                         dhcp_relay
3d42a00795ce        docker-teamd:latest                  "/usr/local/bin/supe…"   6 weeks ago         Exited (0) 39 minutes ago                         teamd
f2101a2a58c1        docker-orchagent:latest              "/usr/bin/docker-ini…"   6 weeks ago         Exited (0) 39 minutes ago                         swss
07965401e53e        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   6 weeks ago         Up 45 minutes                                     bgp
4fe933d331c5        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   6 weeks ago         Exited (0) 43 minutes ago                         pmon

:~$ show subinterfaces status
  Sub port interface    Speed    MTU    Vlan    Admin                  Type
--------------------  -------  -----  ------  -------  --------------------
       Ethernet4.100     100G   9100     100       up  802.1q-encapsulation
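
For anyone triaging a similar syslog: the attributes of the failed create call are spread over several processQuadEvent lines above, interleaved with other messages. Below is a minimal Python sketch that gathers them into one dictionary. The function name failed_create_attrs is hypothetical (it is not part of SONiC), and the regular expression is only an assumption based on the log lines pasted in this thread.

```python
import re

# Matches the attribute dump that syncd logs after a failed create, e.g.
# "... processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_MTU: 9100".
# Older syncd builds in this thread log "processEvent" instead.
ATTR_RE = re.compile(r"process(?:Quad)?Event: attr: (SAI_[A-Z0-9_]+): (.*)$")


def failed_create_attrs(lines):
    """Return {attribute name: value} for every attr line found in `lines`."""
    attrs = {}
    for line in lines:
        match = ATTR_RE.search(line.rstrip())
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


if __name__ == "__main__":
    # Two lines copied from the syslog excerpt above.
    sample = [
        "Mar 10 12:20:37.672572 sonic ERR syncd#syncd: :- processQuadEvent: "
        "attr: SAI_ROUTER_INTERFACE_ATTR_TYPE: SAI_ROUTER_INTERFACE_TYPE_SUB_PORT",
        "Mar 10 12:20:37.672929 sonic ERR syncd#syncd: :- processQuadEvent: "
        "attr: SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID: 100",
    ]
    for name, value in failed_create_attrs(sample).items():
        print(f"{name} = {value}")
```

Feeding it the whole syslog file line by line shows at a glance which router interface type, port and VLAN the rejected request carried.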

wendani commented 3 years ago

It looks like the error comes from the brcm sai. Which brcm sai version are you using?

TETA-Net commented 3 years ago

$ bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [4.2.1.5], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.19]
drivshell>

wendani commented 3 years ago

There was a SAI spec change from SAI_ROUTER_INTERFACE_ATTR_VLAN_ID (enum value 3) to SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID (enum value 4) for creating a RIF of type sub port: https://github.com/opencomputeproject/SAI/pull/998

The implementation has not been updated to match the SAI spec change.

You can raise the log level on ROUTER_INTERFACE to get more information:

sudo swssloglevel -l SAI_LOG_LEVEL_DEBUG -s -c ROUTER_INTERFACE

wendani commented 3 years ago

$ bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [4.2.1.5], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.19]
drivshell>

Can you try the latest version, 4.3.3.3? https://github.com/Azure/sonic-buildimage/pull/7090

TETA-Net commented 3 years ago

$ bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [4.2.1.5], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.19]
drivshell>

wendani wrote: "Can you try on the latest 4.3.3.3 #7090"

The problem persists with the new version.

SONiC version: SONiC.master.629-01b03307
Build date: 5 Apr 2021
Platform: x86_64-ingrasys_s9100-r0
BRCM SAI ver: 4.3.3.4, OCP SAI ver: 1.7.1, SDK ver: sdk-6.5.21

phiea commented 3 years ago

Any idea if this will be fixed any time soon? I also tried with the following versions on an Edgecore AS7326-56X and got the same result:

BRCM SAI ver: [4.3.3.4], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.21] CANCUN ver: [6.4.1]

May 5 13:29:32.432460 sonic DEBUG bgp#bgpcfgd: Received message : '('Ethernet42.101', 'SET', (('vrf', ''),))'
May 5 13:29:32.432460 sonic NOTICE swss#orchagent: :- addSubPort: Sub interface Ethernet42.101 inherits mtu size 9100 from parent port Ethernet42
May 5 13:29:32.432743 sonic NOTICE swss#orchagent: :- addRouterIntfs: Create router interface Ethernet42.101 MTU 9100
May 5 13:29:32.433569 sonic ERR syncd#syncd: [none] brcm_sai_create_router_interface:285 Error processing rtr intf attribute failed with error -196604.
May 5 13:29:32.433569 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID: oid:0x300000000003a
May 5 13:29:32.433632 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS: 64:9D:99:3A:3C:58
May 5 13:29:32.433644 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_TYPE: SAI_ROUTER_INTERFACE_TYPE_SUB_PORT
May 5 13:29:32.433679 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_PORT_ID: oid:0x100000000002d
May 5 13:29:32.433752 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID: 101
May 5 13:29:32.433821 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE: true
May 5 13:29:32.433821 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE: true
May 5 13:29:32.433821 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_MTU: 9100
May 5 13:29:32.433854 sonic ERR syncd#syncd: :- processEvent: attr: SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID: 0
May 5 13:29:32.433854 sonic WARNING syncd#syncd: :- sai_serialize_enum: enum value -196604 not found in enum sai_status_t
May 5 13:29:32.433902 sonic ERR syncd#syncd: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x6000000000a02, status: -196604
May 5 13:29:32.433941 sonic ERR syncd#syncd: :- syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x6000000000a02, status: -196604
May 5 13:29:32.433941 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: sending switch_shutdown_request notification to OA
May 5 13:29:32.434206 sonic NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request
May 5 13:29:32.434206 sonic ERR swss#orchagent: :- on_switch_shutdown_request: Syncd stopped
May 5 13:29:32.434426 sonic NOTICE syncd#syncd: :- notify_OA_about_syncd_exception: notification send successfull
May 5 13:29:32.434829 sonic DEBUG bgp#bgpcfgd: Received message : '('Ethernet42.101|192.168.1.0/31', 'SET', (('state', 'ok'),))'
May 5 13:29:32.436263 sonic INFO swss#supervisord: orchagent terminate called without an active exception
May 5 13:29:32.603478 sonic INFO swss#supervisord 2021-05-05 13:29:32,602 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
May 5 13:29:33.611984 sonic INFO swss#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...

I need to use it together with this functionality, which attaches a subinterface to a VRF:

https://github.com/Azure/sonic-swss/pull/1521

phiea commented 3 years ago

Just to keep the information complete, I also tested with:

BRCM SAI ver: [4.3.3.5], OCP SAI ver: [1.8.1], SDK ver: [sdk-6.5.21] CANCUN ver: [6.4.1]

I am getting a different error now and am not able to reproduce the old one:

Jun  7 11:36:46.039767 sonic NOTICE swss#orchagent: :- addSubPort: Sub interface Ethernet42.101 inherits mtu size 9100 from parent port Ethernet42
Jun  7 11:36:46.040939 sonic NOTICE swss#orchagent: :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_KEEP to host interface: Ethernet42
Jun  7 11:36:46.042143 sonic INFO syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_config:6044 Creating vlan
Jun  7 11:36:46.042628 sonic ERR syncd#syncd: [none] SAI_API_VLAN:_brcm_sai_vlan_create_internal_vfi:4055 MC-GRP create failed with error Feature unavailable (0xfffffff0).
Jun  7 11:36:46.042628 sonic ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_config:6082 internal vfi create failed with error -2.
Jun  7 11:36:46.042628 sonic ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:575 sub port rif l2 config failed with error -2.
Jun  7 11:36:46.042628 sonic ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_NOT_SUPPORTED

I used strings to determine that the errors come from libsai.so.1.0 in the libsaibcm_4.3.3.5-1_amd64 package. I don't yet know what the error actually means or how it can be worked around from outside the binary.
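
For reference, the same kind of check can be done without the strings utility. This is a minimal Python sketch that scans a binary for printable ASCII runs containing a given message fragment. The function name find_strings is hypothetical, and the path to the library is whatever your image ships; neither comes from this thread.

```python
import re
import sys


def find_strings(path, needle, min_len=4):
    """Yield printable ASCII runs in the file at `path` that contain `needle`."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Runs of at least `min_len` printable ASCII bytes, similar to strings(1).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        text = match.group().decode("ascii")
        if needle in text:
            yield text


if __name__ == "__main__":
    # Usage: python find_strings.py <path-to-library> "MC-GRP create failed"
    if len(sys.argv) == 3:
        for text in find_strings(sys.argv[1], sys.argv[2]):
            print(text)
```

A hit only shows which binary contains the message text; it does not explain why the SDK call underneath reported the feature as unavailable.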

Also, basically all the containers restart. I don't think this is a good way to handle these errors; a rollback would be much more elegant in my opinion. As it stands, this basically crashes the switch.

Is there anyone with knowledge of the brcm sai lib, or with a connection to Broadcom, who could help out with this?

prsunny commented 3 years ago

Looks like an SDK error: _brcm_sai_vlan_create_internal_vfi:4055 MC-GRP create failed with error Feature unavailable. Is this a TD3 platform?

phiea commented 3 years ago

Yes, this is TD3.

phiea commented 3 years ago

By the way, again for completeness: wendani pointed out that the 202012 branch has a slightly newer brcm sai version, 4.3.3.5-3 instead of 4.3.3.5-1, from https://github.com/Azure/sonic-buildimage/pull/7728/files, which is also related to subinterfaces. The newer lib still gives the same issue.

JafarSeyedi commented 3 years ago

Sub-interfaces are still not working with BRCM SAI 5.0.0.6.

abhiranjeet commented 2 years ago

Hi, I am still seeing the same issue on the AS7726-32X platform using the 202111 branch. Is this resolved on any other branch?

abhiranjeet commented 2 years ago

Hi, is this thread still active? The issue is still visible on the AS7712-32X (Tomahawk) and AS7726-32X (Trident 3) platforms with the 202111 branch.

JafarSeyedi commented 2 years ago

Hi,

No, the problem is resolved. It was caused by the Broadcom SAI, and that has since been fixed.

Thanks

abhiranjeet commented 2 years ago

Hi,

Which Broadcom SAI version did you use?

stenstad commented 1 year ago

I still have this problem on Tomahawk running 2022.11:

BRCM SAI ver: [7.1.111.1], OCP SAI ver: [1.11.0], SDK ver: [sdk-6.5.24]

rlebedys commented 7 months ago

The problem is still present in 202311. I am attaching the syslog captured right after issuing config subinterface add Ethernet72.20 20.

Output of show version:

SONiC Software Version: SONiC.202311.480461-bacd21577
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: bacd21577
Build date: Sun Feb 18 12:27:37 UTC 2024
Built by: AzDevOps@vmss-soni0033YT

Platform: x86_64-accton_as7326_56x-r0
HwSKU: Accton-AS7326-56X
ASIC: broadcom
ASIC Count: 1

Broadcom SAI version:

:~# bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [10.1.6.0], OCP SAI ver: [1.13.2], SDK ver: [sdk-6.5.29], CANCUN ver: [06.04.01]
drivshell>

@JafarSeyedi can you check again if the problem is fixed in Broadcom SAI?

subinterface_add_logs.txt
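
Since the versions reported in this thread are all pasted as raw bcmsai ver output, a small parser makes it easier to line the reports up side by side. This is a minimal Python sketch; the function name parse_bcmsai_ver is hypothetical and the pattern is only an assumption based on the output lines quoted above.

```python
import re

# Each field looks like "<NAME> ver: [<value>]", e.g. "SDK ver: [sdk-6.5.29]".
FIELD_RE = re.compile(r"([A-Za-z ]+?) ver: \[([^\]]+)\]")


def parse_bcmsai_ver(line):
    """Return {component name: version string} from one bcmsai ver line."""
    return {name.strip(): value for name, value in FIELD_RE.findall(line)}


if __name__ == "__main__":
    line = ("BRCM SAI ver: [10.1.6.0], OCP SAI ver: [1.13.2], "
            "SDK ver: [sdk-6.5.29], CANCUN ver: [06.04.01]")
    for component, version in parse_bcmsai_ver(line).items():
        print(f"{component}: {version}")
```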