sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[swss#orchagent] Failed to get port by bridge port ID 0x3a000000000b7b when modifying VLANs #13069

Open yaqiangz opened 1 year ago

yaqiangz commented 1 year ago

Description

When VLANs are modified frequently, the following error appears in syslog: swss#orchagent: :- update: Failed to get port by bridge port ID 0x3a000000000b93.

Steps to reproduce the issue:

  1. Add the VLANs, their IP addresses, and their members
    config vlan add 223
    config interface ip add Vlan223 172.17.0.201/30
    config vlan member add -u 223 Ethernet24
    config vlan add 222
    config interface ip add Vlan222 172.17.0.182/26
    config vlan member add -u 222 Ethernet32
    config vlan member add -u 222 Ethernet33
    config vlan member add -u 222 Ethernet34
    config vlan member add -u 222 Ethernet35
    config vlan member add -u 222 Ethernet36
    config vlan member add -u 222 Ethernet37
    config vlan member add -u 222 Ethernet38
    config vlan member add -u 222 Ethernet39
    config vlan member add -u 222 Ethernet40
    config vlan member add -u 222 Ethernet41
    config vlan member add -u 222 Ethernet42
    config vlan member add -u 222 Ethernet43
    config vlan member add -u 222 Ethernet44
    config vlan member add -u 222 Ethernet45
    config vlan add 221
    config interface ip add Vlan221 172.17.0.126/26
    config vlan member add -u 221 Ethernet16
    config vlan member add -u 221 Ethernet17
    config vlan member add -u 221 Ethernet18
    config vlan member add -u 221 Ethernet19
    config vlan member add -u 221 Ethernet20
    config vlan member add -u 221 Ethernet21
    config vlan member add -u 221 Ethernet22
    config vlan member add -u 221 Ethernet23
    config vlan member add -u 221 Ethernet25
    config vlan member add -u 221 Ethernet26
    config vlan member add -u 221 Ethernet27
    config vlan member add -u 221 Ethernet28
    config vlan member add -u 221 Ethernet29
    config vlan member add -u 221 Ethernet30
    config vlan member add -u 221 Ethernet31
    config vlan add 220
    config interface ip add Vlan220 172.17.0.62/26
    config vlan member add -u 220 Ethernet0
    config vlan member add -u 220 Ethernet1
    config vlan member add -u 220 Ethernet2
    config vlan member add -u 220 Ethernet3
    config vlan member add -u 220 Ethernet4
    config vlan member add -u 220 Ethernet5
    config vlan member add -u 220 Ethernet6
    config vlan member add -u 220 Ethernet7
    config vlan member add -u 220 Ethernet8
    config vlan member add -u 220 Ethernet9
    config vlan member add -u 220 Ethernet10
    config vlan member add -u 220 Ethernet11
    config vlan member add -u 220 Ethernet12
    config vlan member add -u 220 Ethernet13
    config vlan member add -u 220 Ethernet14
    config vlan member add -u 220 Ethernet15
  2. Remove the VLAN IP addresses, VLAN members, and VLANs
    config interface ip remove Vlan223 172.17.0.201/30
    config vlan member del 223 Ethernet24
    config vlan del 223
    config interface ip remove Vlan222 172.17.0.182/26
    config vlan member del 222 Ethernet32
    config vlan member del 222 Ethernet33
    config vlan member del 222 Ethernet34
    config vlan member del 222 Ethernet35
    config vlan member del 222 Ethernet36
    config vlan member del 222 Ethernet37
    config vlan member del 222 Ethernet38
    config vlan member del 222 Ethernet39
    config vlan member del 222 Ethernet40
    config vlan member del 222 Ethernet41
    config vlan member del 222 Ethernet42
    config vlan member del 222 Ethernet43
    config vlan member del 222 Ethernet44
    config vlan member del 222 Ethernet45
    config vlan del 222
    config interface ip remove Vlan221 172.17.0.126/26
    config vlan member del 221 Ethernet16
    config vlan member del 221 Ethernet17
    config vlan member del 221 Ethernet18
    config vlan member del 221 Ethernet19
    config vlan member del 221 Ethernet20
    config vlan member del 221 Ethernet21
    config vlan member del 221 Ethernet22
    config vlan member del 221 Ethernet23
    config vlan member del 221 Ethernet25
    config vlan member del 221 Ethernet26
    config vlan member del 221 Ethernet27
    config vlan member del 221 Ethernet28
    config vlan member del 221 Ethernet29
    config vlan member del 221 Ethernet30
    config vlan member del 221 Ethernet31
    config vlan del 221
    config interface ip remove Vlan220 172.17.0.62/26
    config vlan member del 220 Ethernet0
    config vlan member del 220 Ethernet1
    config vlan member del 220 Ethernet2
    config vlan member del 220 Ethernet3
    config vlan member del 220 Ethernet4
    config vlan member del 220 Ethernet5
    config vlan member del 220 Ethernet6
    config vlan member del 220 Ethernet7
    config vlan member del 220 Ethernet8
    config vlan member del 220 Ethernet9
    config vlan member del 220 Ethernet10
    config vlan member del 220 Ethernet11
    config vlan member del 220 Ethernet12
    config vlan member del 220 Ethernet13
    config vlan member del 220 Ethernet14
    config vlan member del 220 Ethernet15
    config vlan del 220
  3. Repeat steps 1-2 five more times
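The add/remove cycle in steps 1-3 can be scripted so it is easier to rerun. This is a minimal bash sketch (bash 4+ for associative arrays), not part of the original report: the VLAN IDs, addresses, and member ports come from the steps above, while the script structure and the `DRY_RUN`/`ITERATIONS` variables are hypothetical additions. It only prints the commands by default; set `DRY_RUN=0` and run it as root to apply them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeat the VLAN add/remove cycle from steps 1-2.
# Prints the commands by default; set DRY_RUN=0 (as root) to execute them.
set -euo pipefail

ITERATIONS=${ITERATIONS:-6}   # steps 1-2 once, then five more times

run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# VLAN ID -> IP address/prefix, and VLAN ID -> member port numbers (EthernetN)
declare -A VLAN_IP=(
  [223]=172.17.0.201/30
  [222]=172.17.0.182/26
  [221]=172.17.0.126/26
  [220]=172.17.0.62/26
)
declare -A VLAN_MEMBERS=(
  [223]="24"
  [222]="$(echo {32..45})"
  [221]="$(echo {16..23} {25..31})"   # Ethernet24 belongs to Vlan223
  [220]="$(echo {0..15})"
)
ORDER=(223 222 221 220)

add_vlans() {
  local vid port
  for vid in "${ORDER[@]}"; do
    run config vlan add "$vid"
    run config interface ip add "Vlan$vid" "${VLAN_IP[$vid]}"
    for port in ${VLAN_MEMBERS[$vid]}; do   # unquoted on purpose: split on spaces
      run config vlan member add -u "$vid" "Ethernet$port"
    done
  done
}

del_vlans() {
  local vid port
  for vid in "${ORDER[@]}"; do
    run config interface ip remove "Vlan$vid" "${VLAN_IP[$vid]}"
    for port in ${VLAN_MEMBERS[$vid]}; do
      run config vlan member del "$vid" "Ethernet$port"
    done
    run config vlan del "$vid"
  done
}

for ((i = 1; i <= ITERATIONS; i++)); do
  add_vlans
  del_vlans
done
```

Each `add_vlans` or `del_vlans` pass issues 54 commands (4 VLANs plus 46 member ports plus 4 IP address changes), in the same order as the manual steps.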

Describe the results you received:

Errors in syslog:

2022-12-15T10:48:27.1866702Z    Notice  swss#orchagent   :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
2022-12-15T10:48:27.1868748Z    Notice  swss#orchagent   :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
2022-12-15T10:48:27.2067912Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
2022-12-15T10:48:27.206964Z Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000722 sec
2022-12-15T10:48:27.2075571Z    Warning swss#orchagent   :- meta_sai_on_fdb_event_single: object key SAI_OBJECT_TYPE_FDB_ENTRY:{"bvid":"oid:0x26000000000986","mac":"10:70:FD:B6:13:0D","switch_id":"oid:0x21000000000000"} doesn't exist but received AGED event
2022-12-15T10:48:27.2081464Z    Notice  swss#orchagent   :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_STRIP to host interface: Ethernet13
2022-12-15T10:48:27.2083163Z    Notice  swss#orchagent   :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
2022-12-15T10:48:27.2084923Z    Notice  swss#orchagent   :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
2022-12-15T10:48:27.2261987Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
2022-12-15T10:48:27.2263966Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000671 sec
2022-12-15T10:48:27.2294837Z    Notice  swss#orchagent   :- removeBridgePort: Remove bridge port Ethernet13 from default 1Q bridge
2022-12-15T10:48:27.2296686Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.2299837Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.2301632Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.2303329Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.2304961Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.2306597Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.230846Z Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.2310296Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.2312017Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.2313752Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.2315411Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.2317089Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.2318715Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.2320311Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.2321941Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.2323582Z    Error   swss#orchagent   :- update: Failed to get port by bridge port ID 0x3a000000000e3f.
2022-12-15T10:48:27.2325253Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.2326917Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.2328572Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.23302Z  Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.2331851Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.2333571Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.2335291Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.2336951Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.2338571Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.2340458Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.555409Z Notice  swss#orchagent   :- removeVlanMember: Remove member Ethernet14 from VLAN Vlan220 lid:dc vmid:27000000000e42
2022-12-15T10:48:27.5570345Z    Notice  swss#orchagent   :- setPortPvid: Set pvid 1 to port: Ethernet14
2022-12-15T10:48:27.5572307Z    Notice  swss#orchagent   :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
2022-12-15T10:48:27.5573982Z    Notice  swss#orchagent   :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
2022-12-15T10:48:27.5764502Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
2022-12-15T10:48:27.5770629Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000652 sec
2022-12-15T10:48:27.5772298Z    Warning swss#orchagent   :- meta_sai_on_fdb_event_single: object key SAI_OBJECT_TYPE_FDB_ENTRY:{"bvid":"oid:0x26000000000986","mac":"10:70:FD:B6:13:0E","switch_id":"oid:0x21000000000000"} doesn't exist but received AGED event
2022-12-15T10:48:27.5785775Z    Notice  swss#orchagent   :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_STRIP to host interface: Ethernet14
2022-12-15T10:48:27.578767Z Notice  swss#orchagent   :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
2022-12-15T10:48:27.5789365Z    Notice  swss#orchagent   :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
2022-12-15T10:48:27.5975931Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
2022-12-15T10:48:27.5983623Z    Notice  swss#orchagent   :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000811 sec
2022-12-15T10:48:27.599032Z Notice  swss#orchagent   :- removeBridgePort: Remove bridge port Ethernet14 from default 1Q bridge
2022-12-15T10:48:27.5993291Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.5995609Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.5997386Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.5999025Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.6000624Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.6002401Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.6004152Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.6005789Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.6007464Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.6009086Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.6010693Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.6012523Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.6014357Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.6016019Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.6017631Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.6019277Z    Error   swss#orchagent   :- update: Failed to get port by bridge port ID 0x3a000000000e41.
2022-12-15T10:48:27.60209Z  Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.6022466Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.6024076Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.6025737Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.6027712Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.6029677Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.6031344Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.6033088Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.6034974Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.603663Z Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.8081945Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan1000 still has 1 FDB entries
2022-12-15T10:48:27.8084368Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan220 still has 16 FDB entries
2022-12-15T10:48:27.8086138Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan221 still has 15 FDB entries
2022-12-15T10:48:27.8087962Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan222 still has 14 FDB entries
2022-12-15T10:48:27.8089614Z    Notice  swss#orchagent   :- removeVlan: VLAN Vlan223 still has 1 FDB entries
2022-12-15T10:48:27.9401049Z    Notice  swss#orchagent   :- removeVlanMember: Remove member Ethernet15 from VLAN Vlan220 lid:dc vmid:27000000000e44
2022-12-15T10:48:27.9413833Z    Notice  swss#orchagent   :- setPortPvid: Set pvid 1 to port: Ethernet15
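In the excerpt above, each error is logged shortly after a `removeBridgePort` notice (Ethernet13 before 0x3a000000000e3f, Ethernet14 before 0x3a000000000e41). A rough way to check that pattern across a whole log is to pair each error with the nearest preceding `removeBridgePort` line. This is a hypothetical helper, not a SONiC tool; because the notice names the port rather than the bridge port OID, the pairing is by order only and is at most suggestive of a race between bridge port removal and FDB event handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pair each orchagent "Failed to get port by bridge port ID"
# error with the most recent removeBridgePort notice logged before it.
pair_errors_with_bridge_port_removals() {
  local log=${1:-/var/log/syslog}
  awk '
    # Remember the latest bridge port removal seen so far.
    /swss#orchagent.*removeBridgePort:/ { last = $0; next }
    # For each lookup failure, print it together with that removal.
    /swss#orchagent.*Failed to get port by bridge port ID/ {
      print "error: " $0
      print "after: " (last != "" ? last : "(no earlier removeBridgePort)")
      print ""
    }
  ' "$log"
}
```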

Describe the results you expected:

No error logs in syslog while modifying VLANs.
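To check whether a run reproduced the problem, the errors can be counted in syslog before and after the steps. A minimal sketch; the function name is hypothetical, and the default path assumes orchagent logs to `/var/log/syslog` as on a standard SONiC image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count orchagent "Failed to get port by bridge port ID"
# errors in a log file (defaults to /var/log/syslog).
count_bridge_port_errors() {
  local log=${1:-/var/log/syslog}
  # grep -c prints 0 and exits with status 1 when nothing matches; treat that
  # as success, but still fail on real errors such as an unreadable file (status 2).
  grep -c 'swss#orchagent.*Failed to get port by bridge port ID' "$log" || [ $? -eq 1 ]
}
```

For example, save `count_bridge_port_errors` output before running the cycle and compare it with a second call afterwards; any increase means the issue reproduced (log rotation during the run would reset the count).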

Output of show version:

SONiC Software Version: SONiC.20220531.11
Distribution: Debian 11.5
Kernel: 5.10.0-18-2-amd64
Build commit: 4fea843b60
Build date: Fri Dec  2 17:29:10 UTC 2022
Built by: cloudtest@934c23b4c000005

Platform: x86_64-arista_720dt_48s
HwSKU: Arista-720DT-48S
ASIC: broadcom
ASIC Count: 1
Serial Number: WTW22180032
Model Number: CCS-720DT-48S
Hardware Revision: 02.00
Uptime: 10:55:47 up  6:31,  1 user,  load average: 0.38, 0.49, 0.70
Date: Thu 15 Dec 2022 10:55:47

Docker images:
REPOSITORY                 TAG           IMAGE ID       SIZE
docker-mux                 20220531.11   bbb54029348b   492MB
docker-mux                 latest        bbb54029348b   492MB
docker-macsec              latest        9a956a79621c   462MB
docker-acms                20220531.11   6b13a18f2b22   491MB
docker-acms                latest        6b13a18f2b22   491MB
docker-orchagent           20220531.11   37df6f245f67   478MB
docker-orchagent           latest        37df6f245f67   478MB
docker-fpm-frr             20220531.11   4b4728702235   489MB
docker-fpm-frr             latest        4b4728702235   489MB
docker-teamd               20220531.11   a99ebe49ae89   460MB
docker-teamd               latest        a99ebe49ae89   460MB
docker-syncd-brcm          20220531.11   3fab742757cf   786MB
docker-syncd-brcm          latest        3fab742757cf   786MB
docker-gbsyncd-broncos     20220531.11   4c4b06e5530b   491MB
docker-gbsyncd-broncos     latest        4c4b06e5530b   491MB
docker-gbsyncd-credo       20220531.11   0ba662472f06   461MB
docker-gbsyncd-credo       latest        0ba662472f06   461MB
docker-dhcp-relay          latest        47158046806c   456MB
docker-snmp                20220531.11   3c7288dbbd5c   488MB
docker-snmp                latest        3c7288dbbd5c   488MB
docker-sonic-telemetry     20220531.11   65cded006d13   524MB
docker-sonic-telemetry     latest        65cded006d13   524MB
docker-router-advertiser   20220531.11   1b69ea904eec   444MB
docker-router-advertiser   latest        1b69ea904eec   444MB
docker-platform-monitor    20220531.11   c298949ab11a   568MB
docker-platform-monitor    latest        c298949ab11a   568MB
docker-lldp                20220531.11   d87e11999f0f   486MB
docker-lldp                latest        d87e11999f0f   486MB
docker-database            20220531.11   cde7e421279d   444MB
docker-database            latest        cde7e421279d   444MB


gechiang commented 1 year ago

@adyeung, please provide a fix proposal for this race condition. We need a generic solution, since we have seen similar issues before: when an FDB event arrives, OA (orchagent) tries to get the BVID and hits this race condition.

anilkpan commented 1 year ago

FDB events are processed by fdborch and sairedis in different thread contexts, which sometimes causes the FDB reference count to go out of sync. Updating sairedis from the same thread as fdborch is one option to fix the issue. @yaqiangz, can you please provide the techsupport dump so that I can confirm it is the same issue?

yaqiangz commented 1 year ago


@anilkpan, the error is easy to reproduce with the steps I described above. Could you please follow them to reproduce it? That may be more helpful for troubleshooting.