sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Leftover portchannels found in the kernel after switching topology from T0 to T1-LAG #2760

Open keboliu opened 5 years ago

keboliu commented 5 years ago

Description: After switching the DUT topology from T0 to T1-LAG through "config reload", some portchannels from the T0 topology can still be found in the kernel (as seen in the output of "ifconfig"). Only rebooting the DUT clears these leftover portchannels.

Switching from T1-LAG to T0 shows a similar issue.

Capture from the DUT:

  1. configured portchannels of T1 topo:

    root@arc-mtbc-1001:/tmp# show interface portchannel 
    Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
    No.  Team Dev         Protocol     Ports
    -----  ---------------  -----------  ---------------------------
    0002  PortChannel0002  LACP(A)(Up)  Ethernet0(S) Ethernet4(S)
    0005  PortChannel0005  LACP(A)(Up)  Ethernet8(S) Ethernet12(S)
    0008  PortChannel0008  LACP(A)(Up)  Ethernet20(S) Ethernet16(S)
    0011  PortChannel0011  LACP(A)(Up)  Ethernet28(S) Ethernet24(S)
    0014  PortChannel0014  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
    0017  PortChannel0017  LACP(A)(Up)  Ethernet44(S) Ethernet40(S)
    0020  PortChannel0020  LACP(A)(Up)  Ethernet52(S) Ethernet48(S)
    0023  PortChannel0023  LACP(A)(Up)  Ethernet60(S) Ethernet56(S)
  2. All portchannels from "ifconfig" command, including leftovers:

    root@arc-mtbc-1001:/tmp# ifconfig | grep PortChannel
    PortChannel0001: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
    PortChannel0002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0003: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
    PortChannel0004: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
    PortChannel0005: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0008: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0011: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0014: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0017: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0020: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
    PortChannel0023: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
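
The leftovers can be picked out mechanically by diffing the two captures. The following is a minimal sketch, not part of SONiC: the function names are hypothetical, and it assumes the output formats of `show interface portchannel` and `ifconfig` look like the captures in this report (the samples are shortened excerpts of them).

```python
import re

# Shortened excerpts of the two captures in this report.
SHOW_SAMPLE = """\
No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
0002  PortChannel0002  LACP(A)(Up)  Ethernet0(S) Ethernet4(S)
0005  PortChannel0005  LACP(A)(Up)  Ethernet8(S) Ethernet12(S)
"""

IFCONFIG_SAMPLE = """\
PortChannel0001: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0003: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0004: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0005: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
"""


def configured_portchannels(show_output):
    """Names in the 'Team Dev' column of `show interface portchannel`."""
    pattern = r"^[ \t]*\d+[ \t]+(PortChannel\d+)[ \t]"
    return set(re.findall(pattern, show_output, re.MULTILINE))


def kernel_portchannels(ifconfig_output):
    """Interface names starting with PortChannel in `ifconfig` output."""
    pattern = r"^[ \t]*(PortChannel\d+):"
    return set(re.findall(pattern, ifconfig_output, re.MULTILINE))


def leftover_portchannels(show_output, ifconfig_output):
    """Portchannels present in the kernel but absent from the configuration."""
    leftovers = kernel_portchannels(ifconfig_output) - configured_portchannels(show_output)
    return sorted(leftovers)


if __name__ == "__main__":
    print(leftover_portchannels(SHOW_SAMPLE, IFCONFIG_SAMPLE))
    # ['PortChannel0001', 'PortChannel0003', 'PortChannel0004']
```

In the capture above, the three leftovers are also the only portchannels without the RUNNING flag.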

Steps to reproduce the issue:

  1. Deploy the DUT with the T0 topo.
  2. Prepare a config_db.json for the T1-LAG topo and replace the current one with it.
  3. Issue "config reload" to apply the new configuration. After it succeeds, the T0 portchannels can still be seen in the kernel.

Describe the results you received: Not all the portchannels of T0 are cleared.

Describe the results you expected: All of the T0 configuration should be cleared, and only the T1-LAG configuration should be applied on the DUT.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```
root@arc-mtbc-1001:/tmp# show version
SONiC Software Version: SONiC.HEAD.43-6c1a0ce
Distribution: Debian 9.8
Kernel: 4.9.0-8-amd64
Build commit: 6c1a0ce
Build date: Tue Apr  9 10:44:13 UTC 2019
Built by: johnar@jenkins-worker-4

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-orchagent-mlnx      HEAD.43-6c1a0ce     1584c6f5403b        286MB
docker-orchagent-mlnx      latest              1584c6f5403b        286MB
docker-syncd-mlnx          HEAD.43-6c1a0ce     8504950d2a3c        331MB
docker-syncd-mlnx          latest              8504950d2a3c        331MB
docker-lldp-sv2            HEAD.43-6c1a0ce     689ccc124209        274MB
docker-lldp-sv2            latest              689ccc124209        274MB
docker-dhcp-relay          HEAD.43-6c1a0ce     9db41f909a46        256MB
docker-dhcp-relay          latest              9db41f909a46        256MB
docker-database            HEAD.43-6c1a0ce     5f01e8163e4a        255MB
docker-database            latest              5f01e8163e4a        255MB
docker-snmp-sv2            HEAD.43-6c1a0ce     4e99da8c3edb        294MB
docker-snmp-sv2            latest              4e99da8c3edb        294MB
docker-teamd               HEAD.43-6c1a0ce     5f2220585a30        274MB
docker-teamd               latest              5f2220585a30        274MB
docker-router-advertiser   HEAD.43-6c1a0ce     477c29d00dc8        254MB
docker-router-advertiser   latest              477c29d00dc8        254MB
docker-platform-monitor    HEAD.43-6c1a0ce     0c4056ed0b1d        286MB
docker-platform-monitor    latest              0c4056ed0b1d        286MB
docker-fpm-quagga          HEAD.43-6c1a0ce     51364e3fc200        281MB
docker-fpm-quagga          latest              51364e3fc200        281MB
```

xinliu-seattle commented 5 years ago

@keboliu Can you help with the fix?

madhukar-kamarapu commented 5 years ago

When a port-channel is created by the user (config portchannel add PortChannelXXX), the corresponding netdevice is created in the kernel (teamd_init() calls team_create(), which creates the netdevice).

These netdevices are deleted from the kernel when the user deletes the configuration (config portchannel del PortChannelXXX).

During config reload, the port-channel netdevices are not explicitly deleted from the kernel, so the ones that belong to the old configuration are left behind.

Solution: when the teamd docker starts, delete all the existing port-channel netdevices in the kernel.
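
A rough sketch of that startup cleanup is below. It only illustrates the idea and is not the code that was merged: the function names are hypothetical, it assumes the one-line-per-device format of `ip -o link show`, and it assumes no teamd instance still owns the devices when it runs.

```python
import subprocess

PORTCHANNEL_PREFIX = "PortChannel"


def portchannel_netdevs(ip_link_output):
    """Extract PortChannel* device names from `ip -o link show` output.

    Each line is expected to look like
    '12: PortChannel0001: <BROADCAST,MULTICAST,UP> mtu 9100 ...'.
    """
    names = []
    for line in ip_link_output.splitlines():
        fields = line.split(": ", 2)
        if len(fields) < 2:
            continue
        # Stacked devices are printed as 'name@parent'; keep only the name.
        name = fields[1].split("@", 1)[0]
        if name.startswith(PORTCHANNEL_PREFIX):
            names.append(name)
    return names


def deletion_commands(names):
    """Build one `ip link delete` command per leftover device."""
    return [["ip", "link", "delete", "dev", name] for name in names]


def cleanup(dry_run=True):
    """List kernel netdevices and delete (or just print) the PortChannel ones."""
    result = subprocess.run(
        ["ip", "-o", "link", "show"],
        check=True, capture_output=True, text=True,
    )
    commands = deletion_commands(portchannel_netdevs(result.stdout))
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # needs root privileges
    return commands
```

Calling `cleanup()` with the default `dry_run=True` only prints the commands it would run, which is a safer first step on a live DUT.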

prsunny commented 4 years ago

This looks like it was fixed as part of https://github.com/Azure/sonic-swss/pull/1159