sonic-net / sonic-mgmt

Configuration management examples for SONiC

orchagent is killed after the DUT receives a large number of IPv6 routes. #655

Open robert1030 opened 5 years ago

robert1030 commented 5 years ago

Description

  1. orchagent is killed after the DUT receives a large number of IPv6 routes.
  2. The issue reproduces on both the Debian 8 and the Debian 9 builds.
  3. The DUT is a Delta-AG9032v1.

Steps to reproduce the issue:

  1. Run "./testbed-cli.sh gen-mg vmst0-1 lab password.txt" to generate the t0 topology minigraph file.
  2. Deploy the t0 minigraph file to the DUT.
  3. Verify that the four T0 VMs have started and are working.
  4. Reboot the DUT.
  5. Once the reboot has completed, log in to the DUT and run "pgrep orchagent -a" repeatedly.
  6. The pgrep output below tracks orchagent until the process disappears.

Debian 8 - build619

    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3314 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# date
    Thu Jul 12 14:53:09 CST 2018
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3314 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3314 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3314 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3314 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    root@sonic-ag9032:/home/admin# date
    Thu Jul 12 14:53:36 CST 2018

Debian 9 - build220

    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# date
    Thu Jul 12 15:41:32 CST 2018
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    3153 /usr/bin/orchagent -d /var/log/swss -b 8192 -m 00:18:23:30:e0:6e
    root@sonic-ag9032:/home/admin# pgrep orchagent -a
    root@sonic-ag9032:/home/admin# date
    Thu Jul 12 15:42:13 CST 2018
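The manual pgrep polling above could be automated. The following is a minimal sketch, not part of sonic-mgmt: the helper names `orchagent_running` and `watch_until_gone` are hypothetical, and it assumes `pgrep` is available on the DUT and that the process name is exactly `orchagent`.

```python
import subprocess
import time
from datetime import datetime


def orchagent_running():
    """Return True if a process named exactly 'orchagent' exists."""
    # pgrep exits with status 0 when at least one process matches.
    result = subprocess.run(
        ["pgrep", "-x", "orchagent"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_until_gone(is_running=orchagent_running, interval=1.0,
                     max_polls=600, sleep=time.sleep, now=datetime.now):
    """Poll until a process that was seen running disappears.

    Returns the value of now() at the first poll where the process is
    missing after having been seen at least once, or None if that never
    happens within max_polls polls.
    """
    seen = False
    for _ in range(max_polls):
        if is_running():
            seen = True
        elif seen:
            return now()
        sleep(interval)
    return None
```

Calling `watch_until_gone()` right after the reboot would return a timestamp comparable to the second `date` output in each run above, which can then be matched against the swss logs.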

Describe the results you received: orchagent is killed after the DUT port channel interfaces come up and the DUT receives a large number of IPv6 routes.

Describe the results you expected: orchagent should not be killed or crash.

Additional information you deem important:

**Output of `show version`:**
Hardware DUT: Delta-AG9032v1

Debian 9 software version:

    root@sonic-ag9032:/home/admin# show version
    SONiC Software Version: SONiC.HEAD.220-61b5a5d
    Distribution: Debian 9.4
    Kernel: 4.9.0-5-amd64
    Build commit: 61b5a5d
    Build date: Mon May 28 12:08:52 UTC 2018
    Built by: johnar@jenkins-worker-3

    Docker images:
    REPOSITORY                TAG               IMAGE ID      SIZE
    docker-syncd-brcm         HEAD.220-61b5a5d  bb940f4f200e  331.3 MB
    docker-syncd-brcm         latest            bb940f4f200e  331.3 MB
    docker-orchagent-brcm     HEAD.220-61b5a5d  72f9a94b924f  252 MB
    docker-orchagent-brcm     latest            72f9a94b924f  252 MB
    docker-lldp-sv2           HEAD.220-61b5a5d  574d38e03474  266 MB
    docker-lldp-sv2           latest            574d38e03474  266 MB
    docker-dhcp-relay         HEAD.220-61b5a5d  822fea4bc0bc  248.6 MB
    docker-dhcp-relay         latest            822fea4bc0bc  248.6 MB
    docker-database           HEAD.220-61b5a5d  0f54ebb2da38  247.3 MB
    docker-database           latest            0f54ebb2da38  247.3 MB
    docker-teamd              HEAD.220-61b5a5d  46eb259c508f  251.8 MB
    docker-teamd              latest            46eb259c508f  251.8 MB
    docker-snmp-sv2           HEAD.220-61b5a5d  24821957cd69  286.2 MB
    docker-snmp-sv2           latest            24821957cd69  286.2 MB
    docker-router-advertiser  HEAD.220-61b5a5d  2dfcb5007dec  244.9 MB
    docker-router-advertiser  latest            2dfcb5007dec  244.9 MB
    docker-platform-monitor   HEAD.220-61b5a5d  4ce5a705dbe0  277 MB
    docker-platform-monitor   latest            4ce5a705dbe0  277 MB
    docker-fpm-quagga         HEAD.220-61b5a5d  f549c12bd2e4  258.6 MB
    docker-fpm-quagga         latest            f549c12bd2e4  258.6 MB

Debian 8 software version:

    root@sonic-ag9032:/home/admin# show version
    SONiC Software Version: SONiC.HEAD.619-bbca583
    Distribution: Debian 8.10
    Kernel: 3.16.0-5-amd64
    Build commit: bbca583
    Build date: Thu Jun 21 18:02:52 UTC 2018
    Built by: johnar@jenkins-worker-3

    Docker images:
    REPOSITORY                TAG               IMAGE ID      SIZE
    docker-syncd-brcm         HEAD.619-bbca583  241d56dc47e8  336.5 MB
    docker-syncd-brcm         latest            241d56dc47e8  336.5 MB
    docker-orchagent-brcm     HEAD.619-bbca583  c906c6e6226e  257.3 MB
    docker-orchagent-brcm     latest            c906c6e6226e  257.3 MB
    docker-lldp-sv2           HEAD.619-bbca583  76b0711747ba  270.5 MB
    docker-lldp-sv2           latest            76b0711747ba  270.5 MB
    docker-dhcp-relay         HEAD.619-bbca583  c075cb05f076  253.8 MB
    docker-dhcp-relay         latest            c075cb05f076  253.8 MB
    docker-database           HEAD.619-bbca583  46219f00aded  252.5 MB
    docker-database           latest            46219f00aded  252.5 MB
    docker-teamd              HEAD.619-bbca583  0921dfd89658  257 MB
    docker-teamd              latest            0921dfd89658  257 MB
    docker-snmp-sv2           HEAD.619-bbca583  cffcc246a173  291.4 MB
    docker-snmp-sv2           latest            cffcc246a173  291.4 MB
    docker-router-advertiser  HEAD.619-bbca583  95977131da6b  250.1 MB
    docker-router-advertiser  latest            95977131da6b  250.1 MB
    docker-platform-monitor   HEAD.619-bbca583  6f90dcadfe61  281.4 MB
    docker-platform-monitor   latest            6f90dcadfe61  281.4 MB
    docker-fpm-quagga         HEAD.619-bbca583  bd41eb5527ae  263.8 MB
    docker-fpm-quagga         latest            bd41eb5527ae  263.8 MB

**Attach debug file `sudo generate_dump`:**

T0 port channel link-up status:

    root@sonic-ag9032:/home/admin# show interfaces portchannel
    Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
    No.   Team Dev         Protocol     Ports
    0001  PortChannel0001  LACP(A)(Up)  Ethernet112(S)
    0002  PortChannel0002  LACP(A)(Up)  Ethernet116(S)
    0003  PortChannel0003  LACP(A)(Up)  Ethernet120(S)
    0004  PortChannel0004  LACP(A)(Up)  Ethernet124(S)

The DUT received IPv6 routes from the 4 Arista VMs:

    root@sonic-ag9032:/home/admin# show ipv6 bgp sum
    BGP router identifier 10.1.0.32, local AS number 65100
    RIB entries 12805, using 1401 KiB of memory
    Peers 8, using 36 KiB of memory
    Peer groups 2, using 112 bytes of memory

    Neighbor  V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    fc00::72  4  64600  3262     64       0       0    0     00:02:55  6400
    fc00::76  4  64600  3258     3263     0       0    0     00:02:53  6400
    fc00::7a  4  64600  3259     3263     0       0    0     00:02:53  6400
    fc00::7e  4  64600  3259     3938     0       0    0     00:02:53  6400
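To put a number on "numerous", the per-neighbor prefix counts in the summary above can be totalled. This is a rough sketch with a hypothetical helper name, `prefixes_per_neighbor`; it assumes the ten-column neighbor rows shown above, and the total counts a prefix once per advertising peer, so it is an upper bound rather than the number of distinct routes.

```python
def prefixes_per_neighbor(summary_text):
    """Map each BGP neighbor to its State/PfxRcd count.

    Only rows with ten whitespace-separated fields, a numeric version
    column, and a numeric last column are treated as established
    neighbors; header lines and peers in a non-established state are
    skipped.
    """
    counts = {}
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) == 10 and fields[1].isdigit() and fields[-1].isdigit():
            counts[fields[0]] = int(fields[-1])
    return counts


SUMMARY = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
fc00::72 4 64600 3262 64 0 0 0 00:02:55 6400
fc00::76 4 64600 3258 3263 0 0 0 00:02:53 6400
fc00::7a 4 64600 3259 3263 0 0 0 00:02:53 6400
fc00::7e 4 64600 3259 3938 0 0 0 00:02:53 6400
"""

counts = prefixes_per_neighbor(SUMMARY)
total_received = sum(counts.values())  # 4 neighbors x 6400 prefixes each
```

For the output in this report, the four IPv6 neighbors each advertise 6400 prefixes, 25600 received in total, while the summary itself reports 12805 RIB entries.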

Docker swss logs for orchagent:

    root@sonic-ag9032:/home/admin# docker logs swss | grep orchagent
    2018-07-12 07:26:15,701 INFO spawned: 'orchagent' with pid 46
    2018-07-12 07:26:16,710 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2018-07-12 07:32:43,606 INFO waiting for neighsyncd, intfsyncd, orchagent, buffermgrd, portsyncd, intfmgrd, vlanmgrd, rsyslogd to die
    2018-07-12 07:32:45,628 INFO stopped: orchagent (terminated by SIGTERM)
    2018-07-12 07:34:01,179 INFO spawned: 'orchagent' with pid 45
    2018-07-12 07:34:02,188 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2018-07-12 07:36:16,911 INFO waiting for neighsyncd, intfsyncd, orchagent, buffermgrd, portsyncd, intfmgrd, vlanmgrd, rsyslogd to die
    2018-07-12 07:36:18,932 INFO stopped: orchagent (terminated by SIGTERM)
    2018-07-12 07:37:11,051 INFO spawned: 'orchagent' with pid 46
    2018-07-12 07:37:12,058 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2018-07-12 07:37:56,088 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
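In the log above, the two SIGTERM stops come from the reboots, while the final SIGABRT exit is the actual crash. The sketch below separates the two kinds of event. The helper name `find_terminations` is hypothetical, and the pattern is written only against the line format visible in this log, so other supervisor messages may not match.

```python
import re

# Matches lines such as:
#   2018-07-12 07:32:45,628 INFO stopped: orchagent (terminated by SIGTERM)
#   2018-07-12 07:37:56,088 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
TERMINATION_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"(?P<event>exited|stopped): (?P<process>\S+) "
    r"\(terminated by (?P<signal>SIG[A-Z0-9]+)(?P<core> \(core dumped\))?"
)


def find_terminations(log_text, process="orchagent"):
    """Return one dict per termination event logged for `process`."""
    events = []
    for line in log_text.splitlines():
        match = TERMINATION_RE.match(line.strip())
        if match and match.group("process") == process:
            events.append({
                "timestamp": match.group("timestamp"),
                "signal": match.group("signal"),
                "core_dumped": match.group("core") is not None,
                # The log marks abnormal exits with "not expected".
                "unexpected": "not expected" in line,
            })
    return events
```

Filtering the result on `unexpected` leaves only the 07:37:56 SIGABRT event for this log, which is the timestamp to look for in the core dump and syslog.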

vincent201881 commented 5 years ago

Hi, from your description, the underlying problem is that your hardware does not support this many IP routes, so you could modify the ansible testbed's EOS script (e.g. the file topo_t0.yml) to cut back the number of routes. Either way, orchagent crashing is not reasonable, and that probably needs a system-level fix. For example, the hardware could report its route-table capacity upward, and the software above it could stop delivering further route entries and log or alert on the event instead of just killing orchagent.

robert1030 commented 5 years ago

Hi, sorry for the late reply. Which parameter in topo_t0.yml can be modified to reduce the number of IP routes? Thanks.