sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[ZTP] management interface link flaps due to ZTP periodically start network discovery #11126

Closed Junchao-Mellanox closed 2 years ago

Junchao-Mellanox commented 2 years ago

Description

The management interface link flaps because ZTP periodically starts network discovery. The issue always reproduces.

Steps to reproduce the issue:

  1. Build SONiC with ZTP enabled
  2. Install SONiC via ONIE
  3. ZTP runs automatically.
  4. ZTP starts network discovery periodically, which calls "systemctl restart interfaces-config". Code here: https://github.com/Azure/sonic-ztp/blob/f7dd3c54ec57848f7cb6d3eec748a4d8e54d0e6c/src/usr/lib/ztp/ztp-engine.py#L856
  5. The interfaces-config service takes eth0 down with the command "ifdown --force eth0". Code here: https://github.com/Azure/sonic-buildimage/blob/6a4105ad17de526383efd8982f55a7e7129e46b1/files/image_config/interfaces/interfaces-config.sh#L28

Output of service interfaces-config status:

Jun 14 02:23:29 r-panther-23 systemd[1]: Starting Update interfaces configuration...
Jun 14 02:23:30 r-panther-23 dhclient[5593]: Killed old client process
Jun 14 02:23:30 r-panther-23 interfaces-config.sh[5593]: Killed old client process
Jun 14 02:23:31 r-panther-23 dhclient[5594]: Killed old client process
Jun 14 02:23:31 r-panther-23 interfaces-config.sh[5594]: Killed old client process
Jun 14 02:23:32 r-panther-23 dhclient[5594]: DHCPRELEASE of 10.210.25.147 on eth0 to 10.211.0.124 port 67
Jun 14 02:23:33 r-panther-23 interfaces-config.sh[5649]: net.ipv6.conf.eth0.accept_ra_defrtr = 1
Jun 14 02:23:33 r-panther-23 interfaces-config.sh[5649]: net.ipv6.conf.eth0.accept_ra = 1
Jun 14 02:23:39 r-panther-23 systemd[1]: Finished Update interfaces configuration.

Describe the results you received:

The management interface link goes down each time "ifdown --force eth0" is called.

Describe the results you expected:

ZTP should not restart the interfaces-config service unless there is a ZTP profile/configuration change.
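
One way to picture the requested behavior is a guard that restarts the service only when the discovered profile actually changes. The sketch below is an illustration, not code from sonic-ztp: `ConfigChangeGuard`, `maybe_restart`, and the injected `restart` callable are hypothetical names, and comparing a SHA-256 digest of the profile bytes is an assumed way to detect a change.

```python
import hashlib
import subprocess
from typing import Callable, Optional


def restart_interfaces_config() -> None:
    """Restart the service named in this report (needs root on a SONiC host)."""
    subprocess.run(["systemctl", "restart", "interfaces-config"], check=True)


class ConfigChangeGuard:
    """Hypothetical helper: call `restart` only when the profile content changes."""

    def __init__(self, restart: Callable[[], None] = restart_interfaces_config) -> None:
        self._restart = restart
        self._last_digest: Optional[str] = None

    def maybe_restart(self, profile: bytes) -> bool:
        """Return True if a restart was triggered for this profile."""
        digest = hashlib.sha256(profile).hexdigest()
        if digest == self._last_digest:
            # Same profile as last time: leave eth0 alone.
            return False
        self._restart()
        # Record the digest only after a successful restart so a failure is retried.
        self._last_digest = digest
        return True
```

With this guard, the first profile seen always triggers a restart; repeated discovery cycles that find the same profile would leave the management interface up.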

Output of show version:

This issue is observed on the latest 202012/202111 branches. I did not try other versions, but I suppose it is also present on master.

SONiC Software Version: SONiC.202012.310-49a2d8558_Internal
Distribution: Debian 10.12
Kernel: 4.19.0-12-2-amd64
Build commit: 49a2d8558
Build date: Mon Jun 13 09:20:42 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic02-005

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1849X08806
Uptime: 02:38:22 up 17 min,  1 user,  load average: 1.05, 1.71, 1.53

Docker images:
REPOSITORY                    TAG                             IMAGE ID            SIZE
docker-syncd-mlnx             202012.310-49a2d8558_Internal   c0dded81a071        858MB
docker-syncd-mlnx             latest                          c0dded81a071        858MB
docker-platform-monitor       202012.310-49a2d8558_Internal   579c3b1e8525        670MB
docker-platform-monitor       latest                          579c3b1e8525        670MB
docker-teamd                  202012.310-49a2d8558_Internal   ae598f7109e8        373MB
docker-teamd                  latest                          ae598f7109e8        373MB
docker-nat                    202012.310-49a2d8558_Internal   6d494116b65c        376MB
docker-nat                    latest                          6d494116b65c        376MB
docker-router-advertiser      202012.310-49a2d8558_Internal   f7fe1709fc02        362MB
docker-router-advertiser      latest                          f7fe1709fc02        362MB
docker-snmp                   202012.310-49a2d8558_Internal   859ea06ffba3        405MB
docker-snmp                   latest                          859ea06ffba3        405MB
docker-database               202012.310-49a2d8558_Internal   02c7d2f30544        362MB
docker-database               latest                          02c7d2f30544        362MB
docker-lldp                   202012.310-49a2d8558_Internal   9a324bd511e8        402MB
docker-lldp                   latest                          9a324bd511e8        402MB
docker-orchagent              202012.310-49a2d8558_Internal   3a66a0bf2bf2        390MB
docker-orchagent              latest                          3a66a0bf2bf2        390MB
docker-sonic-telemetry        202012.310-49a2d8558_Internal   8e9e6c95464c        451MB
docker-sonic-telemetry        latest                          8e9e6c95464c        451MB
docker-dhcp-relay             202012.310-49a2d8558_Internal   9d9f3cffb8de        375MB
docker-dhcp-relay             latest                          9d9f3cffb8de        375MB
docker-sonic-mgmt-framework   202012.310-49a2d8558_Internal   402926a5fd94        687MB
docker-sonic-mgmt-framework   latest                          402926a5fd94        687MB
docker-mux                    202012.310-49a2d8558_Internal   61dbc171a03c        414MB
docker-mux                    latest                          61dbc171a03c        414MB
docker-fpm-frr                202012.310-49a2d8558_Internal   118f8e620df6        392MB
docker-fpm-frr                latest                          118f8e620df6        392MB
docker-sflow                  202012.310-49a2d8558_Internal   bc2eb64f668e        374MB
docker-sflow                  latest                          bc2eb64f668e        374MB

Output of show techsupport:

sonic_dump_r-panther-23_20220614_023847.tar.gz

rajendra-dendukuri commented 2 years ago

The basic principle of ZTP is that no user intervention is required. The ZTP service repeatedly tries to establish connectivity and looks for provisioning information. If the provisioning information is not available because of a network error or misconfiguration, the ZTP service retries discovery periodically (every 300s). That is when the existing DHCP leases are released and acquired afresh. The alternative is for the user to intervene, which is contrary to the basic principle. Connectivity is not just lost; it is re-established. Users are not expected to log in during a ZTP session and perform provisioning in parallel. However, if the user wishes to operate on the switch, the ZTP service can be stopped using the "systemctl stop ztp" command or disabled using the "ztp disable" command.
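
The retry cycle described in this comment can be sketched roughly as follows. This is a simplified illustration, not the actual ztp-engine.py logic: `discover_with_retries`, `find_provisioning_data`, `restart_network`, and `max_attempts` are hypothetical names, and only the 300-second interval comes from the comment.

```python
import time
from typing import Callable, Optional

DISCOVERY_INTERVAL_S = 300  # retry period stated in the comment


def discover_with_retries(
    find_provisioning_data: Callable[[], Optional[dict]],
    restart_network: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = DISCOVERY_INTERVAL_S,
    max_attempts: Optional[int] = None,
) -> Optional[dict]:
    """Retry discovery until data is found or max_attempts is reached.

    At least one attempt is always made. With max_attempts=None the loop
    runs until provisioning data appears, mirroring the "no user
    intervention" principle.
    """
    attempt = 0
    while True:
        attempt += 1
        # Each cycle restarts networking, so DHCP leases are released and
        # re-acquired; this is the step that bounces the management link.
        restart_network()
        data = find_provisioning_data()
        if data is not None:
            return data
        if max_attempts is not None and attempt >= max_attempts:
            return None
        sleep(interval_s)
```

Injecting `sleep` and `restart_network` keeps the sketch testable without touching a real interface or waiting five minutes per cycle.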

Junchao-Mellanox commented 2 years ago

Closing, as this is expected behavior.