sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Interface is not coming up in the kernel after reboot (AS7326) #20087

Open tomvil opened 3 months ago

tomvil commented 3 months ago

Description

After a switch reboot, the interface appears as UP in the show interfaces status output but is reported as DOWN by the kernel (ip link show Ethernet0). The workaround is to bring the interface up manually by running ip link set up Ethernet0 or, alternatively, config interface startup Ethernet0.

This behavior has been observed on the EdgeCore AS7326 switch. The same image was tested on the DELL S5248F, where the issue did not occur.

# show interfaces status | grep Ethernet0
  Ethernet0                3      25G   9100    N/A    Eth1/1(Port1)  routed      up       up                            SFP/SFP+/SFP28         N/A

# ip link show Ethernet0 | grep state
9: Ethernet0: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000

# ip link set up Ethernet0
# ip link show Ethernet0 | grep state
9: Ethernet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
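The manual workaround above could be scripted roughly as follows. This is only a sketch: kernel_state and ensure_kernel_up are hypothetical helper names, not SONiC commands, and the parsing assumes the ip link show output format shown above. It treats the symptom and does not address the underlying cause.

```shell
#!/bin/bash

# Extract the operational state (e.g. UP or DOWN) from `ip link show <ifname>`
# output read on stdin. Assumes the "state <WORD>" field seen in the output above.
kernel_state() {
    sed -n 's/.* state \([A-Z_]*\) .*/\1/p' | head -n 1
}

# Bring the interface up only when the kernel reports it as DOWN.
# Hypothetical helper for illustration; requires root privileges to take effect.
ensure_kernel_up() {
    local ifname="$1"
    local state
    state="$(ip link show "$ifname" | kernel_state)"
    if [ "$state" = "DOWN" ]; then
        ip link set up "$ifname"
    fi
}
```

For example, ensure_kernel_up Ethernet0 would run ip link set up Ethernet0 only if the kernel currently reports the interface as DOWN.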

Steps to reproduce the issue:

  1. Build image from 202405 branch
  2. Install it on AS7326 switch
  3. After boot, Ethernet0 (in my case this interface is connected to a server) is shown as DOWN in the kernel

Describe the results you received:

After a reboot, all interfaces that should be up remain down and require manual intervention to bring them up.

Describe the results you expected:

After a reboot, all interfaces that should be up transition to the UP state automatically, without manual intervention.

Output of show version:

SONiC Software Version: SONiC.202405.0-dirty-20240830.091822
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-11-2-amd64
Build commit: 249c20bdf
Build date: Fri Aug 30 06:58:39 UTC 2024

Platform: x86_64-accton_as7326_56x-r0
HwSKU: Accton-AS7326-56X
ASIC: broadcom
ASIC Count: 1
Hardware Revision: N/A
Uptime: 14:27:01 up 12 min,  1 user,  load average: 1.34, 1.16, 0.80
Date: Fri 30 Aug 2024 14:27:01

Docker images:
REPOSITORY                    TAG                              IMAGE ID       SIZE
docker-dhcp-relay             latest                           934cdc88b25f   324MB
docker-dhcp-server            latest                           cdf709f8a11d   338MB
docker-fpm-frr                202405.0-dirty-20240830.091822   1b46e9d04a15   375MB
docker-fpm-frr                latest                           1b46e9d04a15   375MB
docker-macsec                 latest                           3a77d124e235   346MB
docker-lldp                   202405.0-dirty-20240830.091822   7904ddbfb954   360MB
docker-lldp                   latest                           7904ddbfb954   360MB
docker-mux                    202405.0-dirty-20240830.091822   f355662bc7b3   366MB
docker-mux                    latest                           f355662bc7b3   366MB
docker-snmp                   202405.0-dirty-20240830.091822   81a0b637c93e   354MB
docker-snmp                   latest                           81a0b637c93e   354MB
docker-sonic-gnmi             202405.0-dirty-20240830.091822   e4a31bbc8cd4   399MB
docker-sonic-gnmi             latest                           e4a31bbc8cd4   399MB
docker-sonic-mgmt-framework   202405.0-dirty-20240830.091822   6d5e68ff3033   401MB
docker-sonic-mgmt-framework   latest                           6d5e68ff3033   401MB
docker-teamd                  202405.0-dirty-20240830.091822   3f52352e3264   343MB
docker-teamd                  latest                           3f52352e3264   343MB
docker-platform-monitor       202405.0-dirty-20240830.091822   c3c08f5f6d41   440MB
docker-platform-monitor       latest                           c3c08f5f6d41   440MB
docker-sflow                  202405.0-dirty-20240830.091822   9da2da5cca1c   344MB
docker-sflow                  latest                           9da2da5cca1c   344MB
docker-router-advertiser      202405.0-dirty-20240830.091822   c932382d33d1   315MB
docker-router-advertiser      latest                           c932382d33d1   315MB
docker-orchagent              202405.0-dirty-20240830.091822   31fa919519aa   356MB
docker-orchagent              latest                           31fa919519aa   356MB
docker-nat                    202405.0-dirty-20240830.091822   85a7be8ce26d   346MB
docker-nat                    latest                           85a7be8ce26d   346MB
docker-iccpd                  202405.0-dirty-20240830.091822   d44f59428033   344MB
docker-iccpd                  latest                           d44f59428033   344MB
docker-database               202405.0-dirty-20240830.091822   59cefa77b041   323MB
docker-database               latest                           59cefa77b041   323MB
docker-eventd                 202405.0-dirty-20240830.091822   bbe4d9b78786   314MB
docker-eventd                 latest                           bbe4d9b78786   314MB
docker-syncd-brcm             202405.0-dirty-20240830.091822   3f34d16e8e42   717MB
docker-syncd-brcm             latest                           3f34d16e8e42   717MB
docker-gbsyncd-broncos        202405.0-dirty-20240830.091822   6ac692db5646   354MB
docker-gbsyncd-broncos        latest                           6ac692db5646   354MB
docker-gbsyncd-credo          202405.0-dirty-20240830.091822   3f63e3eb401e   327MB
docker-gbsyncd-credo          latest                           3f63e3eb401e   327MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

tomvil commented 2 months ago

I've built a clean image (without any non-default features enabled) and cannot reproduce this issue. I will investigate further to find out which feature is causing it.

zhangyanzhao commented 2 months ago

This looks like a platform issue; the platform group needs to check it.

tomvil commented 2 months ago

So, this is reproducible with the "ZTP" feature enabled. If you use DHCP option 239 and execute a simple script like

#!/bin/bash

/usr/bin/logger "success"

this will cause Ethernet0 to be displayed as "DOWN" in the kernel. Probably because the default /etc/sonic/config_db.json is not generated? As soon as I copy-paste the default config and reboot the switch (or run config reload -y), everything is OK: the interface is UP and stays up after a reboot.

Not sure if this is an expected behavior?
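One way to test the guess about the missing configuration would be to have the ZTP script log whether /etc/sonic/config_db.json exists when it runs. A minimal sketch, assuming the script runs on the switch; has_startup_config and ztp_script_main are hypothetical names, and the check only confirms or rules out the missing-file theory:

```shell
#!/bin/bash

# Return 0 when a non-empty configuration file exists at the given path
# (defaults to the path mentioned in this thread).
has_startup_config() {
    local path="${1:-/etc/sonic/config_db.json}"
    [ -s "$path" ]
}

# Hypothetical ZTP script body: log the outcome of the check to syslog.
ztp_script_main() {
    if has_startup_config; then
        /usr/bin/logger "success"
    else
        /usr/bin/logger "ZTP script ran, but /etc/sonic/config_db.json is missing or empty"
        return 1
    fi
}
```

Calling ztp_script_main at the end of the script would leave a syslog entry showing which case applied on that boot.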