Open nazariig opened 8 months ago
@saiarcot895 please take a look to see whether this is an lldpd issue that needs to be fixed. This was seen on SONiC 202305, with an lldpd version much older than 1.0.16.
I don't think this is related to the commit I made for Bookworm, upgrading lldp to 1.0.16. Based on these logs, the script that checks for the socket exited successfully:
Feb 26 19:43:56.003443 r-panther-42 INFO lldp#supervisord 2024-02-26 17:43:55,992 INFO spawned: 'waitfor_lldp_ready' with pid 26
Feb 26 19:43:56.005959 r-panther-42 INFO lldp#supervisord 2024-02-26 17:43:56,003 INFO success: waitfor_lldp_ready entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Feb 26 19:43:56.068109 r-panther-42 INFO lldp#supervisord: waitfor_lldp_ready 2024-02-26T17:43:56 [WARN/lldpctl] cannot find port eth0
Feb 26 19:43:56.069148 r-panther-42 INFO lldp#supervisord 2024-02-26 17:43:56,068 INFO exited: waitfor_lldp_ready (exit status 0; expected)
Instead, it looks as if either the lldp container was somehow started in a separate (private) network namespace with none of the interfaces, or all of the network interfaces, including the management interface eth0, were somehow not present in the container. The lldp container should be using the host's network namespace, so every interface visible outside the container should also be visible inside it.
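One way to test the namespace hypothesis on a live system is to compare the network-namespace identifiers of two processes. The sketch below is only an illustration: the helper names are hypothetical, and it assumes a Linux host where `/proc/<pid>/ns/net` is readable (reading it for another user's process normally requires root).

```python
import os


def net_namespace_id(pid, proc_root="/proc"):
    """Return the network-namespace identifier of a process, e.g. 'net:[4026531840]'."""
    # On Linux, /proc/<pid>/ns/net is a symlink whose target names the namespace.
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "net"))


def shares_net_namespace(pid_a, pid_b, proc_root="/proc"):
    """Return True if both processes are in the same network namespace."""
    return net_namespace_id(pid_a, proc_root) == net_namespace_id(pid_b, proc_root)
```

Run on the host, `shares_net_namespace(1, <pid of lldpd>)` would be expected to return `True` if the container really uses the host's network namespace, since PID 1 on the host lives in the host namespace.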
@saiarcot895 Do we know why lldp container is not running in the host's network namespace?
Not sure about that. The issue may not be related to network namespaces at all and could be something else entirely. One possibility: lldpd read the interface indices early in startup, the interfaces then went away and came back with new indices, and lldpd kept trying to look up the interfaces by the old indices. This feels unlikely to me, though. Without a live repro, this will be difficult to debug further.
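If a repro does become available, the stale-index theory could be checked by snapshotting the kernel's name-to-index mapping at two points in time and diffing the results. This is a rough sketch with hypothetical helper names; it uses only Python's standard `socket.if_nameindex()` and says nothing about how lldpd itself tracks interfaces.

```python
import socket


def snapshot_ifindex():
    """Map interface name -> kernel ifindex for the current network namespace."""
    return {name: index for index, name in socket.if_nameindex()}


def diff_ifindex(before, after):
    """Report interfaces that vanished, appeared, or changed index between snapshots."""
    common = before.keys() & after.keys()
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "reindexed": sorted(name for name in common if before[name] != after[name]),
    }
```

Taking one snapshot when the lldp container starts and another after port init completes would show whether any interface was re-created with a new index in between.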
Description
The issue seems to be related to the socket file lock:
Although the ports do exist in the kernel, neither the management interface nor the data ports can be configured until a config reload or switch reboot. This could be related to the socket existence check that is implemented in waitfor_lldp_ready.sh.
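To illustrate why a pure existence check can be misleading, here is a minimal sketch of a stricter readiness probe that also verifies something is accepting connections on the socket. It is not the logic of waitfor_lldp_ready.sh (the script is not reproduced here); the function names are hypothetical, and it assumes the control socket is a stream-type Unix socket whose path (for example `/var/run/lldpd.socket`, an assumed default) is passed in by the caller.

```python
import os
import socket
import stat
import time


def socket_accepts_connections(path):
    """Return True if `path` is a Unix socket and a listener accepts a connection on it."""
    try:
        # A leftover socket file passes this check even if no daemon is listening.
        if not stat.S_ISSOCK(os.stat(path).st_mode):
            return False
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.settimeout(1.0)
            client.connect(path)  # fails with ECONNREFUSED if nobody is listening
        return True
    except OSError:
        return False


def wait_for_socket(path, timeout=10.0, interval=0.1):
    """Poll until the socket accepts connections or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if socket_accepts_connections(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The design point is that a socket file left over from a previous run satisfies an existence test but refuses connections, so probing with `connect()` distinguishes the two cases.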
LLDP supervisor config:
LLDP wait for script:
LLDP config:
Log
SWSS/SYNCD service is started:
Port init is done:
LLDP service is started:
LLDPD failed to find eth0 on init:
LLDPD failed to configure the rest of the ports:
Steps to reproduce the issue:
So far the issue has been seen only once. There are no known steps to reproduce it, or the probability of hitting it is too low.
Describe the results you received:
Describe the results you expected:
No errors are expected
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):