Open mstroecker opened 11 months ago
May I know if there is any fix available for "Issue #1"? Issue #1: For ICCPd, below are the logs during the crash. It looks like some of the ebtables updates are not supported.
Hi @selvatechtalk, unfortunately, we were not able to fix it and stopped testing SONiC. It's been a while though; maybe something has changed on the ICCPd front.
Praveen Elagala already provided some analysis in Google Groups: https://groups.google.com/g/sonicproject/c/00rnM19XgDs
Description
We encountered a problem regarding iccpd and mclag. We use two switches, `leafa` and `leafb` (Model), in an L2 scenario. We followed the configuration example at https://support.edge-core.com/hc/en-us/articles/900002380706--Enterprise-SONiC-MC-LAG

Since the official build does not include iccpd, we built an image of the `202305` branch with iccpd enabled (`202205` has the same problem).

We are using `Portchannel01` as the peerlink with two Ethernet interfaces and one mclag instance on that peerlink. After that we added some PortChannels on both sides and tested this configuration for some weeks without a problem. But at some point iccpd crashed and the mclag pair was broken. We have to reboot the switches, but after running as expected for a few seconds, iccpd crashes again and leaves the mclag pair in a running but broken state.

We tried to debug this situation and saw that, if we run only one mclag-enabled switch (`leafb`, for example), the mclag is in an error state, but we are able to see the known MAC addresses with `mclagctl -i 1 dump macs`. Now we wanted to re-add `leafa`. To rule out any configuration differences in the PortChannels, we removed all MCLAG PortChannels from `leafa` (only the management interface and the peerlink are configured) and applied the mclag-related config:

Right after the last command on `leafa`, iccpd crashes on `leafb`. After rebooting, both switches work as before.

In the logs we found the following line on both switches:
We also found these lines near the other ones:
As you can see, the string Eth(e) seems to be cut off. By the way, we currently have only a single Ethernet uplink on `leafa`, which is shared across the peerlink. We also tried removing it on `leafa` and starting the mclag pair, without any luck: iccpd crashes with the same error and behavior.

To be clear, we first had this problem when both switches had the full MCLAG PortChannel setup. We created tech-support files on both switches right after the crash and before rebooting them.
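
For anyone going through the tech-support logs, here is a minimal sketch of a helper that flags interface names that look cut off, like the Eth(e) string mentioned above. It assumes that complete names have the form `Ethernet<N>` or `PortChannel<N>`. The function name and the regular expressions are hypothetical, and the sample strings in the usage note are made up, not real iccpd output.

```python
import re

# Assumption: a complete interface name is "Ethernet<N>" or "PortChannel<N>".
# Case is ignored so that spellings such as "Portchannel01" also count as complete.
FULL_NAME = re.compile(r"^(Ethernet|PortChannel)\d+$", re.IGNORECASE)

# Any token that starts like an interface name, whether complete or not.
CANDIDATE = re.compile(r"\b(?:Eth|Por)[A-Za-z]*\d*\b")


def truncated_names(line):
    """Return tokens that start like an interface name but are not complete."""
    return [tok for tok in CANDIDATE.findall(line) if not FULL_NAME.match(tok)]
```

Calling `truncated_names("Eth(e)")` returns `["Eth"]`, while a line that only contains complete names such as `Ethernet48` returns an empty list. Running it over each line of a saved syslog file would narrow down which messages carry a shortened name.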
Steps to reproduce the issue:
Describe the results you received:
It seems to be okay for a few seconds after that: (Core Dumps for iccpd and orchagent are available)
Describe the results you expected:
A working mclag state.
Output of `show version`:
Output of `show techsupport`:
https://crossmediasolutions-my.sharepoint.com/:f:/g/personal/m_stroecker_4allportal_com/EtcT8kAQtZxDpGLIv1bSCJkB2_VPhsv-3yOoy2li3XOxug?e=gi8kbF
Additional information you deem important (e.g. issue happens only occasionally):
I built the images with symbols and generated the requested stack traces:
ICCPD:
Orchagent: