Open dgsudharsan opened 1 year ago
The fix added breaks the previously added workaround https://github.com/sonic-net/sonic-swss/pull/2626, so I am requesting that the fix be reverted. Once we find a proper solution for https://github.com/sonic-net/sonic-buildimage/issues/12361, we need to reintegrate https://github.com/sonic-net/sonic-swss/pull/2756.
@srj102 pls help take a look and share your analysis
From the Techsupport added in #12361, it looks like VXLAN_EVPN_NVO was not configured, which led to the OA not processing the VXLAN_REMOTE_VNI table APP DB entries.
Before the workaround for swss#2626, the case of EVPN_NVO arriving later would have been handled by the following check: " if (!tunnel_orch->getTunnelPort(remote_vtep, tunnelPort)) { SWSS_LOG_WARN("Vxlan tunnelPort doesn't exist: %s", remote_vtep.c_str()); return false; } "
However with the workaround we are seeing this issue.
@dgsudharsan can you please confirm this by removing the workaround made for swss#2626 ? It was agreed that this was a temporary workaround at that time for that specific branch.
@srj102 I don't think removing that workaround alone helps. That workaround is not present for the p2mp orch. When EVPN NVO is not present, we need to retry instead of returning success. My change https://github.com/sonic-net/sonic-swss/pull/2756 did that, but it undid swss#2626.
We have to find a proper solution for https://github.com/sonic-net/sonic-buildimage/issues/12361, and then we need to reintegrate https://github.com/sonic-net/sonic-swss/pull/2756.
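The "retry instead of returning success" behaviour discussed above can be illustrated with a minimal, self-contained sketch. It does not use the real sonic-swss classes: `FakeTunnelState`, `addRemoteVni`, and `drainPending` are hypothetical names introduced only for illustration, and the pending queue is a simplified stand-in for how an orch keeps unprocessed entries. The assumption is that a handler returning `false` leaves the entry queued so that it is attempted again on a later pass.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical stand-in for orchagent state; NOT the real sonic-swss types.
struct FakeTunnelState {
    bool evpnNvoConfigured = false;          // set once VXLAN_EVPN_NVO arrives
    std::set<std::string> installedRemotes;  // remote VTEPs that were programmed
};

// Returns true when the entry was consumed, false when it must be retried.
// Returning true while EVPN NVO is missing would silently drop the entry.
bool addRemoteVni(FakeTunnelState& state, const std::string& remoteVtep) {
    if (!state.evpnNvoConfigured) {
        return false;  // keep the entry pending; try again on the next pass
    }
    state.installedRemotes.insert(remoteVtep);
    return true;
}

// One pass over the pending entries: erase only those that were consumed.
void drainPending(FakeTunnelState& state, std::deque<std::string>& pending) {
    for (auto it = pending.begin(); it != pending.end();) {
        if (addRemoteVni(state, *it)) {
            it = pending.erase(it);  // done, remove from the queue
        } else {
            ++it;                    // leave in place for a later retry
        }
    }
}
```

In this sketch, entries that arrive before EVPN NVO stay in the queue after the first `drainPending` call and are installed on the first call made after `evpnNvoConfigured` becomes true.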
Yes, for the p2mp case the changes made as part of 2756 will be required. p2p works without 2756 as well.
Since 2626 is a workaround with incomplete root-causing, I believe it has to be removed from master. The changes made in 2756 are as expected; they need to be in master and should not be reverted.
@prsunny What is your feedback here? Should we remove the workaround https://github.com/sonic-net/sonic-swss/pull/2626 and reintroduce https://github.com/sonic-net/sonic-swss/pull/2756 in master? Is anyone debugging the root cause of https://github.com/sonic-net/sonic-buildimage/issues/12361 ?
If we revert 2626, we will still have the warmboot issue, right?
@srj102 Can you please provide ETA for fixing this?
Description
Sometimes during config reload, the EVPN NVO table arrives later than the remote VNI table entries. In such scenarios, the remote VNI entries are ignored, which leads to traffic loss.
Steps to reproduce the issue:
Describe the results you received:
Remote entries are not added, leading to traffic loss.
Describe the results you expected:
No issues
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):