dgsudharsan opened 3 years ago
@judyjoseph - could you please take a look, thanks.
@judyjoseph @anshuv-mfst Is there any update on this?
@dgsudharsan I will check on this today to see if I can reproduce it and will update.
@dgsudharsan I tried this case. Yes, the error message is there, but it appears while the processes are going down (as you mentioned, the team* daemons are cleaning up interfaces). After config reload, once the processes are up, I see the port channel in a good state. Is that similar to the behavior you see?
@judyjoseph I don't think there is any functional impact, as I mentioned in the description. However, there are log analyzers in customer deployments that would raise false alarms because of this error message, and I believe the syslog should be free of errors. In this case, may I ask why the APP_DB entry is deleted when the actual problem being addressed is the netlink (kernel interface) cleanup? I feel that in the cleanup scenario (when the task exits), the APP_DB deletion need not be performed. Please let me know your thoughts on this.
@dgsudharsan, I agree that ideally the APP_DB entry deletion would be triggered by the NETLINK DEL message. But when all the processes go down, teamsyncd won't wait for the NETLINK messages to do the cleanup (we might miss events) when teammgrd removes the LAG here: https://github.com/Azure/sonic-swss/blob/e29d566efb31378fbeac61f0b1a7dbd690d7e287/cfgmgr/teammgr.cpp#L492. So we need to clean up all LAG entries from APP_DB as well.
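To make the cleanup behavior discussed in the last two comments concrete, here is a minimal C++ sketch. It is not the actual teamsyncd code: `TeamSyncModel`, `AppDbTable`, and the method names are hypothetical, and APP_DB is modeled as an in-memory `std::map`. LAG entries are normally removed from APP_DB when a netlink delete event arrives, but on exit the process does not wait for those events and instead deletes every LAG it still tracks.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the APP_DB LAG table: LAG name -> status.
using AppDbTable = std::map<std::string, std::string>;

// Hypothetical model of teamsyncd's bookkeeping (not the real class).
class TeamSyncModel {
public:
    explicit TeamSyncModel(AppDbTable &lagTable) : lagTable_(lagTable) {}

    // Normal path: a netlink "new link" event for a team device adds the LAG.
    void onNetlinkNewLag(const std::string &lag) {
        lags_.insert(lag);
        lagTable_[lag] = "up";
    }

    // Normal path: a netlink "delete link" event removes the APP_DB entry.
    void onNetlinkDelLag(const std::string &lag) {
        lags_.erase(lag);
        lagTable_.erase(lag);
    }

    // Shutdown path: the process is exiting and will not wait for
    // delete events, so it removes every LAG it still tracks from APP_DB.
    void cleanupOnExit() {
        for (const auto &lag : lags_) {
            lagTable_.erase(lag);
        }
        lags_.clear();
    }

private:
    AppDbTable &lagTable_;
    std::set<std::string> lags_;
};
```

After `onNetlinkNewLag("PortChannel0001")` and `onNetlinkNewLag("PortChannel0002")`, calling `cleanupOnExit()` leaves the table empty. Skipping that call on exit, as proposed earlier in the thread, would leave both entries behind if the delete events are never processed, which is the concern raised in the reply.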
Description
When config reload is issued while a PortChannel is part of a VLAN, the following log is seen in syslog:
Apr 14 01:24:14.912865 r-tigon-15 ERR swss#orchagent: :- removeLag: Failed to remove LAG PortChannel0002, it is still in VLAN
This happens because teammgrd and teamsyncd perform cleanup during config reload while other modules, such as VLAN, do not, so the check in orchagent is hit. The change below was introduced to clean up interfaces in the kernel. However, since it also deletes the APP_DB entry, orchagent processes that delete, and because the VLAN references have not been cleared, it logs the error.
https://github.com/Azure/sonic-swss/pull/1159
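The failing check can be illustrated with a simplified C++ model. This is a hypothetical sketch, not the real orchagent code: `LagOrchModel`, its methods, and VLAN ID 1000 are made up for illustration, and the printed text only mimics the syslog line quoted above. The model tracks which VLANs each LAG belongs to and refuses to remove a LAG that still has VLAN memberships.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model of the orchagent-side reference check.
// Names are illustrative only, not the sonic-swss API.
class LagOrchModel {
public:
    // Register a LAG with no VLAN memberships yet.
    void addLag(const std::string &lag) { lagVlans_.emplace(lag, std::set<int>{}); }

    // Record that a LAG is a member of a VLAN (creates the LAG entry if unknown).
    void addVlanMember(const std::string &lag, int vlanId) {
        lagVlans_[lag].insert(vlanId);
    }

    // Drop a VLAN membership, if the LAG is known.
    void removeVlanMember(const std::string &lag, int vlanId) {
        auto it = lagVlans_.find(lag);
        if (it != lagVlans_.end()) {
            it->second.erase(vlanId);
        }
    }

    // Handle a LAG delete coming from APP_DB. Refuses (and logs) while the
    // LAG is still referenced by any VLAN; otherwise removes it.
    bool removeLag(const std::string &lag) {
        auto it = lagVlans_.find(lag);
        if (it == lagVlans_.end()) {
            return true;  // unknown LAG: nothing to remove
        }
        if (!it->second.empty()) {
            std::cerr << "removeLag: Failed to remove LAG " << lag
                      << ", it is still in VLAN\n";
            return false;
        }
        lagVlans_.erase(it);
        return true;
    }

    bool hasLag(const std::string &lag) const { return lagVlans_.count(lag) != 0; }

private:
    // LAG name -> VLAN IDs the LAG is a member of.
    std::map<std::string, std::set<int>> lagVlans_;
};
```

With PortChannel0002 in VLAN 1000, a LAG delete that arrives first (as during config reload, where the APP_DB LAG entry is deleted but the VLAN member entries are not) fails and prints the error. Removing the VLAN membership first lets the same delete succeed.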
Steps to reproduce the issue:
Describe the results you received:
Got the error syslog message shown above.
Describe the results you expected:
No error message should be logged to syslog.
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):
This issue is easily reproducible with the steps mentioned and occurs every time.
sonic_dump_r-tigon-15_20210414_011349.tar.gz