dgsudharsan opened this issue 2 years ago.
Scaled LAG setup: many LAGs on the system, each with a single member to maximize the number of LAGs. In this test, all ports are in LAGs (# of LAGs == # of physical ports). Repro rate: 1 out of 10 retries.
@dgsudharsan please check whether SNMP and TEAMD are both SWSS service dependents that are shut down by the SWSS service script. If so, please try shutting down SNMP first, as you suggested.
Jan 2 21:34:53.168231 r-panther-13 INFO systemd[1]: Stopping SNMP container…
Jan 2 21:34:57.379650 r-panther-13 INFO systemd[1]: Stopping TEAMD container…
Jan 2 21:35:03.621782 r-panther-13 ERR snmp#snmpd[21]: ioctl 35123 returned -1\n
Jan 2 21:35:04.254619 r-panther-13 INFO snmp#supervisord 2022-01-02 21:35:04,253 INFO stopped: snmpd (exit status 0)
Jan 2 21:35:06.385508 r-panther-13 INFO teamd#supervisord 2022-01-02 21:35:06,380 INFO stopped: teammgrd (exit status 0)
Jan 2 21:35:07.810544 r-panther-13 NOTICE root: Stopping swss service…
Jan 2 21:35:15.853449 r-panther-13 INFO swss#supervisord 2022-01-02 21:35:15,852 INFO stopped: orchagent (terminated by SIGTERM)
Hi @yxieca,
SNMPD and TEAMD are both SWSS-dependent services. The logs show that the stop jobs for teamd and snmp run concurrently, and the stop job for swss starts only once both have finished. To stop snmp before teamd, snmp would need an explicit ordering dependency on the teamd service. I can test this, but the problem is that I couldn't reproduce the issue to verify whether the fix actually solves it.
Also, even though this log is seen, snmpd still exits gracefully (exit status 0). Given that this is a non-functional issue, the fix (assuming it works) seems like overkill, and it could also increase config reload time.
Let me know what you think.
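For reference, systemd applies the inverse of the start-up order at shutdown: if snmp.service is ordered After=teamd.service, snmp is stopped before teamd. A Requires= dependency alone does not imply any ordering. The sketch below models that rule in Python to show why, without such an edge, the snmp and teamd stop jobs run side by side and together gate swss. It is a simplified model, not systemd's job engine, and the dependency map (snmp and teamd each ordered After=swss.service) is an assumption inferred from the log above rather than copied from the SONiC unit files; stop_waves is a hypothetical helper.

```python
from typing import Dict, List, Set


def stop_waves(after: Dict[str, Set[str]]) -> List[List[str]]:
    """Group units into shutdown "waves" under systemd's rule that stop order
    is the reverse of start order: if unit A has After=B, A stops before B.

    `after` maps each unit to the set of units it is ordered After=.
    Units in the same wave have no ordering between them and may stop in parallel.
    """
    remaining = set(after)
    waves: List[List[str]] = []
    while remaining:
        # A unit may stop once no still-running unit is ordered After= it.
        ready = sorted(
            u for u in remaining
            if not any(u in after[other] for other in remaining if other != u)
        )
        if not ready:
            raise ValueError("ordering cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        remaining -= set(ready)
    return waves


if __name__ == "__main__":
    # Assumed current ordering: snmp and teamd are each only After=swss.
    current = {"swss": set(), "snmp": {"swss"}, "teamd": {"swss"}}
    print(stop_waves(current))   # [['snmp', 'teamd'], ['swss']]

    # Proposed: additionally order snmp After=teamd, so snmp stops first.
    proposed = {"swss": set(), "snmp": {"swss", "teamd"}, "teamd": {"swss"}}
    print(stop_waves(proposed))  # [['snmp'], ['teamd'], ['swss']]
```

The extra wave in the proposed ordering is also where the concern about longer config reloads comes from: teamd's stop job can no longer overlap with snmp's.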
Description
When a scaled LAG configuration is added and a config reload is performed, snmpd emits the following log before shutting down:
Jan 2 21:35:03.621782 r-panther-13 ERR snmp#snmpd[21]: ioctl 35123 returned -1\n
I believe this may be due to a race condition in which an interface is deleted while snmpd is still trying to access it.
This issue occurs very rarely and is hard to reproduce.
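The ioctl number in the log can be decoded: 35123 is 0x8933, which is SIOCGIFINDEX in Linux's <linux/sockios.h>, the request that resolves an interface name to its index. That is consistent with the race hypothesis: if a LAG interface is removed between snmpd enumerating interfaces and querying one of them, the lookup fails and ioctl returns -1. Below is a minimal Python sketch of that failure mode. It is not snmpd's code; decode_ioctl and get_ifindex are hypothetical helpers, and PortChannel9999 is an assumed name standing in for an already-deleted LAG.

```python
import errno
import fcntl
import socket
import struct

# Linux socket ioctl request numbers from <linux/sockios.h> (only two listed here).
SIOCGIFFLAGS = 0x8913
SIOCGIFINDEX = 0x8933   # == 35123 in decimal, the number in the snmpd log
IFNAMSIZ = 16           # interface-name buffer size, including the trailing NUL

_NAMES = {SIOCGIFFLAGS: "SIOCGIFFLAGS", SIOCGIFINDEX: "SIOCGIFINDEX"}


def decode_ioctl(request: int) -> str:
    """Map a decimal ioctl request number (as printed in a log) to its name."""
    return _NAMES.get(request, f"unknown (0x{request:04x})")


def get_ifindex(name: str) -> int:
    """Resolve an interface name to its index via SIOCGIFINDEX.

    Raises OSError if the interface does not exist (on Linux, errno ENODEV),
    which is the situation the race hypothesis describes.
    """
    raw = name.encode()
    if len(raw) >= IFNAMSIZ:
        raise ValueError(f"interface name too long: {name!r}")
    # struct ifreq: char ifr_name[16] followed by a 24-byte union;
    # ifr_ifindex is the int at offset 16, the rest is padding.
    req = struct.pack("16si20x", raw, 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        res = fcntl.ioctl(s.fileno(), SIOCGIFINDEX, req)
    _, index = struct.unpack_from("16si", res)
    return index


if __name__ == "__main__":
    print(decode_ioctl(35123))  # SIOCGIFINDEX
    try:
        get_ifindex("PortChannel9999")  # hypothetical, already-deleted LAG
    except OSError as e:
        print("ioctl failed:", errno.errorcode.get(e.errno, e.errno))
```

With many single-member LAGs torn down at once during config reload, the window between enumeration and lookup grows, which would fit the issue showing up only in the scaled setup.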
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Output of show version:
Output of show techsupport: sysdump_test_lags_scale.tar.gz
Additional information you deem important (e.g. issue happens only occasionally):