sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[chassis] monit container checker status fails on supervisor card with not all SFM's present #8520

Closed sanmalho-git closed 2 years ago

sanmalho-git commented 3 years ago

Description

On a supervisor card in a VoQ chassis, we create dockers (syncd, teamd, swss, lldp, etc.) for each Switch Fabric card. However, not every chassis has all of the Switch Fabric cards present. In that case, dockers are created only for the Switch Fabric cards that are present.

The monit 'container_checker' fails in this scenario because it expects dockers for all Switch Fabric cards (possibly based on NUM_ASIC defined in the asic.conf file).
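
The suspected mismatch can be illustrated with a small Python sketch. This is not the actual container_checker code. The function names, the service list (taken from the dockers named above), and the ASIC counts are all assumptions chosen for illustration.

```python
# Hypothetical sketch of the suspected mismatch; NOT the real container_checker.
# Service names come from this issue; all numbers below are illustrative.
PER_ASIC_SERVICES = ("syncd", "teamd", "swss", "lldp")


def expected_containers(asic_ids):
    """Build the per-ASIC container names for the given ASIC indices."""
    return {f"{svc}{idx}" for svc in PER_ASIC_SERVICES for idx in asic_ids}


def missing_containers(expected, running):
    """Return the expected containers that are not running, sorted by name."""
    return sorted(set(expected) - set(running))


num_asic = 4          # assumed value of NUM_ASIC from asic.conf
present_asics = [0, 1]  # assumed: only these fabric ASICs are physically present

# Only dockers for the present ASICs get created.
running = expected_containers(present_asics)

# Expectation derived from NUM_ASIC: reports containers for absent cards.
from_num_asic = missing_containers(expected_containers(range(num_asic)), running)

# Expectation derived from the ASICs actually present: nothing is missing.
from_present = missing_containers(expected_containers(present_asics), running)

print(from_num_asic)
print(from_present)
```

With these assumed values, the NUM_ASIC-based expectation reports the dockers for ASICs 2 and 3 as missing, while the presence-based expectation reports none.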

Steps to reproduce the issue:

  1. On a supervisor card in a VoQ chassis, issue the command 'sudo monit status'.
  2. Check the syslog for ERR messages related to monit container_checker.

Describe the results you received:

admin@sonic:~$ sudo monit status
Monit 5.20.0 uptime: 1h 3m
.
.
Program 'container_checker'
  status                       Status failed
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  last exit value              3
  last output                  Expected containers not running: syncd10, swss5, teamd2, syncd13, lldp12, swss12, swss4, teamd9, syncd5, teamd3, syncd3, lldp3, syncd12, swss2, swss13, lldp9, syncd2, lldp11, teamd5, teamd11, teamd4, l
  data collected               Wed, 18 Aug 2021 14:57:06

Error messages seen in syslog look like:

Aug 18 14:58:07.025538 sonic ERR monit[691]: 'container_checker' status failed (3) -- Expected containers not running: teamd13, teamd2, swss2, syncd5, teamd9, teamd4, teamd8, lldp8, teamd12, syncd2, teamd5, lldp11, lldp13, teamd3, swss9, syncd4, syncd8, swss12, swss13, swss8, teamd11, lldp12, lldp4, swss5, lldp3, swss4, swss3, lldp9, syncd9, lldp10, syncd3, syncd11, teamd10, syncd13, lldp2, syncd12, lldp5, swss10, syncd10, swss11
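
To see which ASIC instances such a message refers to, the container list can be grouped by its numeric suffix. The following Python sketch is only a triage helper. The function name is hypothetical, and the sample line is a shortened version of the syslog message above.

```python
import re
from collections import defaultdict

MARKER = "Expected containers not running:"


def group_missing_by_asic(line):
    """Group '<service><asic index>' names from a container_checker message.

    Returns {asic_index: [service, ...]}. Names without a numeric suffix
    (for example a truncated trailing entry) are skipped.
    """
    _, found, tail = line.partition(MARKER)
    if not found:
        return {}
    by_asic = defaultdict(set)
    for name in tail.split(","):
        match = re.fullmatch(r"([a-z_]+)(\d+)", name.strip())
        if match:
            by_asic[int(match.group(2))].add(match.group(1))
    return {idx: sorted(services) for idx, services in sorted(by_asic.items())}


# Shortened sample based on the syslog line in this issue.
sample = (
    "sonic ERR monit[691]: 'container_checker' status failed (3) -- "
    "Expected containers not running: teamd13, teamd2, swss2, syncd5, lldp13"
)
for asic, services in group_missing_by_asic(sample).items():
    print(asic, services)
```

Each printed row lists one ASIC index together with the services reported as not running for it, which can then be compared against the Switch Fabric cards that are actually present.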

rlhui commented 2 years ago

@anamehra - would you be able to help address this? Thanks.

sanmalho-git commented 2 years ago

Closing this issue, as it should be addressed by https://github.com/Azure/sonic-buildimage/issues/10170