sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

monit status taking ~14 min after boot to report disk space issue #14965

Open anamehra opened 1 year ago

anamehra commented 1 year ago

During a sonic-mgmt run on a chassis LC, our pre-sanity script reported a 'monit status' check failure due to low disk space. The recovery action was to reboot the card. A reboot is unlikely to recover the system in such a scenario. Is rebooting the right thing to do?

After the reboot, the pre-sanity check passed on 'monit status' even though disk usage was still high. After some debugging, I found that the monit process took ~14 minutes to report the disk issue. Is this expected behavior? Can it be improved?

 20:44:44 up 14 min,  1 user,  load average: 3.38, 2.86, 2.31
Monit 5.20.0 uptime: 13m                    <<< monit does not report issue here

Filesystem 'root-overlay'
  status                       Accessible
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  permission                   755
  uid                          0
  gid                          0
 20:44:54 up 14 min,  1 user,  load average: 3.32, 2.86, 2.32
Monit 5.20.0 uptime: 14m    <<< monit reports issue here

Filesystem 'root-overlay'
  status                       Resource limit matched
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  permission                   755
  uid                          0
  gid                          0
 20:45:04 up 14 min,  1 user,  load average: 3.34, 2.88, 2.33
Monit 5.20.0 uptime: 14m
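
A minimal sketch of how a check script could pull the per-service status out of output like the above, assuming the `monit status` text layout shown in this issue. The function name `parse_monit_status` is hypothetical and is not taken from sonic-mgmt.

```python
import re


def parse_monit_status(text):
    """Return {service_name: status} for each service block in `monit status` text.

    Assumes the layout shown in this issue: a header line such as
    "Filesystem 'root-overlay'" followed by indented "key   value" lines.
    """
    statuses = {}
    current = None
    for line in text.splitlines():
        # Header line: a service type followed by the quoted service name.
        header = re.match(r"^(\w[\w ]*?) '([^']+)'\s*$", line.strip())
        if header:
            current = header.group(2)
            continue
        # Only the bare "status" key is wanted, not "monitoring status".
        fields = line.split(None, 1)
        if current and len(fields) == 2 and fields[0] == "status":
            statuses[current] = fields[1].strip()
    return statuses
```

A caller could treat any value other than the healthy one seen here (`Accessible`) as a failure, keeping in mind that the status may still read as healthy for several minutes after boot.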


anamehra commented 1 year ago

@abdosi , FYI.

arlakshm commented 1 year ago

This may be expected because monit may start later, to allow system init to complete. Can you check the same on a pizzabox and see if there is a difference?

anamehra commented 1 year ago

On a pizzabox as well, I see that it takes ~5 minutes after the monit process is up to recognize the disk issue. If this is expected behavior, we can close this issue.
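
For reference, a rough sketch of how such a delay could add up, assuming monit is configured with a start delay, a fixed poll cycle, and a rule that must match for several cycles before it fires. The helper name and the numbers in the usage note are illustrative assumptions, not values confirmed in this thread. The real values should be read from the monit configuration on the device.

```python
def earliest_alert_seconds(start_delay, cycle_interval, required_matches):
    """Estimate seconds after monit starts before a rule can first fire.

    Assumption: the first check runs once the start delay has elapsed, and
    one further check runs every cycle_interval seconds after that, so the
    Nth consecutive match lands (N - 1) cycles after the first check.
    """
    if required_matches < 1:
        raise ValueError("required_matches must be at least 1")
    return start_delay + (required_matches - 1) * cycle_interval
```

With hypothetical values of a 300-second start delay, a 60-second cycle, and 10 required matches, `earliest_alert_seconds(300, 60, 10)` gives 840 seconds, i.e. 14 minutes. If the configured values are of that order, a delay of this size would be a consequence of the configuration rather than a malfunction.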