pulibrary / ops-catchall

Operations Catch All

CheckMK: lae-staging2 disk issues #110

Open acozine opened 2 hours ago

acozine commented 2 hours ago

We get regular alerts about disk usage on this machine. A quick look shows that the syslog is enormous and isn't being rotated often enough to keep up; nearly all of the 43G under /var/log is the two uncompressed syslog files:

pulsys@lae-staging2:~$ sudo du -h --max-depth=1 /var/log
4.1G    /var/log/journal
148K    /var/log/apt
72K     /var/log/unattended-upgrades
4.0K    /var/log/dist-upgrade
1.1M    /var/log/installer
4.0K    /var/log/landscape
4.0K    /var/log/private
56K     /var/log/redis
105M    /var/log/nginx
43G     /var/log

and

pulsys@lae-staging2:/var/log$ ls -lah syslog*
-rw-r----- 1 syslog adm  20G Oct  3 22:00 syslog
-rw-r----- 1 syslog adm  18G Sep 29 00:00 syslog.1
-rw-r----- 1 syslog adm 507K Sep 22 00:00 syslog.2.gz
-rw-r----- 1 syslog adm 462K Sep 15 00:00 syslog.3.gz
-rw-r----- 1 syslog adm 456K Sep  8 00:00 syslog.4.gz

We should see if we can figure out why the syslog is so chatty, and also rotate the file more frequently.
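
One way to start on the "why is it so chatty" question is to tally recent syslog lines by the program that wrote them. The function below is only a sketch: `top_syslog_sources` is a made-up name, and it assumes the traditional rsyslog line format (`Oct  3 22:00:01 host program[pid]: message`), where the program tag is the fifth whitespace-separated field. If this machine uses a different template, the field number would need adjusting.

```shell
# Hypothetical helper: count syslog lines per program tag, busiest first.
# Reads log lines on stdin; $1 is how many programs to list (default 10).
# Assumes the traditional rsyslog format, where field 5 is "program[pid]:".
top_syslog_sources() {
    awk '{
            tag = $5
            sub(/(\[[0-9]+\])?:$/, "", tag)   # drop the optional [pid] and the trailing colon
            count[tag]++
         }
         END { for (t in count) print count[t], t }' |
        sort -k1,1nr -k2,2 |
        head -n "${1:-10}"
}
```

Since the live file is 20G, it is probably kinder to the machine to sample the tail rather than read the whole thing, for example `sudo tail -n 1000000 /var/log/syslog | top_syslog_sources 5`.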

acozine commented 2 hours ago

As a stopgap, I manually edited /etc/logrotate.d/rsyslog and changed the frequency from weekly to daily. We should still fix this in a way that persists if we rebuild the machines, and figure out the root cause of the problem.
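
For reference, the manual edit above could be captured as a small repeatable step until it lands in whatever provisions these machines. This is a sketch under assumptions, not the actual fix: `set_daily_rotation` is a hypothetical name, it assumes GNU sed (for `-i` and `-E`), and it assumes the frequency appears as a line containing only `weekly`.

```shell
# Hypothetical helper: switch a logrotate config from weekly to daily rotation.
# $1 is the path to the config file, e.g. /etc/logrotate.d/rsyslog.
# Assumes GNU sed. Safe to run repeatedly: once the line says "daily",
# nothing matches and the file is left unchanged.
set_daily_rotation() {
    # Match a line that is only "weekly" (plus indentation) and keep the indentation.
    sed -i -E 's/^([[:space:]]*)weekly[[:space:]]*$/\1daily/' "$1"
}
```

Running `sudo logrotate -d /etc/logrotate.d/rsyslog` afterwards does a dry run and prints what logrotate would do, which is a cheap way to confirm the file still parses.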