Show service silent failures on main system info page.

virtualmin / virtualmin-gpl

Virtualmin web hosting control panel for Webmin

https://www.virtualmin.com

GNU General Public License v3.0

Show service silent failures on main system info page. #22

Open chris001 opened 6 years ago

chris001 commented 6 years ago

Showing the service up/down status is good, but it's not good enough. The Virtualmin system info page should also alert you to errors that cause critical services to fail silently. The perfect example, which I'm facing right now, is a silent failure of the Dovecot IMAP server. All users fail to connect over IMAP, and consequently no user can read any mailbox!!

All client imap mail apps fail to connect.
The Dovecot service is running!!
Virtualmin doesn't show any indication of a warning or error, just the "running" icon!!!
The warning and error info is readily available - BUT ONLY if you happen to be a Dovecot guru and proactively go and look for errors in the 15 different Dovecot log files.
Virtualmin should just try to scan the Dovecot logs to detect these severe failure conditions and show a warning icon with a shortcut to view the Dovecot logs.
It's not enough to show a service is up and running. It's just as important to verify it's healthy and operating cleanly and not in limp mode throwing major errors.
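To make the "running but not healthy" distinction concrete, here is a minimal sketch in Python of a three-state status. The names `Health` and `classify` are hypothetical and are not part of Virtualmin or Webmin; the sketch assumes something else has already determined whether the process is running and collected any recent fatal log lines.

```python
from enum import Enum
from typing import Sequence


class Health(Enum):
    """Hypothetical status values for the system info page."""
    DOWN = "down"          # process is not running
    DEGRADED = "degraded"  # process is running but logged fatal errors
    HEALTHY = "healthy"    # process is running with no recent fatal errors


def classify(is_running: bool, fatal_lines: Sequence[str]) -> Health:
    """Combine the existing up/down check with recent fatal log lines."""
    if not is_running:
        return Health.DOWN
    if fatal_lines:
        # Running, yet something fatal was logged: show a warning icon.
        return Health.DEGRADED
    return Health.HEALTHY
```

With this shape, the Dovecot case above (service running, fatal errors in the log) would come out as `Health.DEGRADED` instead of a plain "running" icon.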

jcameron commented 6 years ago

This is a nice feature idea, but honestly not one we are likely to get to any time soon. In fact, scanning logs to detect serious errors could be an entire product in itself :-) (for example, Splunk). The reason it's hard is that the logs of most servers are full of errors due to bad logins or invalid email destinations, so separating those out from the real unrecoverable errors requires detailed knowledge of each server's behavior and log format (which likely changes with every release).

chris001 commented 6 years ago

Probably a one-line shell command PER SERVICE (apache, nginx, postfix, proftpd, mysql, postgresql, etc.) would catch the most glaring fatal errors, the kind that flat out prevent a service such as Dovecot from loading and serving clients at all. Something like: tail -100 /var/log/mail.err | grep Fatal

Oct 30 11:34:41 server1 dovecot: imap-login: Fatal: Can't set cipher list to 'ECDHE-RSA-AES256-SHA384:AES256-SHA256:AES256-SHA256:RC4:HIGH:MEDIUM:+TLSv1:+TLSv1_1:+TLSv1_2:!LOW:!MD5:!SSLv2:!SSLv3:!ADH:!aNULL:!eNULL:!NULL:!DH:!ADH:!EDH:!AESGCM': error:140E6118:SSL routines:SSL_CIPHER_PROCESS_RULESTR:invalid command
Oct 30 11:43:18 server1 dovecot: doveadm: Fatal: This is Dovecot's fatal log (1509378197)
Oct 30 12:05:07 server1 dovecot: auth: Fatal: Unknown userdb driver 'pam'
Oct 30 12:08:13 server1 dovecot: auth: Fatal: CRAM-MD5 mechanism can't be supported with given passdbs
Oct 30 12:09:09 server1 dovecot: auth: Fatal: DIGEST-MD5 mechanism can't be supported with given passdbs
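
For anyone who wants to try the same check outside the shell, here is a rough Python equivalent of the `tail -100 ... | grep Fatal` idea. It is only a sketch: the function names `recent_fatal_lines` and `scan_log` are made up for illustration, the `Fatal` marker and the 100-line window are taken from the command above, and nothing here is existing Virtualmin code.

```python
from collections import deque
from typing import Iterable, List


def recent_fatal_lines(lines: Iterable[str], tail: int = 100,
                       marker: str = "Fatal") -> List[str]:
    """Return lines containing `marker` among the last `tail` lines."""
    # deque with maxlen keeps only the final `tail` items, like `tail -N`.
    last = deque(lines, maxlen=tail)
    # Substring match, like a plain `grep Fatal` (case-sensitive).
    return [line.rstrip("\n") for line in last if marker in line]


def scan_log(path: str, tail: int = 100, marker: str = "Fatal") -> List[str]:
    """Apply recent_fatal_lines to a log file such as /var/log/mail.err."""
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in logs.
    with open(path, errors="replace") as fh:
        return recent_fatal_lines(fh, tail, marker)
```

Calling `scan_log("/var/log/mail.err")` would return a list of matching lines, and an empty list would mean no `Fatal` entries in the last 100 lines. The log path differs per distribution and per service, so it would have to be configurable.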

jcameron commented 6 years ago

I'd have to do some more investigation to see if this method is reliable enough to cover all fatal errors.