lausser / check_logfiles

A plugin (monitoring-plugin, not nagios-plugin, see also http://is.gd/PP1330) which scans logfiles for patterns.
https://omd.consol.de/docs/plugins/check_logfiles/
GNU General Public License v2.0

check_logfiles on /var/log/messages resulting in frequent socket timeouts #59

Open ChristopherP1221 opened 4 years ago

ChristopherP1221 commented 4 years ago

Hello,

I'm looking for some guidance as this issue has been plaguing me for a little while now and I'm almost positive it's related to something I'm doing inefficiently.

I am using the "check_logfiles" plugin against my syslog located at /var/log/messages. I wanted the granularity of defining different properties and thresholds for different patterns, so I am using a separate .cfg file and a separate Nagios service check for each set of patterns. I have been receiving many socket timeouts from these service checks. They are not constant: they occur intermittently, all day long, on different servers.

It should be noted, there are also unrelated checks that are not exhibiting the same "socket timeout" behavior.

Here are the config files in question:

check_logfiles_messages_qla_critical.cfg

    @searches = (
      {
        tag => 'critical qla',
        logfile => '/var/log/messages',
        criticalpatterns => 'Abort command issued nexus',
        options => "criticalthreshold=15",
      },
    );

check_logfiles_messages_qla_warning.cfg

    @searches = (
      {
        tag => 'warning qla',
        logfile => '/var/log/messages',
        warningpatterns => ['QUEUE FULL detected', 'FCPort state transitioned from'],
        options => "warningthreshold=8",
      },
    );

Other examples that seem to run just fine (no intermittent socket timeouts):

    @searches = (
      {
        tag => 'lpfc',
        logfile => '/var/log/messages',
        criticalpatterns => 'kernel: lpfc',
      },
    );

Below is how the Nagios command is issued; sudoers has already been configured. I recently added the --rununique flag to see if that would help, but it hasn't. Any help, guidance, or insight into what this plugin is doing that I might be overlooking would be extremely helpful! For example, I know that a temporary index file gets created. Is it possible that several of these index files are being created and are conflicting with each other or somehow confusing the script?

/usr/bin/sudo /usr/lib64/nagios/plugins/check_logfiles --rununique -f /etc/nagios/plugins/check_logfiles_messages_qla_critical.cfg

lausser commented 4 years ago

A socket timeout might be a DNS problem, which is outside the scope of this plugin.

As I don’t see the error message, I can only speculate.

check_logfiles looks up the hostname of the server it is running on, and this is the only place where I can imagine sockets being involved. You might consider running nscd.
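
The idea that a slow lookup of the local hostname is behind the timeouts can be checked by hand on an affected server. The following is only a rough sketch: `elapsed_ms` is a hypothetical helper (not part of check_logfiles), it assumes GNU `date` (for `%N`) and that `getent` is installed, and it measures the system resolver in general rather than the exact call the plugin makes.

```shell
# elapsed_ms (hypothetical helper): run a command, discard its output,
# and print the wall-clock time it took in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    em_start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    em_end=$(date +%s%N)
    echo $(( (em_end - em_start) / 1000000 ))
}

# How long does resolving this machine's own name take right now?
echo "hostname lookup: $(elapsed_ms getent hosts "$(hostname)") ms"
```

Running this repeatedly at times when the service checks time out, and comparing the numbers against a quiet period, would show whether name resolution is occasionally slow. The same helper can wrap the full plugin command from the original post to see how long a complete check takes.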
