Question: Considering automated translation from Nagios plugins output?

Napsty commented 1 month ago

I'm currently working on a project where monitoring is switching from a legacy system to a Prometheus-based setup. The new monitoring will use script_exporter to execute local scripts created by the application owners. As most of these are Nagios-plugin-style scripts, we would need to rewrite all of them to change their output. What if script_exporter could automatically translate the output into the Prometheus format?

Idea: Add a parameter to enable Nagios output translation

$ cat /etc/prometheus_script_exporter/scripts/anagiosplugin.yaml
  - name: anagiosplugin
    command: /usr/lib/nagios/plugins/anagiosplugin.sh -t something -p parameter
    nagios_translation: true
    timeout:
      max_timeout: 60

Not sure yet what the output could look like, e.g.

# HELP anagiosplugin_exit Exit code of anagiosplugin. 0=OK, 1=Warning, 2=Critical
# TYPE anagiosplugin_exit gauge
anagiosplugin_exit 2

We can work out more details and ideas together. But this issue is mainly to ask the question: is this something you would consider? Or would you say from the start, nope, won't do?

ricoberger commented 2 weeks ago

Hi, I'm open to such a feature, but I have to say that I'm not very familiar with Nagios.

If it is just about the exit code, it should already be reflected in the script_exit_code metric.

Napsty commented 2 weeks ago

Cool! Thanks for being open to the suggestion. It would be more about including the performance data of the Nagios plugins, and potentially also some translation of the text output.

A typical output from a Nagios plugin (here check_smart as an example) looks like this:

WARNING: Runtime_Bad_Block is non-zero (3), Uncorrectable_Error_Cnt is non-zero (1)|Reallocated_Sector_Ct=3 Power_On_Hours=31416 Power_Cycle_Count=889 Program_Fail_Count_Chip=2 Erase_Fail_Count_Chip=0 Wear_Leveling_Count=873 Used_Rsvd_Blk_Cnt_Chip=386 Used_Rsvd_Blk_Cnt_Tot=752 Unused_Rsvd_Blk_Cnt_Tot=3280 Program_Fail_Cnt_Total=3 Erase_Fail_Count_Total=0 Runtime_Bad_Block=3 Uncorrectable_Error_Cnt=1 ECC_Error_Rate=1 Offline_Uncorrectable=0 CRC_Error_Count=262 Available_Reservd_Space=1630 Total_LBAs_Written=3363924329 Total_LBAs_Read=3278685684

The exit code of this Nagios plugin is 1 (as the output shows WARNING).
The plugin's (informational) output is the first part, before the pipe character |.
Performance data always follows the pipe character |.
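
To illustrate the split described above, here is a minimal Python sketch (not script_exporter code; the function name `parse_nagios_output` and the regular expression are my own assumptions). It treats everything before the first | as the status text and reads the rest as `label=value` pairs, ignoring an optional unit suffix and any `;warn;crit;min;max` thresholds. It only handles single-line output.

```python
import re

# One perfdata item: label=value[UOM], optionally followed by ;warn;crit;min;max.
# Labels may be single-quoted when they contain spaces. Thresholds are matched
# but ignored in this sketch.
PERFDATA_ITEM = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^\s;]*)(?:;\S*)?"
)


def parse_nagios_output(line):
    """Split one plugin output line into (status_text, perfdata dict)."""
    # Everything before the first pipe is the human-readable status text.
    text, _, perf = line.partition("|")
    perfdata = {}
    for match in PERFDATA_ITEM.finditer(perf):
        label = match.group("label").strip("'")
        perfdata[label] = float(match.group("value"))
    return text.strip(), perfdata


# Shortened version of the check_smart output shown above.
line = "WARNING: Runtime_Bad_Block is non-zero (3)|Reallocated_Sector_Ct=3 Power_On_Hours=31416"
text, perfdata = parse_nagios_output(line)
print(text)
print(perfdata)
```

Items whose value is not numeric are simply skipped here; a real implementation would need to decide how to report them.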

The text output of this plugin can be ignored, but there might be other plugins where an output translator would also be helpful.

The idea behind this feature request is to (more or less) support Nagios plugins that already exist on systems and would otherwise need to be rewritten to produce Prometheus-style output. At least for the exit code and the performance data (= metrics), this should be doable, I believe.
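
As a rough sketch of what the translated result might look like, the Python snippet below renders an exit code and a dict of performance values in the Prometheus text format. The function name `to_prometheus` and the metric naming scheme (`<plugin>_exit`, `<plugin>_<label>`) are assumptions for illustration only, not anything script_exporter provides today.

```python
import re


def sanitize(name):
    """Replace characters that are not valid in a Prometheus metric name."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def to_prometheus(plugin, exit_code, perfdata):
    """Render an exit code and perfdata values as Prometheus text-format lines.

    Assumes `plugin` does not start with a digit, since metric names must not.
    """
    prefix = sanitize(plugin)
    lines = [
        f"# HELP {prefix}_exit Exit code of {plugin}. 0=OK, 1=Warning, 2=Critical, 3=Unknown",
        f"# TYPE {prefix}_exit gauge",
        f"{prefix}_exit {exit_code}",
    ]
    # One gauge per perfdata label (hypothetical naming scheme).
    for label, value in perfdata.items():
        metric = f"{prefix}_{sanitize(label).lower()}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


print(to_prometheus("check_smart", 1, {"Runtime_Bad_Block": 3, "Power_On_Hours": 31416}))
```

An alternative would be a single metric with the perfdata label as a Prometheus label, which avoids generating one metric name per value; which variant fits better is open for discussion.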

PS: I'll be at the Open Source Monitoring Conference in Nuremberg next week. If by chance you're there too, we can discuss this. :-)

ricoberger / script_exporter

Question: Considering automated translation from Nagios plugins output? #155