ricoberger / script_exporter

Prometheus exporter to execute scripts and collect metrics from the output or the exit status.
MIT License

Question: Considering automated translation from Nagios plugins output? #155

Open Napsty opened 1 month ago

Napsty commented 1 month ago

I'm currently working on a project where monitoring is switching from a legacy system to Prometheus-based monitoring. The new monitoring will use script_exporter to execute local scripts created by the application owners. As most of these scripts are Nagios-plugin-style scripts, we would need to rewrite all of them to change the output. What if script_exporter could automatically translate the output to the Prometheus format?

Idea: Add a parameter to enable nagios output translation

$ cat /etc/prometheus_script_exporter/scripts/anagiosplugin.yaml
  - name: anagiosplugin
    command: /usr/lib/nagios/plugins/anagiosplugin.sh -t something -p parameter
    nagios_translation: true
    timeout:
      max_timeout: 60

Not sure yet what the output could look like, e.g.:

# HELP anagiosplugin_exit Exit code of anagiosplugin. 0=OK, 1=Warning, 2=Critical
# TYPE anagiosplugin_exit gauge
anagiosplugin_exit 2
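
To make the idea concrete, here is a rough Python sketch of how a plugin's exit code could be rendered in the Prometheus text format. It is not script_exporter code: the function name `render_exit_metric` and the `<name>_exit` metric naming are assumptions for illustration only. The state 3=Unknown comes from the usual Nagios plugin convention and is added to the three states listed above.

```python
# Conventional Nagios plugin exit codes (3=Unknown is part of the convention).
NAGIOS_STATES = {0: "OK", 1: "Warning", 2: "Critical", 3: "Unknown"}


def render_exit_metric(script_name: str, exit_code: int) -> str:
    """Render a plugin exit code as a gauge in the Prometheus text format.

    Hypothetical helper: the "<script_name>_exit" naming follows the example
    in this issue and is not an existing script_exporter metric.
    """
    metric = f"{script_name}_exit"
    states = ", ".join(f"{code}={label}" for code, label in NAGIOS_STATES.items())
    return (
        f"# HELP {metric} Exit code of {script_name}. {states}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {exit_code}\n"
    )


if __name__ == "__main__":
    print(render_exit_metric("anagiosplugin", 2), end="")
```

The script name would have to be a valid Prometheus metric name prefix for this to work as-is, so a real implementation would probably need to sanitize it first.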

We can work out more details and ideas together. But this issue is mainly to ask you the question: Is this something you would consider? Or would you say right from the start: nope, won't do?

ricoberger commented 2 weeks ago

Hi, I'm open to such a feature, but I have to say that I'm not very familiar with Nagios.

If it is just about the exit code it should already be reflected in the script_exit_code metric.

Napsty commented 2 weeks ago

Cool! Thanks for being open to that suggestion. It would be more about including the performance output of the Nagios plugins, potentially also some output translation.

A typical output from a Nagios plugin (here check_smart as an example) looks like this:

WARNING: Runtime_Bad_Block is non-zero (3), Uncorrectable_Error_Cnt is non-zero (1)|Reallocated_Sector_Ct=3 Power_On_Hours=31416 Power_Cycle_Count=889 Program_Fail_Count_Chip=2 Erase_Fail_Count_Chip=0 Wear_Leveling_Count=873 Used_Rsvd_Blk_Cnt_Chip=386 Used_Rsvd_Blk_Cnt_Tot=752 Unused_Rsvd_Blk_Cnt_Tot=3280 Program_Fail_Cnt_Total=3 Erase_Fail_Count_Total=0 Runtime_Bad_Block=3 Uncorrectable_Error_Cnt=1 ECC_Error_Rate=1 Offline_Uncorrectable=0 CRC_Error_Count=262 Available_Reservd_Space=1630 Total_LBAs_Written=3363924329 Total_LBAs_Read=3278685684

The output for this plugin can be ignored, but there might be other plugins where an output translator could also be helpful.

The idea behind this feature request is to (more or less) support already existing Nagios plugins on systems, which otherwise would require a rewrite to output in Prometheus style. At least for the exit code and for the performance data (= metrics) this should be doable I believe.
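
As a rough illustration of the performance data part, the Python sketch below splits the first output line at the `|` separator and turns each `label=value` pair into a Prometheus sample. The function names `parse_perfdata` and `to_prometheus`, and the `<prefix>_<label>` naming, are made up for this example and are not part of script_exporter. It assumes the common perfdata layout `'label'=value[UOM];warn;crit;min;max` and keeps only the numeric value, dropping units and thresholds.

```python
import re
import shlex

# Leading numeric part of a perfdata value, e.g. "42.5" in "42.5%".
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)")


def parse_perfdata(plugin_output: str) -> dict[str, float]:
    """Extract label -> value pairs from the first line of plugin output."""
    first_line = plugin_output.splitlines()[0] if plugin_output else ""
    _, sep, perfdata = first_line.partition("|")
    if not sep:
        return {}  # no performance data present
    metrics = {}
    # shlex.split handles labels quoted with single quotes, e.g. 'disk used'=42%
    for token in shlex.split(perfdata):
        label, eq, rest = token.partition("=")
        if not eq:
            continue
        # Drop the ;warn;crit;min;max thresholds and any unit suffix.
        match = _VALUE_RE.match(rest.split(";")[0])
        if match:
            metrics[label] = float(match.group(1))
    return metrics


def to_prometheus(prefix: str, metrics: dict[str, float]) -> str:
    """Render parsed perfdata as Prometheus samples (hypothetical naming)."""
    lines = []
    for label, value in metrics.items():
        # Replace characters that are not valid in metric names; lowercase by choice.
        name = re.sub(r"[^a-zA-Z0-9_]", "_", f"{prefix}_{label}").lower()
        lines.append(f"{name} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "WARNING: Runtime_Bad_Block is non-zero (3)|Reallocated_Sector_Ct=3 Power_On_Hours=31416"
    print(to_prometheus("check_smart", parse_perfdata(sample)))
```

Whether each label should become its own metric name or a label on one shared metric is an open design question; the sketch picks the first option only because it is the shortest to show.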

PS: I'll be at the Open Source Monitoring Conference in Nuremberg next week. If by chance you're there, too, we can discuss this. :-)