newrelic / nri-nagios

New Relic Infrastructure Nagios Integration
MIT License
bin/nri-nagios exits immediately on invalid command output #93

Open barrypitman opened 8 months ago

barrypitman commented 8 months ago

I have ~30 checks that I want to run on certain virtual machines, and I'd like to report the output of the commands to New Relic using this integration.

However, I found that if just one of the checks returns an invalid exit code or produces invalid output, none of the checks report back to New Relic, and the integration exits with an error:

    root@vm /var/log # /var/db/newrelic-infra/newrelic-integrations/bin/nri-nagios -service_checks_config /etc/newrelic-infra/integrations.d/nagios-service-checks.yml -verbose
    panic: runtime error: index out of range [1] with length 0

    goroutine 118 [running]:
    main.parseOutput({0x0?, 0xc000117ca0?})
            /go/src/github.com/newrelic/nri-nagios/src/main.go:275 +0x245
    main.collectServiceCheck({{0xc00016d140, 0x2f}, {0xc0001bb580, 0x1, 0x1}, 0xc000115560, 0x1}, 0xc0000a25a0?, 0xc0000a2540?, {0x5436d3, ...})
            /go/src/github.com/newrelic/nri-nagios/src/main.go:181 +0x58a
    main.main.func1({{0xc00016d140, 0x2f}, {0xc0001bb580, 0x1, 0x1}, 0xc000115560, 0x1})
            /go/src/github.com/newrelic/nri-nagios/src/main.go:100 +0xa5
    created by main.main in goroutine 1
            /go/src/github.com/newrelic/nri-nagios/src/main.go:99 +0x5a7

Of course, the cause of the failing check should be addressed, but this shouldn't cause the entire suite of checks to stop reporting. The integration should report the result of the failing check as an error, similar to what Nagios would do in this scenario.

workato-integration[bot] commented 8 months ago

https://new-relic.atlassian.net/browse/NR-215396

paologallinaharbur commented 8 months ago

It seems to be failing here:

    match := reOutput.FindStringSubmatch(output)
    result := make(map[string]string)
    for i, name := range reOutput.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }

Can you attach the nagios-service-checks.yml, specifying which check is failing and what it outputs when you run it manually?

Which version of the integration are you using?

paologallinaharbur commented 8 months ago

The root cause seems to be `match` having fewer elements than expected because of the command output. We can add a check so that this case is handled gracefully instead of causing a panic.
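
A minimal sketch of what such a check could look like, not the actual fix. It relies on `FindStringSubmatch` returning nil when the input does not match, which is consistent with the "length 0" in the panic. The regular expression `reOutputSketch` and the function name `parseOutputSafe` are hypothetical and used only for illustration, since the integration's real `reOutput` pattern is not shown in this thread:

```go
package main

import (
	"fmt"
	"regexp"
)

// reOutputSketch is a hypothetical pattern used only for this example;
// it is NOT the integration's real reOutput expression.
var reOutputSketch = regexp.MustCompile(
	`^(?P<status>OK|WARNING|CRITICAL|UNKNOWN)\s*-\s*(?P<message>.*)$`)

// parseOutputSafe (hypothetical name) returns the named capture groups,
// or an error when the output does not match, instead of indexing into
// a nil slice and panicking.
func parseOutputSafe(output string) (map[string]string, error) {
	match := reOutputSketch.FindStringSubmatch(output)
	if match == nil {
		// No match: FindStringSubmatch returned nil, so match[i] would panic.
		return nil, fmt.Errorf("unexpected check output: %q", output)
	}
	result := make(map[string]string)
	for i, name := range reOutputSketch.SubexpNames() {
		if i != 0 && name != "" {
			result[name] = match[i]
		}
	}
	return result, nil
}

func main() {
	// One well-formed output and one empty output, as a failing check might produce.
	for _, out := range []string{"OK - disk usage at 42%", ""} {
		result, err := parseOutputSafe(out)
		if err != nil {
			// The caller can report this single check as failed and keep going.
			fmt.Println("reporting check as failed:", err)
			continue
		}
		fmt.Println(result["status"], result["message"])
	}
}
```

Returning an error rather than exiting would let the caller mark only the offending check as failed while the remaining checks keep reporting, which is closer to the behaviour requested above.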