newrelic / nri-winservices

Windows services Integration for New Relic Infrastructure
Apache License 2.0

Integration Not Recovering After Error #65

Closed CharlieWinters closed 2 years ago

CharlieWinters commented 3 years ago

The integration is being used on a Microsoft Windows Server 2016 Datacenter host. At certain times the host seems to get maxed out on resources, i.e. we see spikes in CPU, load average, and disk utilisation, and then the integration appears to stop sending samples.

The issue is that the integration doesn't appear to recover until the Infrastructure agent is restarted.

Description

Some internal info has been outlined in support ticket #448676

We get errors in the logs like: time="2021-02-27T08:21:39Z" level=warning msg="integration exited with error state" component=integrations.runner.Runner error="exit status 2" integration_name=com.newrelic.winservices stderr="(last 10 lines out of 7476): ....

and a message below for the monitored services: time="2021-02-28T15:01:42Z" level=info msg="Removing inventory cache" agentEntityIDChanged=false component=PatchSender entityKey=":" offlineTime=24h0m0s

Expected Behavior

I would expect the integration to recover once resources become available again.

NR Diag results

Steps to Reproduce

Max out resources on the host and observe that Windows services samples stop being sent; the integration does not recover.

Your Environment

Microsoft Windows Server 2016

Additional context

I've built a troubleshooting dashboard that is detailed in support ticket: #448676.

roobre commented 3 years ago

Hi, thanks for reporting this issue. We are aware that the winservices integration does not behave properly under high CPU and/or memory pressure. To our knowledge, the root cause is often WMI, a Windows API that the integration uses and that is known not to work reliably in these situations.

Could you provide the lines that appear after the "last 10 lines out of 7476" message? This would help us identify whether this is the actual issue.

CharlieWinters commented 3 years ago

Thanks @roobre

We've got a support ticket open if you have access, #448676

Otherwise here's the rest of the log:

time="2021-02-27T08:21:39Z" level=warning msg="integration exited with error state" component=integrations.runner.Runner error="exit status 2" integration_name=com.newrelic.winservices stderr="(last 10 lines out of 7476): [INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355\n[INFO] exporter msg=Enabled collectors: service, cs source=exporter.go:326\n[INFO] exporter msg=Starting windows_exporter (version=0.13.0-10-gc9f1e50-dirty, branch=HEAD, revision=c9f1e5068a267aeb3e8cff47bc5323cbc050055a) source=exporter.go:351\n[INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355\n[INFO] exporter msg=Enabled collectors: service, cs source=exporter.go:326\n[INFO] exporter msg=Starting windows_exporter (version=0.13.0-10-gc9f1e50-dirty, branch=HEAD, revision=c9f1e5068a267aeb3e8cff47bc5323cbc050055a) source=exporter.go:351\n[INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355"

time="2021-02-27T11:37:56Z" level=warning msg="integration exited with error state" component=integrations.runner.Runner error="context canceled" integration_name=com.newrelic.winservices stderr="(last 10 lines out of 52): [INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355\n[INFO] exporter msg=Enabled collectors: service, cs source=exporter.go:326\n[INFO] exporter msg=Starting windows_exporter (version=0.13.0-10-gc9f1e50-dirty, branch=HEAD, revision=c9f1e5068a267aeb3e8cff47bc5323cbc050055a) source=exporter.go:351\n[INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355\n[INFO] exporter msg=Enabled collectors: service, cs source=exporter.go:326\n[INFO] exporter msg=Starting windows_exporter (version=0.13.0-10-gc9f1e50-dirty, branch=HEAD, revision=c9f1e5068a267aeb3e8cff47bc5323cbc050055a) source=exporter.go:351\n[INFO] exporter msg=Build context (go=go1.14.4, user=fv-az104\runneradmin@fv-az104, date=20200617-17:40:48) source=exporter.go:352\n[INFO] exporter msg=Starting server on 127.0.0.1:9182 source=exporter.go:355"
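
The two warnings above differ mainly in their error field ("exit status 2" versus "context canceled"). As a rough aid for anyone triaging a longer agent log, here is a minimal sketch that tallies the error values reported for the integration. It assumes the log lines follow the key=value layout shown above; the function names parse_logfmt and count_integration_errors are hypothetical helpers, not part of the agent or the integration, and the sample lines are abbreviated copies of the warnings with the stderr field omitted.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key=value fields of one log line as a dict (hypothetical helper)."""
    fields = {}
    for key, raw in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]
        fields[key] = raw
    return fields


def count_integration_errors(lines, integration="com.newrelic.winservices"):
    """Count how often each error value appears for the given integration_name."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("integration_name") == integration and "error" in fields:
            counts[fields["error"]] += 1
    return counts


if __name__ == "__main__":
    # Abbreviated sample lines; stderr is left out to keep the example short.
    sample = [
        'time="2021-02-27T08:21:39Z" level=warning '
        'msg="integration exited with error state" '
        'component=integrations.runner.Runner error="exit status 2" '
        'integration_name=com.newrelic.winservices',
        'time="2021-02-27T11:37:56Z" level=warning '
        'msg="integration exited with error state" '
        'component=integrations.runner.Runner error="context canceled" '
        'integration_name=com.newrelic.winservices',
    ]
    for error, count in count_integration_errors(sample).items():
        print(f"{error}: {count}")
```

Grouping by the error value makes it easier to see whether the integration is mostly crashing or mostly being cancelled, which may help when sharing logs in a support ticket.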

roobre commented 3 years ago

Thanks for raising this. Since you already have a ticket open, the support/product team will follow up from there, as that is our preferred method for managing issues that require troubleshooting and/or have a large scope.

mangulonr commented 3 years ago

Hi Charlie

Would you be interested in joining the EAP for our new beta version? The performance problems caused by WMI have been fixed.

If so, please send me an email at mangulo@newrelic.com

Regards

mangulonr commented 3 years ago

Hi

We just released a new open beta version of the Windows Services integration, which solves the stability and performance issues.

All of our clients have received an email with detailed instructions on how to migrate to the new version.

More information is available in this blog post on Explorer Hub.

davidgit commented 2 years ago

This issue was closed because it has been inactive for a long time. It can be reopened at any point if you think it is still relevant.

Thank you for your contribution!