prometheus-community / windows_exporter

Prometheus exporter for Windows machines
MIT License

windows_exporter service failed to start on reboot #551

Status: Closed (f1-outsourcing closed this issue 1 year ago)

f1-outsourcing commented 4 years ago

After installing updates and rebooting the server, the windows_exporter service was not running. The event log shows:

The windows_exporter service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.

When I look at the recovery options of the windows_exporter service, they are not set like those of other 'standard' Windows services. It looks like none of the others has 'Reset fail count after' set to 0 and 'Restart service after' set to 0.

exporter: [screenshot of the windows_exporter recovery settings]

other examples: [screenshots of the recovery settings for the Workstation, Server, Firewall, and NLA services]

I am not really an expert on service recovery settings, but maybe someone should look at these. Maybe it would be better to set these values to 3 or 5 minutes?

https://docs.microsoft.com/en-us/archive/blogs/jcalev/some-tricks-with-service-restart-logic
https://social.microsoft.com/Forums/ro-RO/3db76753-4607-4a20-97a0-790c73e379cc/the-actions-after-system-service-failure?forum=winserver8gen
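
For reference, the recovery options described above can also be set from the command line with sc.exe failure. The following is a minimal sketch that only builds the argument list and does not run it. The helper name recovery_command, the one-day reset period, and the three restart attempts are assumptions for illustration, not values taken from this thread; the 3-minute delay is the one suggested above. Whether recovery actions fire at all for a start timeout is not confirmed here.

```python
from __future__ import annotations


def recovery_command(service: str, reset_after_s: int,
                     restart_delay_ms: int, attempts: int = 3) -> list[str]:
    """Build an `sc.exe failure` argument list (hypothetical helper).

    sc.exe expects the reset period in seconds and each action delay in
    milliseconds. The value must be a separate token after `reset=` and
    `actions=`, which is why they are separate list items here.
    """
    # One "restart/<delay>" pair per failure, joined with "/".
    actions = "/".join(f"restart/{restart_delay_ms}" for _ in range(attempts))
    return ["sc.exe", "failure", service,
            "reset=", str(reset_after_s),
            "actions=", actions]


if __name__ == "__main__":
    # Assumed values: reset the fail count after one day, restart after 3 minutes.
    print(" ".join(recovery_command("windows_exporter", 86400, 3 * 60 * 1000)))
```

On a Windows host the list could be passed to subprocess.run from an elevated prompt; the current settings can be inspected with sc.exe qfailure windows_exporter.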

breed808 commented 2 years ago

> There's probably all sorts of reasons why I shouldn't do it in the way that I have, but I'll rebase my fork early next week (it's a mess right now) and share where I've gotten to, to see if anyone can help get it over the line, or tell me what I've done wrong >.<

@jammiemil I'm happy to assist with getting this working and merged. Thanks so much for your time and effort on this issue!

jammiemil commented 2 years ago

OK, after a couple of false starts thanks to my trashed local git, I have PR #1047 open for this. It still needs a little work to get it over the line.

breed808 commented 2 years ago

Hi all, #1047 has been merged. Would anyone here be able to run a test deployment from master and confirm if this has resolved the issue? I'd prefer to test this across a larger sample size before claiming this has been resolved

I can provide pre-built EXEs or MSIs if that is preferable.

safster123 commented 2 years ago

> Hi all, #1047 has been merged. Would anyone here be able to run a test deployment from master and confirm if this has resolved the issue? I'd prefer to test this across a larger sample size before claiming this has been resolved
>
> I can provide pre-built EXEs or MSIs if that is preferable.

I'm happy to test this, but I'm not great with Git, so it would be great if you could provide an MSI.

breed808 commented 2 years ago

See attached. I've included both the EXE and MSI built from master. Let me know if there are any issues.

windows_exporter.zip

JsBergbau commented 2 years ago

Can confirm it works now, especially on an older server where even delayed start and setting sc.exe config windows_exporter depend= Winmgmt didn't help.
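
For readers unfamiliar with the two workarounds mentioned above (which, per this comment, did not help on the older server), here is a minimal sketch that builds the corresponding sc.exe config argument lists without running them. The helper name startup_commands is hypothetical. Note that depend= replaces the service's existing dependency list rather than appending to it.

```python
from __future__ import annotations


def startup_commands(service: str, dependency: str = "Winmgmt") -> list[list[str]]:
    """Build the two `sc.exe config` invocations (hypothetical helper)."""
    return [
        # Delayed automatic start: the service starts shortly after boot
        # instead of together with the other automatic services.
        ["sc.exe", "config", service, "start=", "delayed-auto"],
        # Make the service wait for the WMI service (Winmgmt) to start first.
        ["sc.exe", "config", service, "depend=", dependency],
    ]


if __name__ == "__main__":
    for cmd in startup_commands("windows_exporter"):
        print(" ".join(cmd))
```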

safster123 commented 2 years ago

Finally got a chance to test this. Happy to say that it appears fixed from my testing. I was able to get it to consistently fail with previous versions but the provided version above seems to have done the trick and it now starts successfully. Thanks to all involved in getting this over the line.

breed808 commented 2 years ago

Thanks all. I'll aim to get a new release with this fix out in the next few days, then hopefully we can close this one off :crossed_fingers:

matthewsc05 commented 2 years ago

Hi @breed808, I tested the windows_exporter.zip provided above. It fixed the CPU usage issues and timeouts I was having. However, I noticed that I am experiencing a memory leak: at one point the agent hit 1 GB of RAM usage.

breed808 commented 2 years ago

@matthewsc05 is the memory leak present on the latest version or just on the build I provided earlier?

Andy-Techical commented 2 years ago

Hi @breed808 @matthewsc05

I've had the build from the post above (on 26th August by breed808) installed for the last week or so on 3 servers (Windows Server 2016) and can't see any high memory from it. I've compared it to the rest of my servers running an older version of Windows Exporter and the memory levels look similar across the versions.

Thanks

matthewsc05 commented 2 years ago

Hi @breed808, it's with the previous build provided above (windows_exporter.zip).

Could this be related to a particular Windows version? This was tested on Windows Server 2019. We had to remove the agent due to the high memory usage.

breed808 commented 2 years ago

@matthewsc05 it's more likely to be the collectors you have enabled. We've identified some problem collectors using WMI as a metric source in #813, and there's been a recently identified leak in the scheduled_task collector in #1063.

That said, if you're running the same collectors between versions and there's a noticeable difference in the new version, we'll need to investigate.

I'm concerned that we may be introducing a new issue in the next release while trying to fix this one.

matthewsc05 commented 2 years ago

> @matthewsc05 it's more likely to be the collectors you have enabled. We've identified some problem collectors using WMI as a metric source in #813, and there's been a recently identified leak in the scheduled_task collector in #1063.
>
> That said, if you're running the same collectors between versions and there's a noticeable difference in the new version, we'll need to investigate.
>
> I'm concerned that we may be introducing a new issue in the next release while trying to fix this one.

Hi @breed808, I agree. I was using the default configuration, so everything was enabled. I am moving to a dedicated configuration, so this outcome might change for me soon.

The fix is still important in my opinion, as in extreme cases the 30s timeout is being hit. When I had this problem, using the package provided above and deleting the registry keys of the previous installation fixed my issues, until I hit this memory issue, which I am looking into.

breed808 commented 2 years ago

Fair enough. If we can't identify the cause of the issue in the next few days, I'll cut a release and list it as a known bug.

Let me know if you find anything while using a dedicated configuration.

fsiler commented 2 years ago

@breed808 really appreciate your attention on this. Do you have any timeframe on an update? Thanks!

breed808 commented 2 years ago

Apologies for the delay, life got in the way again. I've released v0.20.0, but I'll keep this issue open for a week or so in case anything has been missed.