open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Add Windows Service status metrics #31377

Open dodegaard opened 6 months ago

dodegaard commented 6 months ago

Component(s)

receiver/hostmetrics

Is your feature request related to a problem? Please describe.

Currently there is no metric that reports the running status of a Windows service.

Describe the solution you'd like

The hostmetrics receiver already uses the github.com/shirou/gopsutil library, which can read the running status of Windows services. It would be helpful to scrape this information along with attributes that describe each service. This should most likely be opt-in. The process_scraper_windows.go module could be the home for the method(s).

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 6 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dodegaard commented 6 months ago

This is the corresponding library file that provides access to that information: https://github.com/shirou/gopsutil/blob/master/winservices/winservices.go

github-actions[bot] commented 4 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

dconnolly-sfdc commented 2 months ago

+1 would love to see this capability added

hhgsplk commented 1 month ago

+1 needed, this is a missing feature.

pjanotti commented 1 month ago

We can stay with golang.org/x/sys by using https://pkg.go.dev/golang.org/x/sys/windows#EnumServicesStatusEx. That said, the host metrics receiver already imports both golang.org/x/sys and github.com/shirou/gopsutil/v4.

hhgsplk commented 1 month ago

True, @pjanotti. I just think that using EnumServicesStatusEx is a better way to show service states. The metrics could/should be tagged with the associated service. I don't see that the metrics expose the state. Or am I blind?

pjanotti commented 1 month ago

@hhgsplk I meant that we could use a metric to report service status. The value of the metric would be based on dwCurrentState, as is done by the Telegraf win_services input.
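To make the idea concrete, here is a minimal sketch of how a dwCurrentState value could be turned into a metric data point tagged with the service name. The numeric constants are the documented Win32 SERVICE_STATUS state values. The `ServiceStatusPoint` type and `NewServiceStatusPoint` function are hypothetical names used only for illustration and are not part of the collector or of any existing receiver.

```go
package main

import "fmt"

// Documented values of dwCurrentState in the Win32 SERVICE_STATUS structure.
const (
	ServiceStopped         uint32 = 1
	ServiceStartPending    uint32 = 2
	ServiceStopPending     uint32 = 3
	ServiceRunning         uint32 = 4
	ServiceContinuePending uint32 = 5
	ServicePausePending    uint32 = 6
	ServicePaused          uint32 = 7
)

// stateNames maps each dwCurrentState value to a human-readable label.
var stateNames = map[uint32]string{
	ServiceStopped:         "stopped",
	ServiceStartPending:    "start_pending",
	ServiceStopPending:     "stop_pending",
	ServiceRunning:         "running",
	ServiceContinuePending: "continue_pending",
	ServicePausePending:    "pause_pending",
	ServicePaused:          "paused",
}

// ServiceStatusPoint is a hypothetical data point: one per service,
// with the raw state as the value and the service name as an attribute.
type ServiceStatusPoint struct {
	ServiceName string // attribute identifying the service
	State       string // readable label for the state
	Value       int64  // raw dwCurrentState, used as the metric value
}

// NewServiceStatusPoint builds a data point from a service name and its
// dwCurrentState. Unrecognized values are labeled "unknown".
func NewServiceStatusPoint(name string, state uint32) ServiceStatusPoint {
	label, ok := stateNames[state]
	if !ok {
		label = "unknown"
	}
	return ServiceStatusPoint{ServiceName: name, State: label, Value: int64(state)}
}

func main() {
	p := NewServiceStatusPoint("Spooler", ServiceRunning)
	fmt.Printf("%s state=%s value=%d\n", p.ServiceName, p.State, p.Value)
}
```

Whether the state should be the metric value or an attribute on the data point is an open design choice; the sketch carries both so either shape can be derived from it.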

pjanotti commented 1 week ago

/label os:windows

pjanotti commented 1 week ago

This may make more sense as a separate receiver instead of being added to hostmetrics.

hhgsplk commented 1 week ago

Fine by me

syron commented 1 week ago

I'm not aware of the library capabilities you mentioned, but what's your opinion on whether to use one instance of the OTEL collector to collect Windows service status from several servers (first diagram below), or to install one instance per Windows server (second diagram)?

One OTEL collector for multiple servers

One OTEL collector per server

The question is what's more maintainable.

hhgsplk commented 1 week ago

I think that’s a different topic… or an extension of the topic. I know that you can get status and perfmon data even from remote Windows servers when the underlying component runs as a user that has the necessary rights… but because of stability concerns with the protocol we’ve never recommended that route.

syron commented 1 week ago

I think that is an important part of the topic, because it will potentially determine which parameters need to be configured. I am a bit unsure which path to take here. I have developed software that is widely used in the system integration area, where we wrote an agent-like system that calls remote servers. However, we hit limitations when the number of servers to fetch services and processes from grew beyond 20, because it took too much time. We never experienced any instability, though.

I mean, in a scenario where we would use OTEL collectors to fetch this kind of information from multiple servers, we need to think about maintainability, but maybe I am overthinking it. For us at our consultancy firm, it's not unusual within integration to have 12 different Windows servers that need to be monitored (not only CPU, but Windows Server specific features), meaning we would need to install 12 OTEL collectors. Of course we have CI/CD set up for all of them, so maybe that's not an issue, but those would be 12 additional services to keep track of and monitor.

pjanotti commented 1 week ago

The concern about the configuration is good: we want to define it so that it supports remote servers even if we don't implement that in the first release. It likely should be done in a similar fashion to what was proposed here and implemented here for the Windows Event Log receiver - with the difference that for the case here we likely can't ask for credentials; the computer/account running the collector as a service must have that right IIRC (we will have to double check that).

atoulme commented 5 days ago

What does the configuration look like? Do you have a working prototype somewhere we can review?