prometheus-community / windows_exporter

Prometheus exporter for Windows machines
MIT License
2.89k stars 695 forks

Issue with "hcsshim::GetComputeSystems" while using container metric #453

Closed RamBoddapati closed 2 years ago

RamBoddapati commented 4 years ago

Hi Team, I am having trouble with the container collector. I packaged "wmi_exporter-0.9.0-amd64.exe" as a container and deployed it to AKS (Windows Server 2019) to monitor my Windows containers, but WMI_Exporter reports the error below.

msg="collector container failed after 0.001015s: hcsshim::GetComputeSystems: The specified module could not be found." source="exporter.go:215"

Please let me know if I am missing some configuration.

Here is my Dockerfile.

# escape=`
FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

COPY . C:\wmiexporter\

EXPOSE 5000

ENTRYPOINT ["powershell", `
            "C:\\wmiexporter\\wmi_exporter-0.9.0-amd64.exe", `
            "--collectors.enabled container"]

RamBoddapati commented 4 years ago

I see that the error references code that is not part of our code. Is this something we need to add? Here is the URL: https://github.com/microsoft/hcsshim/blob/master/internal/hcs/system.go

carlpett commented 4 years ago

Hi @RamBoddapati. I haven't personally had the opportunity to try running from within a container, so I'm not sure how/if that works. @sachinmsft, you contributed this collector, do you have any insights?

RamBoddapati commented 4 years ago

@sachinmsft, your help is very important to me.

sachinmsft commented 4 years ago

I will take a look and reply on the original thread.

RamBoddapati commented 4 years ago

@Sachin, I would just like to know whether you have been able to find time to look into this issue. I am not very familiar with Go, so I am relying on your solution. Please help me.

sachinmsft commented 4 years ago

Are you trying to get the container metrics? If so, I believe this is not supported. The reason is that the GetComputeSystems module is only populated if the server has the Hyper-V feature installed, and I think a container does not need to have Hyper-V installed. I have not run it inside a container, so I will try that and let you know if there is any possibility.
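
For readers hitting the same error: "The specified module could not be found" is the Windows message for a library that failed to load. A minimal sketch of a check for that condition follows, assuming (this is not stated in the thread) that hcsshim resolves GetComputeSystems from vmcompute.dll under System32. The helper names are hypothetical.

```python
import os

# Assumption: hcsshim resolves GetComputeSystems from vmcompute.dll,
# which Windows keeps under %SystemRoot%\System32 on container hosts.
HCS_DLL = "vmcompute.dll"


def hcs_dll_path(system_root):
    """Build the expected location of the HCS library under a Windows root."""
    return os.path.join(system_root, "System32", HCS_DLL)


def hcs_dll_present(system_root=None):
    """Return True if the HCS library file exists under system_root."""
    if system_root is None:
        # SystemRoot is set on Windows; the fallback is the usual default.
        system_root = os.environ.get("SystemRoot", r"C:\Windows")
    return os.path.isfile(hcs_dll_path(system_root))


if __name__ == "__main__":
    print("HCS library present:", hcs_dll_present())
```

Running this both on the host and inside the exporter container may help show whether the library is simply absent from the environment where the collector runs.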

sachinmsft commented 4 years ago

You may be able to get it running inside the container once Windows Server has privileged container capabilities.

RamBoddapati commented 4 years ago

@sachinmsft, thanks Sachin for your quick reply. As I am using an AKS-managed Windows node, is there any way to deploy the exporter directly on the AKS Windows node to pull container metrics, instead of running it as a Windows container?

Please help me.

sachinmsft commented 4 years ago

Can't you get the Windows container insights through https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-analyze ?

RamBoddapati commented 4 years ago

@sachinmsft, no Sachin, I have looked at that solution before. It is limited to the node level and does not scrape container-level metrics such as network, CPU, memory and disk I/O. That is why we started looking at open source options to get metrics into Grafana through Prometheus.

We have implemented a Prometheus and Grafana solution for Linux containers and are looking to implement the same for Windows containers.

Your help is much needed. Thanks for understanding.

sachinmsft commented 4 years ago

wmi_exporter only provides CPU, memory and network metrics for containers: https://github.com/martinlindhe/wmi_exporter/blob/master/collector/container.go#L16

If you want to run wmi_exporter through a DaemonSet, the way you might be running node_exporter, you can use https://github.com/rancher/wins. Please take a look at https://github.com/rancher/system-charts/blob/dev/charts/rancher-monitoring/v0.0.7/charts/exporter-node-windows/templates/daemonset.yaml
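
To see which container metrics an exporter instance actually exposes, the /metrics output can be filtered by name. The sketch below assumes the exporter listens on its default address (http://localhost:9182/metrics) and that container metrics carry the wmi_container_ prefix. The sample text, its metric names and its values are illustrative only, not captured output.

```python
from urllib.request import urlopen


def fetch_metrics(url="http://localhost:9182/metrics"):
    """Download the text exposition from an exporter (address is an assumption)."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def container_samples(exposition_text, prefix="wmi_container_"):
    """Return {series: value} for samples whose metric name starts with prefix.

    Assumes each sample line is 'name{labels} value' with no trailing timestamp.
    """
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples[series] = float(value)
    return samples


# Illustrative input only: names and values are made up for the example.
SAMPLE = """\
# TYPE wmi_container_available counter
wmi_container_available{container_id="abc"} 1
wmi_cpu_time_total{core="0,0",mode="idle"} 12.5
"""

if __name__ == "__main__":
    for series, value in container_samples(SAMPLE).items():
        print(series, value)
```

Against a live exporter, `container_samples(fetch_metrics())` returning an empty dict would suggest the container collector is not enabled or is failing.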

RamBoddapati commented 4 years ago

@sachinmsft, thanks Sachin. It looks like this might fit my needs. I will try it and see if there are any issues. Thanks a lot for your help, it has really made my day. Thanks again to you and Carl.

I will keep in touch and get back to you very soon.

RamBoddapati commented 4 years ago

@sachinmsft, please assist. I am ending up with the error below:


FATA[2020-01-06T13:13:22Z] rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing open \\.\pipe\rancher_wins: The system cannot find the file specified."


It looks like an issue with the mount point. Please see the attached screenshot. I don't understand the logic behind creating a volumeMount and looking for a Windows service "rancher_wins". Please help me.

sachinmsft commented 4 years ago

@RamBoddapati I am not sure about this error as I have not run it myself. I came across this tool some time ago and pointed you to it in case it was useful. Please follow up on this at https://github.com/rancher/wins.

petemounce commented 4 years ago

I have this issue when I attempt to run wmi_exporter 0.10.2 on Windows Server 2019 GUI edition within GCE, on the host (as in, not within a container).

The host VM does not have Hyper-V (GCE does not support nested virtualisation yet).

time="2020-03-15T15:54:40Z" level=error msg="Err in Getting containers:hcsshim::GetComputeSystems: The specified module could not be found." source="container.go:155"
time="2020-03-15T15:54:40Z" level=error msg="failed collecting ContainerMetricsCollector metrics:<nil> hcsshim::GetComputeSystems: The specified module could not be found." source="container.go:136"
time="2020-03-15T15:54:40Z" level=error msg="collector container failed after 0.004888s: hcsshim::GetComputeSystems: The specified module could not be found." source="exporter.go:207"
time="2020-03-15T15:54:41Z" level=error msg="hcsshim::GetComputeSystems - End Operation - Error" error="hcsshim::GetComputeSystems: The specified module could not be found."
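
When scanning a longer exporter log for this failure, a small parser for the key=value lines above can help. The following is a minimal sketch. The function names are hypothetical, and the regular expression only assumes the quoting style visible in the lines above.

```python
import re

# Matches key=value and key="quoted value" pairs as seen in the log lines.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

MISSING_MODULE = (
    "hcsshim::GetComputeSystems: The specified module could not be found."
)


def parse_line(line):
    """Split one log line into a dict of its key/value fields."""
    fields = {}
    for key, raw in PAIR_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')  # drop the surrounding quotes
        fields[key] = raw
    return fields


def find_missing_module_errors(lines):
    """Return the 'source' field of every line reporting the missing module."""
    hits = []
    for line in lines:
        fields = parse_line(line)
        text = fields.get("msg", "") + " " + fields.get("error", "")
        if MISSING_MODULE in text:
            hits.append(fields.get("source", "unknown"))
    return hits


if __name__ == "__main__":
    log = [
        'time="2020-03-15T15:54:40Z" level=error msg="collector container '
        'failed after 0.004888s: hcsshim::GetComputeSystems: The specified '
        'module could not be found." source="exporter.go:207"',
    ]
    print(find_missing_module_errors(log))
```

Lines without a source field, such as the last one above, are reported as "unknown".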

The host has had docker installed via this ansible:

---
# These steps are adapted from the official docker documentation on how to install via powershell:
# https://docs.docker.com/install/windows/docker-ee/#use-a-script-to-install-docker-ee
# The Docker EE licence is included with Windows Server

- name: download docker
  win_get_url:
    url: "{{ install_docker_download_url[ansible_os_family] }}"
    dest: "c:/windows/temp/{{ install_docker_package }}.zip"

- name: extract docker zip
  win_unzip:
    src: "c:/windows/temp/{{ install_docker_package }}.zip"
    dest: "%ProgramFiles%/"
    delete_archive: yes

# see https://blog.airdesk.com/2017/09/windows-containers-feature-.html for more details
- name: enable Windows Containers feature
  win_feature:
    name: containers
    state: present

- name: add docker to path
  win_path:
    elements: 'C:\Program Files\docker'

- name: make a group to allow non-privileged users to use docker
  win_group:
    name: docker-users

- name: add users to docker-users group
  win_group_membership:
    name: docker-users
    members: "{{ install_docker_users }}"
    state: present

- name: make sure docker config location exists
  win_file:
    path: c:/programdata/docker/config
    state: directory

- name: configure the docker daemon
  win_copy:
    src: docker-daemon.json
    dest: c:/programdata/docker/config/daemon.json

# There used to be a reboot step here and a `dockerd --register-service` step, as per
# the official installation instructions. What we found however, is that on Windows 2019
# this step was slow and flaky, and resulted in the docker daemon not starting on the buildkite agents. For
# these reasons we skip the reboot and use nssm.

- name: Install the docker service
  win_nssm:
    name: docker
    # Using Windows formatted paths here, to make sure we don't trip up nssm.
    application: C:/Program Files/docker/dockerd.exe
    stdout_file: "{{ install_docker_logs_path[ansible_os_family] }}/dockerd.log"
    stderr_file: "{{ install_docker_logs_path[ansible_os_family] }}/dockerd.log"

# The win_nssm module does not explicitly describe restart behaviour so we set
# it to auto restart in case of failure here. https://docs.ansible.com/ansible/latest/modules/win_nssm_module.html
- name: Make sure the docker service autorestarts
  win_shell: nssm set docker AppExit Default Restart

The docker-daemon config file is

{
  "group": "docker-users"
}

The install_docker_users variable is an array of non-admin usernames.

sachinmsft commented 4 years ago

I installed Docker using the commands below and I don't see this issue.

Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
Install-Package -Name Docker -ProviderName DockerMsftProvider
Restart-Computer -Force

cloudcafetech commented 4 years ago

Is there any solution for running this on Kubernetes, similar to node exporter?

carlpett commented 4 years ago

@cloudcafetech in #581 I believe the conclusion is that until Windows supports privileged containers it is not possible, and for now you need to run the exporter directly on the host (which unfortunately is not possible on hosted Kubernetes services).

widdix123 commented 4 years ago

@carlpett - Is this not possible with EKS too ?

carlpett commented 4 years ago

To the best of my knowledge, yes. This isn't specific to the Kubernetes distros/managed service variants, but rather how Windows containers presently work. In the case of EKS, you have a somewhat "simple" workaround possible in defining custom workers, where you can then use your own AMIs that include the windows_exporter. For GKE/AKS and similar offerings, this is not possible.

jsturtevant commented 2 years ago

This can be closed. It has been fixed with https://github.com/prometheus-community/windows_exporter/pull/864, which includes some docs. There are also examples of wiring all this up with the rest of the Prometheus stack in https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/windows.md
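
For orientation, here is a rough sketch of what such a deployment can look like. It assumes the fix relies on Windows HostProcess containers, the "privileged containers" anticipated earlier in the thread. The manifest is built as a Python dict so it can be dumped to JSON for `kubectl apply -f`. The function name, the DaemonSet name and the image reference are placeholders, and the linked docs remain the authoritative source.

```python
import json


def hostprocess_daemonset(image, name="windows-exporter"):
    """Build a minimal DaemonSet that runs one HostProcess pod per Windows node.

    'image' and 'name' are placeholders chosen for this sketch.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Schedule only onto Windows nodes.
                    "nodeSelector": {"kubernetes.io/os": "windows"},
                    # HostProcess pods must use the host network.
                    "hostNetwork": True,
                    "securityContext": {
                        "windowsOptions": {
                            "hostProcess": True,
                            "runAsUserName": "NT AUTHORITY\\SYSTEM",
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; substitute the one from the project docs.
    manifest = hostprocess_daemonset("example.invalid/windows-exporter:tag")
    print(json.dumps(manifest, indent=2))
```

Because the process runs on the host rather than in an isolated container, the container collector can reach the host's compute service, which is what the earlier comments in this thread were waiting for.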