it-novum / openitcockpit-agent-go

Cross-Platform Monitoring Agent for openITCOCKPIT written in Go
https://openitcockpit.io/download_agent/
Apache License 2.0

Extremely High CPU Usage with Agent #58

Open RRonGit opened 3 years ago

RRonGit commented 3 years ago

Agent Mode:

Versions

Operating system Windows Server 2012 to 2019

Describe the bug: On all 25 of our VMware VMs, we have seen an extremely high CPU load since installing the new agent (previously we used the NSCP-0.4.1.90 agent with openITCOCKPIT 2.7.15). Sample after installing the agent:

image

And here after stopping the openITCOCKPIT Agent service:

image

The process is «PowerShell»:

image

The load changes from almost nothing to 50% CPU. This happens on all OS versions from Windows Server 2012 to 2019.

nook24 commented 3 years ago

Hi @RRonGit,

Have you defined any custom checks? Since the load is caused by a PowerShell process, the event log check may be the source of the issue. Could you please try disabling the Windows event log check in the Agent's config.ini and see whether the issue goes away?

wineventlog = False

To apply the new configuration, you need to restart the openITCOCKPITAgent service.

RRonGit commented 3 years ago

Hi @nook24,

Yes, we use a custom check to monitor the Windows system time.
Unfortunately, it's not really better. I tested without custom checks and added «wineventlog = False» to config.ini. We restarted the agent and even restarted the server itself.

(Unfortunately, I couldn't insert screenshots directly here, so with a link.)

Pic1 Pic2

nook24 commented 3 years ago

Pic1: 1xqKqdPl

Pic2: tNCrVrtP


You need to figure out where this PowerShell process and the "Service Host: Windows Event Log" load are coming from. The openITCOCKPIT Agent itself queries WMI or uses syscalls directly.

The only check that uses PowerShell as a workaround is the Windows Event Log check. As soon as you disable this check, the Agent will no longer start a PowerShell process.

On our test systems (Windows Server 2016, 2019 and Windows 10, all with the latest updates) the agent runs at 0 to 2% CPU according to Task Manager (with the Event Log check disabled). There is also no PowerShell process in Task Manager.

Custom checks are executed through PowerShell or CMD, depending on your configuration. Which brings us back to the question: where is this PowerShell process coming from? :)

Maybe antivirus?

PS: You can insert screenshots via drag and drop.

RRonGit commented 3 years ago

Hi @nook24,

At first we did not notice that the "wineventlog" setting already exists in config.ini, and we simply added the line at the bottom of the file. After we deleted our added line (wineventlog = False) and set the value in the right place, the CPU load was fine again and the PowerShell process was gone.

Many Thanks.

nook24 commented 3 years ago

I made some optimisations to the event log check today, so hopefully it will consume less CPU. Unfortunately, I did not have much time to test this. It would be great if you could test it on one of your systems as well.

@Terminator81 This could be interesting for you as well

How to apply the patch:

  1. Stop the openITCOCKPITAgent service
  2. Open File Explorer and go to C:\Program Files\it-novum\openitcockpit-agent\
  3. Copy the file openitcockpit-agent.exe to openitcockpit-agent.exe.backup
  4. Copy the openitcockpit-agent.exe from the openitcockpit-agent.exe.zip file to C:\Program Files\it-novum\openitcockpit-agent\
  5. Start the openITCOCKPITAgent service again

RRonGit commented 3 years ago

Hi @nook24

The customized openitcockpit-agent.exe also causes a high CPU load. (wineventlog = True)

cpu_load_ps02_19_07_2021_eingezeichnet

nook24 commented 2 years ago

Today @Terminator81 reported an issue with the test version built from the eventlog-rr branch, referenced in: https://github.com/it-novum/openitcockpit-agent-go/issues/58#issuecomment-881464888

To save CPU time, the eventlog-rr implementation uses a round-robin hash list: it queries one hour of event log entries on startup, then only pulls the delta of new entries and appends them to the list. The drawback is that the agent still keeps error messages for one hour, even if the user manually truncated the event log. The current stable version of the agent, on the other hand, always queries the whole event log, which is why it burns more CPU time.

nook24 commented 2 years ago

Thanks for being patient. With the release of version 3.0.9 of the agent, we are using WMI as the default data source to query the Windows event log entries. We have tested this on Windows Server 2016 and Windows Server 2022.

This reduces CPU usage. The PowerShell method is still available as a fallback option via config.ini.

Please let me know if this resolves your CPU usage issues.