poxet / Influx-Capacitor

Influx-Capacitor collects metrics from Windows machines using Performance Counters. Data is sent to InfluxDB so it can be viewed in Grafana.
http://influx-capacitor.com
MIT License

System configuration details #27

Closed by nathanwebb 8 years ago

nathanwebb commented 8 years ago

Hi,

It would be really handy to get the current system configuration, such as Total installed RAM. Unfortunately perfmon doesn't provide these counters. Without this, it's pretty hard to figure out current memory utilisation.

I don't know anything about C#, but found this MSDN guide:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa366589%28v=vs.85%29.aspx

Other details that help with capacity planning are the number and type of CPUs, the number and size of disks, the number of network interfaces, etc.

poxet commented 8 years ago

The data you are talking about does not change over time. So having Influx-Capacitor report it every few seconds or minutes does not seem very helpful.

Please explain the scenario where this kind of data would be helpful to log in InfluxDB.

nathanwebb commented 8 years ago

Every few seconds would be crazy, but it does change, especially on virtual machines. The settings that change the most are disk sizes (adding extra SAN LUNs and extending a volume) or the number of disks (again, particularly when adding SAN disks), and to a lesser extent, memory size (mostly only on VMs).

On the systems I work on, I'd expect to see about 10 to 30 changes per year, so not high volume, but a slowly changing dimension nevertheless. If you are trying to produce a 12-month forecast of the memory usage of a pool of servers, it is extremely painful to have to search through a configuration database, which may or may not have been updated correctly, to get the current and historical memory size for each server.

If you know of another agent that would do this, then I'll look at that instead, but it seems to fit with Influx-Capacitor's usage.

I was actually thinking of suggesting something on your thread about metadata. It would be great if each agent registered itself in MongoDB (or similar), and could then download its configuration from there as well. When it registers, it could upload the server's system configuration (e.g. memory, CPU types and number, disks), the agent version, etc., and then check whether it needs to download a new version of the config. It would do this on startup, and then periodically after that as well (say every hour?). The system configuration could also be uploaded every time it changed (which wouldn't be very often).
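To illustrate the "upload only when it changes" idea, here is a minimal Python sketch (not Influx-Capacitor code). It assumes the agent keeps the fingerprint of the last snapshot it uploaded; the function names and the snapshot keys (`total_ram_bytes`, `disks`) are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from typing import Optional


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hash of a system-configuration snapshot."""
    # sort_keys makes the hash independent of key order in the dict.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_upload(config: dict, last_fingerprint: Optional[str]) -> bool:
    """True when the snapshot differs from the last one that was uploaded.

    last_fingerprint is None when nothing has been uploaded yet
    (e.g. on first registration).
    """
    return config_fingerprint(config) != last_fingerprint
```

With something like this, the hourly check stays cheap: the agent rebuilds the snapshot, compares fingerprints, and only writes a new record on the 10 to 30 occasions per year when something actually changed.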

Two scenarios where this helps are the number of disks and the total amount of RAM. Sometimes, monitoring of disks fails, either due to known problems with perfmon (it stops reporting on some instances - I can dig out the MS bug reference if it helps), or a human-related issue. Either way, it is really important to know the expected number of disks versus the reported number. Whenever there is a gap, you need to investigate.
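The expected-versus-reported check could look roughly like this in Python. This is only a sketch: `disk_gap` is a hypothetical helper, and it assumes the expected disk names come from the uploaded system configuration while the reported names come from whatever perfmon instances are currently sending data.

```python
from typing import Dict, Iterable, List


def disk_gap(expected: Iterable[str], reported: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the disks a server should have with the disks being reported."""
    expected_set = set(expected)
    reported_set = set(reported)
    return {
        # Configured disks that have stopped (or never started) reporting.
        "missing": sorted(expected_set - reported_set),
        # Disks reporting metrics that the configuration does not know about.
        "unexpected": sorted(reported_set - expected_set),
    }


def has_gap(expected: Iterable[str], reported: Iterable[str]) -> bool:
    """True when the two views disagree and someone should investigate."""
    gap = disk_gap(expected, reported)
    return bool(gap["missing"] or gap["unexpected"])
```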

RAM is even more of an issue. Memory capacity issues on nearly(*) all Windows servers can be estimated by looking at Available Bytes as a percentage of Total Physical Memory. If this falls below 10%, then you should start looking for performance issues. If it falls below 5%, then I'd say you almost certainly have degraded performance.

So, how can you calculate Available Bytes / Total Memory? Well, without the denominator (Total Memory isn't in perfmon), you can't. You fall back to looking at proxies, like page faults, but these only tell you that you are currently experiencing a performance issue, and aren't useful for predicting when you will experience issues.
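Once both numbers are available, the calculation itself is trivial. A minimal Python sketch, assuming `available_bytes` comes from the Available Bytes counter and `total_bytes` from the system configuration; the function names and status labels are made up for illustration, and the 10% and 5% thresholds are the rules of thumb above:

```python
def available_percent(available_bytes: int, total_bytes: int) -> float:
    """Available Bytes as a percentage of Total Physical Memory."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * available_bytes / total_bytes


def memory_status(available_bytes: int, total_bytes: int) -> str:
    """Classify memory headroom using the 10% / 5% rules of thumb."""
    pct = available_percent(available_bytes, total_bytes)
    if pct < 5.0:
        return "degraded"      # almost certainly degraded performance
    if pct < 10.0:
        return "investigate"   # start looking for performance issues
    return "ok"
```

For example, with 16 GiB installed, 1 GiB available is 6.25% and would be flagged for investigation. The same ratio can be trended over time for forecasting, provided the historical total memory is known for each point.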

poxet commented 8 years ago

I see your point. You convinced me. :)

Two good suggestions there. I will create two separate cases for them, and you can help me to formulate the requirements. :)

  1. Machine metrics
  2. Centralized configuration management