Open tfalkowski-nsidc opened 5 years ago
Even that isn't really correct. A while back the kernel developers got tired of people calculating free memory with calculations like these, and so they added info into /proc/meminfo
specifically for these scenarios. The current version of this plugin is using that for the metrics scripts, but check-memory-percent.rb
is still calling out to free
and doing a bunch of calculation work; it should probably be using the info from meminfo instead.
I'm not sure why that wasn't done when the metrics were changed in #68. @jpoizat @majormoses can you comment?
(If the reason is just "because we didn't", btw, then I might find some time to port this over since it's something that's bugging me.)
I have not had a time to fully review the code but I suspect some of this is because any time we try to manually parse output from a tool which does not historically cared about maintaining the consistency of the output from version to version. free
is available on many linux distributions but there is no guarantee which version and therefore how to parse the output. To illustrate even across two hosts with Ubuntu the output differs.
on 14.04:
$ free -m
total used free shared buffers cached
Mem: 7984 5978 2005 0 321 4796
-/+ buffers/cache: 860 7123
Swap: 0 0 0
on 18.04:
$ free -m
total used free shared buff/cache available
Mem: 15920 12119 984 1439 2816 2093
Here we see the string that we are searching for is different, data in a row moved to a column, when no swap is present the line is omitted, etc making it difficult to handle all of the various versions. On top of this the OS (kernel) changes how it reports memory, this is something that we see also with /proc/meminfo
output as well. While we like to pretend that monitoring memory consumption is easy, its very dependent on the system its being run on. I would suggest if you have a new enough kernel (anything 4.x should be good) and you are allowed to have a compiler present to use this check: https://github.com/sensu-plugins/sensu-plugins-memory-checks/blob/master/bin/check-ram.rb as it's the most accurate I have seen cross platform and uses more native OS methods than manually parsing the output of thefree
command.
I am all for improving the existing check and if there is a bug we should fix it. We just need to fix it so that it does not break systems where it is properly reporting correctly. It's easy to say its broken for me so it must be broken for everyone but let's remember to take a step back and check if its working for other systems/platforms before deciding that the code is fundamentally flawed. There is certainly more work that could be done to handle all those differences in platform, kernel, version of free
, etc and can try to find some time to look more closely at this but I make no promises on my end other than being able to review changes being proposed.
@xiongchiamiov I can't speak to why it was not done for the check script and only the metric script as I was not the submitter but I can assume he did not use this check. I did ask if he wanted to roll out similar functionality but got no response. I am more than happy to accept another contribution to get that updated but I would not turn away a perfectly good contribution just because I want to keep each of the x number of check/metric scripts for memory in sync. In fact I would argue the fact that we have multiple checks means that different people have different preferences and needs. I personally use https://github.com/sensu-plugins/sensu-plugins-memory-checks/blob/master/bin/check-ram.rb as it uses some lower level hooks than simply parsing a file or stdout/stderr but requires a compiler as it leverages a c extension for the heavy lifting and provides the best xplatform compatibility I have seen. That being said requiring a compiler on a production system is a non starter for some orgs and they prefer to do the parsing at a file/command layer despite that it is more prone to errors and incorrect information. If I change the output of the free
one someone is gonna complain that it does not match what they expect. I am all for doing it but we need to balance everyone's needs. I also do not work for sensu and do not receive any compensation for doing this so I can't exactly take on every feature that someone wants as I have to be tactical with how I spend my time maintaining such a large project.
If someone wants to tackle switching from free
output to using /proc/meminfo
we can release it as a major version bump so that people who properly pin their plugin versions can opt into the new behavior or can choose to using their own fork.
@xiongchiamiov it looks like someone attempted to try it but then did not respond to my feedback about them breaking systems where it was not needed: https://github.com/sensu-plugins/sensu-plugins-memory-checks/pull/60 If you are willing to work with me I am more than happy to get it over the line.
Thanks for all the info. I'm going to poke at using check-ram.rb
and see whether it fixes our use case, if not then I'll try to carve out some time for this and will let you know if I do.
FWIW, I am using both Ubuntu 16/18 and CentOS 7 systems.
So near as I can tell this issues is caused by an incorrect parsing of the values returned by the free -m command as well as not calculating the free memory correctly.
Using the following output from the free -m command as an example,
The following code does not make much sense:
The first issue is where the code attempts to set the value for FreeMem.
This command:
However, this is also an issue as is it not the full picture of the free memory on the system.
FreeMem should be defined as:
Free Memory = Free + Buffers + Cached
So, using the the example data above we would have Free Memory = Free (224 MB) + Buffers & Cache (812) = 1,036 MB.
A possible solution might look something like this:
Thank you for your effort on this project, -Thomas
EDIT BY @majormoses: used triple backticks to form code blocks so its easier to read