simonkuhn opened 12 years ago
I'm not sure it's possible to do what you're asking at all, even if per-slab stats were available, and in what I see they are not. As for the error, it was caused by bad data coming from the Cache::Memcached library (Cache/Memcached.pm). I fixed it and updated the repository: the newest version of the plugin now checks the data more carefully and no longer dies with a fatal error like the one you reported.
It still can't report results for items because of a bug in Cache::Memcached: it does not parse the results of 'stats items' properly, even though 'stats items' itself works on the memcached servers I have access to. 'stats slabs', however, did not work; it may not be well supported by memcached yet, at least not in the installations I can check.
On a second look, I see that the server did return data for slabs. But, just as with items, the Cache::Memcached library did not properly parse the results into an array. Although this is a library bug rather than a plugin bug that I should have to fix, I'll write extra code to handle the parsing when the library does not do it.
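The extra parsing described above can be sketched roughly as follows, in Python rather than the plugin's Perl. It assumes the library hands back either an already-parsed mapping or the raw "STAT <name> <value>" text seen in the error message, and it flattens names into the items_5_outofmemory form used later in this thread. The slabs_ prefix for 'stats slabs' keys, and the function names themselves, are hypothetical and not taken from check_memcached.pl.

```python
import re
from typing import Dict, Mapping, Union

# One "STAT <name> <value>" record; works whether records are separated
# by "\r\n" (raw memcached protocol) or by single spaces (as pasted here).
STAT_RE = re.compile(r"STAT\s+(\S+)\s+(\S+)")


def flatten_name(name: str, prefix: str) -> str:
    """Turn 'items:5:outofmemory' into 'items_5_outofmemory'.

    Hypothetical rule: if the name does not already start with the prefix
    (e.g. '1:chunk_size' from 'stats slabs'), prepend it: 'slabs_1_chunk_size'.
    """
    parts = name.split(":")
    if parts[0] != prefix:
        parts.insert(0, prefix)
    return "_".join(parts)


def parse_stats(raw: Union[str, Mapping[str, object]], prefix: str) -> Dict[str, int]:
    """Return {flattened_name: int_value} from raw text or an already-parsed mapping."""
    if isinstance(raw, Mapping):
        pairs = list(raw.items())  # the library already parsed it; only rename keys
    else:
        pairs = STAT_RE.findall(raw)  # the library returned one raw string
    stats: Dict[str, int] = {}
    for name, value in pairs:
        try:
            stats[flatten_name(name, prefix)] = int(value)
        except (TypeError, ValueError):
            continue  # skip non-numeric values instead of dying
    return stats
```

For example, parse_stats("STAT items:5:number 1 STAT items:5:outofmemory 0", "items") yields {'items_5_number': 1, 'items_5_outofmemory': 0}. Accepting both shapes means the check keeps working whether or not a given library version parses the sub-stats itself.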
I'm still unsure what feature you're asking for, though. The data from slabs that I see is:
STAT 1:chunk_size 96
STAT 1:chunks_per_page 10922
STAT 1:total_pages 1
STAT 1:total_chunks 10922
STAT 1:used_chunks 10920
STAT 1:free_chunks 2
STAT 1:free_chunks_end 10898
STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 10
STAT 2:total_chunks 87380
STAT 2:used_chunks 87344
STAT 2:free_chunks 36
STAT 2:free_chunks_end 0
And from items it is:
STAT items:1:age 8624783
STAT items:2:number 87345
STAT items:2:age 1013657
How am I supposed to get the out-of-memory and evicted-time values from that?
Sorry, I should have been more precise. On a memcached 1.4.5 instance, I see from stats slabs:
[...]
STAT 11:chunk_size 944
STAT 11:chunks_per_page 1110
STAT 11:total_pages 1
STAT 11:total_chunks 1110
STAT 11:used_chunks 0
STAT 11:free_chunks 1
STAT 11:free_chunks_end 1109
STAT 11:mem_requested 0
STAT 11:get_hits 0
STAT 11:cmd_set 1
STAT 11:delete_hits 0
STAT 11:incr_hits 0
STAT 11:decr_hits 0
STAT 11:cas_hits 0
STAT 11:cas_badval 0
STAT active_slabs 7
STAT total_malloced 12588433528
(repeated over every slab, e.g. STAT 9:cas_hits 0, etc.).
For stats items I see:
[...]
STAT items:5:number 1
STAT items:5:age 13023981
STAT items:5:evicted 0
STAT items:5:evicted_nonzero 0
STAT items:5:evicted_time 0
STAT items:5:outofmemory 0
STAT items:5:tailrepairs 0
STAT items:5:reclaimed 0
So, in this case I would like to check that items:5:outofmemory = 0 and that items:5:evicted_time is < 86400. A more general use case might be checking that all slabs have items:*:outofmemory = 0. That could be specified as items:1:outofmemory = 0, items:2:outofmemory = 0, etc., but a shorthand would be nice.
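A minimal Python sketch of the two checks described above, assuming the stats have already been parsed into a dict keyed by the raw colon names. The '*' shorthand is implemented here with glob matching (fnmatch), which is one possible reading of the requested shorthand rather than anything the plugin provides; the alert rules follow the request (alert if any slab reports outofmemory != 0, or if slab 5 reports evicted_time below 86400), and the rule list and function name are hypothetical.

```python
import fnmatch
import operator
from typing import Dict, List, Tuple

# Comparison operators allowed in a rule; a rule fires when its comparison is true.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "!=": operator.ne}

# (glob pattern, operator, threshold) alert rules; '*' stands for "any slab id".
RULES: List[Tuple[str, str, int]] = [
    ("items:*:outofmemory", "!=", 0),
    ("items:5:evicted_time", "<", 86400),
]


def alerts(stats: Dict[str, int], rules: List[Tuple[str, str, int]]) -> List[str]:
    """Return one message per (stat, rule) pair whose alert condition holds."""
    fired = []
    for pattern, op, threshold in rules:
        for name in sorted(stats):  # sorted for stable, readable output
            if fnmatch.fnmatchcase(name, pattern) and OPS[op](stats[name], threshold):
                fired.append(f"{name}={stats[name]} ({op} {threshold})")
    return fired
```

With the slab 5 sample above (outofmemory 0, evicted_time 0), the second rule fires, because a slab that has not evicted anything reports evicted_time 0. Whether zero should count as OK is a choice for the check's configuration.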
I actually can't think of anything to monitor in 'stats slabs' at this point, but I thought I would bring it up since it makes the plugin barf when activated.
I don't have these variables with memcached 1.4.2, but you should now be able to check items out of memory with the latest code in the repository. The variable to check would be items_5_outofmemory for slab 5.
I'll add a feature request/TODO to allow specifying variable names as regular expressions, so that one check can match more than one stat variable. My guess is that it will be added sometime in the next few months, when I synchronize the code of check_memcached/check_redis (the latest variation of the code used in my plugins) with that of check_mysqld/check_snmp_temperature, which already has regex support.
I added regex support to the latest code in the development branch: https://github.com/willixix/WL-NagiosPlugins/blob/newlib/check_memcached.pl
You will need to use the new general option syntax, something like:
--check="PATTERN:items_\d+_outofmemory,PERF:YES,DISPLAY:NO,ZERO:OK,WARN:>0"
--check="PATTERN:items_\d+_evicted_time,PERF:YES,DISPLAY:NO,WARN:>86400"
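Roughly how a --check option of this shape could be evaluated against the flattened stats, sketched in Python. The comma/colon spec parsing, full-string regex matching, and support for only > and < in WARN are assumptions for illustration; the plugin's actual option handling may differ, and PERF, DISPLAY and ZERO are parsed but ignored here. The helper names are hypothetical.

```python
import re
from typing import Dict, List


def parse_check(spec: str) -> Dict[str, str]:
    """Split 'PATTERN:...,WARN:>0' into {'PATTERN': ..., 'WARN': '>0'}.

    Assumes no commas inside the regex; only the first ':' of each item
    separates the key from its value.
    """
    opts = {}
    for item in spec.split(","):
        key, _, value = item.partition(":")
        opts[key.upper()] = value
    return opts


def warn_matches(stats: Dict[str, int], spec: str) -> List[str]:
    """Return the stat names that match PATTERN and breach the WARN threshold."""
    opts = parse_check(spec)
    regex = re.compile(opts["PATTERN"])
    warn = opts.get("WARN", "")
    if not warn:
        return []
    op, limit = warn[0], int(warn[1:])
    if op not in "<>":
        raise ValueError(f"unsupported WARN operator: {op!r}")
    hits = []
    for name in sorted(stats):
        if not regex.fullmatch(name):  # assumption: the whole name must match
            continue
        value = stats[name]
        if (op == ">" and value > limit) or (op == "<" and value < limit):
            hits.append(name)
    return hits
```

For example, with stats {"items_5_outofmemory": 0, "items_7_outofmemory": 2}, the spec r"PATTERN:items_\d+_outofmemory,WARN:>0" returns ['items_7_outofmemory']. Note that the pattern must include the underscore between the slab number and the stat name to match names like items_7_outofmemory.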
If you are able to check within the next 7 days that the new version works in general (it's quite a substantial internal rewrite) and that the regex works as you wanted, I'd much appreciate it. I will probably merge it into the main branch and release it in 1-2 weeks.
I'd like to be able to monitor per-slab statistics using 'stats slabs' and 'stats items', both across all slabs and with stricter thresholds for particular slabs that I care about. For instance, alert if any slab has outofmemory != 0 in 'stats items', and alert if slab 5 has an evicted_time below 86,400.
Unfortunately, this doesn't work out of the box, since the variable names from memcached make the plugin die with a Perl error:
check_memcached.pl -s misc,slabs
Can't use string ("STAT 1:chunk_size 96 STAT 1:chu") as a HASH ref while "strict refs" in use at check_memcached.pl line 797.