I have checked the values of some internal variables.
$obj{'UsedCapacity'}: 13189844566016
$used: 13189844566016
$total: 13189844566016
I have also checked the values returned by the wbemcli command for IBMTSSVC_ConcreteStoragePool:
-TotalManagedSpace=13189844566016
-SpaceLimitDetermination=2
-SpaceLimit=13189844566016
-VirtualCapacity=13189844566016
-UsedCapacity=13189844566016
-RealCapacity=13189844566016
I guess the value of UsedCapacity is not what you expect.
I hope this helps.
I don't understand exactly what UsedCapacity represents, but does "$used == $total" mean that the storage pool has been fully assigned to disk volumes? Is that necessarily a problematic (Warning or Critical) situation?
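For reference, a minimal Perl sketch of how a usage percentage and a TiB figure can be derived from the wbemcli values above. It assumes the percentage is simply UsedCapacity divided by TotalManagedSpace, rounded to a whole number; the helpers used_pct and to_tib are hypothetical and not the script's actual code.

```perl
use strict;
use warnings;

# Values reported by wbemcli for IBMTSSVC_ConcreteStoragePool (bytes).
my $used  = 13189844566016;    # UsedCapacity
my $total = 13189844566016;    # TotalManagedSpace

# Hypothetical helper: used capacity as a whole-number percentage of total.
sub used_pct {
    my ($u, $t) = @_;
    return 0 if $t == 0;    # avoid division by zero on an empty pool
    return int($u / $t * 100 + 0.5);
}

# Hypothetical helper: bytes to tebibytes (2**40 bytes), rounded to an integer.
sub to_tib {
    my ($bytes) = @_;
    return int($bytes / 2**40 + 0.5);
}

printf "Pool-00=%d%% used=%dTiB total=%dTiB\n",
    used_pct($used, $total), to_tib($used), to_tib($total);
```

With both values at 13189844566016 bytes this prints "Pool-00=100% used=12TiB total=12TiB", which lines up with the perfdata in the plugin output reported in this thread.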
Hi me-oct,
IIRC I added the usage part of the check later and never made it set NOK for a specific pool. So it only returns CRITICAL for the check as a whole (which does work, as you get "STORAGE POOL CRITICAL").
You could simply add $inst_count_nok++; after line 880, which you already mentioned:
if ($usedpct >= $$cfg{'warning'} && $usedpct <= $$cfg{'critical'}) {
    $$out{'retRC'} = $$cfg{'RC'}{'WARNING'};
    $inst_count_nok++;
} elsif ($usedpct >= $$cfg{'critical'}) {
    $$out{'retRC'} = $$cfg{'RC'}{'CRITICAL'};
    $inst_count_nok++;
}
However, if the pool also has 'Degraded' or 'offline' status, that pool would be counted as NOK twice, and the NOK count could exceed Total.
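One way to avoid the double count is to decide per pool whether it is NOK (bad status or usage at or above the warning threshold) and increment the counter only once. A minimal Perl sketch; the pool records, the field names status and usedpct, and the 'OK' status value are hypothetical, not taken from the script:

```perl
use strict;
use warnings;

# Hypothetical pool records: operational status plus usage percentage.
my @pools = (
    { name => 'Pool-00', status => 'OK',       usedpct => 100 },
    { name => 'Pool-01', status => 'Degraded', usedpct => 95  },
    { name => 'Pool-02', status => 'OK',       usedpct => 40  },
);
my $warning = 80;    # usage percentage at which a pool counts as NOK

# Count each pool at most once, whatever combination of problems it has.
sub count_pools {
    my ($pools, $warn) = @_;
    my ($nok, $ok) = (0, 0);
    for my $pool (@$pools) {
        my $bad_status = $pool->{status} ne 'OK';
        my $bad_usage  = $pool->{usedpct} >= $warn;
        if ($bad_status || $bad_usage) {
            $nok++;    # one increment per pool, even if both checks fail
        } else {
            $ok++;
        }
    }
    return ($nok, $ok);
}

my ($nok, $ok) = count_pools(\@pools, $warning);
printf "NOK:%d/OK:%d/Total:%d\n", $nok, $ok, scalar @pools;
```

Here Pool-01 fails both checks but is counted once, so the line reads NOK:2/OK:1/Total:3 and NOK + OK always equals Total.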
If I have some time I'll add it properly.
Hello mkorthof,
Thank you for investigating the issue I raised. The key point is whether a high usage ratio of the storage pool is a problem or not. We have assigned all the storage capacity of our Storwize, so there is no free space in the storage pool and the usage ratio is 100%. If this is an unusual case, "usage (set c 100 to warn only)" could be a workaround, as described in the comment in the script.
On another issue: regarding the threshold values for Warning and Critical when the defaults are used, I think the code might be wrong:
$$cfg{'critical'} = $conf{'DEFAULTS'}{$$cfg{'check'}}{'w'};
$$cfg{'warning'} = $conf{'DEFAULTS'}{$$cfg{'check'}}{'c'};
I suppose 'w' is for Warning and 'c' is for Critical.
Cheers, thanks a lot.
me-oct
You're right, the defaults for warn/critical got mixed up. Thanks for noticing, I've corrected it.
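The corrected mapping can be sketched in Perl as below: each threshold falls back to its own default, and a sanity check flags a warning value above the critical one, which would have exposed the swap. The %defaults hash and resolve_thresholds are hypothetical stand-ins for the script's $conf{'DEFAULTS'} structure, and the defaults w=80, c=90 are an assumption inferred from the INFO lines in the plugin output reported in this thread.

```perl
use strict;
use warnings;

# Hypothetical defaults per check, mirroring $conf{'DEFAULTS'}{$check}{'w'/'c'}.
my %defaults = (ConcreteStoragePool => { w => 80, c => 90 });

# Resolve thresholds: command-line value if given, else the matching default.
sub resolve_thresholds {
    my ($check, $opt) = @_;
    my %cfg;
    $cfg{warning}  = $opt->{w} // $defaults{$check}{w};    # 'w' -> warning
    $cfg{critical} = $opt->{c} // $defaults{$check}{c};    # 'c' -> critical
    # Sanity check: a warning threshold above critical suggests a mix-up.
    warn "warning ($cfg{warning}) > critical ($cfg{critical})\n"
        if $cfg{warning} > $cfg{critical};
    return \%cfg;
}

my $cfg = resolve_thresholds('ConcreteStoragePool', {});
print "warning=$cfg->{warning} critical=$cfg->{critical}\n";
```

With no options this prints "warning=80 critical=90"; passing only -c 100 would keep the warning default and raise critical to 100.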
Regarding pool usage: I used the script with V7000/9000s with FCM and DRP, and there it was important to monitor. Of course it depends on the setup, as there are multiple metrics for virtual/logical/physical space and thin/thick/over-provisioning to consider. The check just looks at physical pool usage. As IBM also alerts at 80% by default, I chose the same value.
I decided not to add per-pool OK/NOK for usage, to keep it consistent with StorageVolume.
You could indeed use -c 100 to warn only, or -c 101 -w 101 to never alert on usage. To skip a specific pool (both usage and status): -s Pool-00
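The effect of these options on a full pool can be checked with a small Perl sketch that reuses the comparisons from the snippet around line 880 quoted earlier in this thread. usage_state is a hypothetical wrapper, and the sketch assumes usage never exceeds 100%; the -s option is not modelled.

```perl
use strict;
use warnings;

# Same comparisons as the snippet near line 880; returns the resulting state.
sub usage_state {
    my ($usedpct, $warning, $critical) = @_;
    if ($usedpct >= $warning && $usedpct <= $critical) {
        return 'WARNING';
    } elsif ($usedpct >= $critical) {
        return 'CRITICAL';
    }
    return 'OK';
}

# Full pool (100%) under different option combinations.
printf "-w 80 -c 90:            %s\n", usage_state(100, 80, 90);
printf "-c 100 (warn only):     %s\n", usage_state(100, 80, 100);
printf "-c 101 -w 101 (never):  %s\n", usage_state(100, 101, 101);
```

A value exactly equal to the critical threshold falls into the first branch and is reported as WARNING. That boundary is what makes -c 100 behave as "warn only" for a pool at 100%, while -c 101 -w 101 puts both thresholds out of reach.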
mkorthof,
Thank you for revising the script. I understand your comments on the storage pool. I am using threshold values over 100 at the moment, and I found that the workaround with the '-s' option also works fine. I appreciate your great support.
me-oct
Hi Marius, thank you for continuously updating the script. I have been testing the behavior of the latest version, v20230128-mk, and found a problem with the command 'ConcreteStoragePool'. Here is the output of the command:
INFO: missing argument "-c crit", using default value '80'
INFO: missing argument "-w warn", using default value '90'
STORAGE POOL CRITICAL - NOK:0/OK:1/Total:1 |Pool-00=100%;;;; used=12TiB;;;; total=12TiB;;;; mdisks=3;;;; vols=2;;;;
There seems to be an incorrect condition around line 880. Could you please investigate this problem?
Many thanks.