lausser / check_nwc_health

nwc = network component. This plugin checks lots of aspects of routers, switches, wlan controllers, firewalls,.....
http://labs.consol.de/nagios/check_nwc_health
GNU General Public License v2.0

Interface traffic calculation is off reality (?) #238

Closed Napsty closed 4 years ago

Napsty commented 4 years ago

A discussion between our network administrator and me led to the conclusion that interface speeds are graphed incorrectly in our Grafana, which uses performance data coming via Icinga2 from the check_nwc_health plugin. At first I suspected an error in the derive graph, but then realized that the performance data are actual bit/s values, not counters.

check_nwc_health seems to be using the inputRate and outputRate values to represent traffic (bit/s):

$ ./check_nwc_health --hostname=switch --statefilesdir=/tmp --protocol=2c --community=public --mode=interface-usage --name TenGigabitEthernet0/0/1 -vvv
[INTERFACESUBSYSTEM]
bootTime: 1535241918
duplicates: HASH(0x3c81768)
ifCacheLastChange: 1584614704
ifTableLastChange: 1566825882.12
interface_cache: HASH(0x3c853d8)
info: checking interfaces
[64BIT_2]
delta_ifHCInOctets: 563214875
delta_ifHCOutOctets: 522374095
delta_ifInBits: 4505719000
delta_ifOutBits: 4178992760
delta_timestamp: 40
ifAlias: Link to LAN
ifDescr: TenGigabitEthernet0/0/1
ifHCInOctets: 973746056100280
ifHCInOctets_per_sec: 14080371.875
ifHCOutOctets: 619331245398793
ifHCOutOctets_per_sec: 13059352.375
ifHighSpeed: 10000
ifInOctets: 3955653048
ifIndex: 2
ifName: Te0/0/1
ifOperStatus: up
ifOutOctets: 1256282889
ifSpeed: 4294967295
inputRate: 112642975
inputUtilization: 1.12642975
maxInputRate: 10000000000
maxOutputRate: 10000000000
outputRate: 104474819
outputUtilization: 1.04474819
info: interface TenGigabitEthernet0/0/1 (alias Link to LAN) usage is in:1.13% (112642975.00bit/s) out:1.04% (104474819.00bit/s)

OK - interface TenGigabitEthernet0/0/1 (alias Link to LAN) usage is in:1.13% (112642975.00bit/s) out:1.04% (104474819.00bit/s)
checking interfaces
interface TenGigabitEthernet0/0/1 (alias Link to LAN) usage is in:1.13% (112642975.00bit/s) out:1.04% (104474819.00bit/s) | 'TenGigabitEthernet0/0/1_usage_in'=1.13%;80;90;0;100 'TenGigabitEthernet0/0/1_usage_out'=1.04%;80;90;0;100 'TenGigabitEthernet0/0/1_traffic_in'=112642975;8000000000;9000000000;0;10000000000 'TenGigabitEthernet0/0/1_traffic_out'=104474819;8000000000;9000000000;0;10000000000
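
The utilization and threshold figures in this output can be reproduced by hand. The Python sketch below is not the plugin's code. It assumes that the maximum rate comes from ifHighSpeed (which IF-MIB reports in units of 1,000,000 bit/s) because ifSpeed is pegged at its 32-bit maximum of 4294967295, and that utilization is simply rate divided by maximum rate. The function names are made up for illustration.

```python
# Values copied from the -vvv output above.
IF_SPEED = 4294967295    # ifSpeed, a 32-bit gauge stuck at its maximum
IF_HIGH_SPEED = 10000    # ifHighSpeed, in Mbit/s
INPUT_RATE = 112642975   # inputRate, in bit/s


def max_rate_bits(if_speed, if_high_speed):
    # Assumption: prefer ifHighSpeed (Mbit/s) whenever it is set, since
    # ifSpeed cannot express more than 4294967295 bit/s.
    if if_high_speed:
        return if_high_speed * 1_000_000
    return if_speed


def utilization_percent(rate_bits, max_bits):
    # Assumption: utilization is the rate as a percentage of the link speed.
    return 100 * rate_bits / max_bits


max_rate = max_rate_bits(IF_SPEED, IF_HIGH_SPEED)
print(max_rate)                                     # 10000000000
print(utilization_percent(INPUT_RATE, max_rate))    # 1.12642975
print(max_rate * 80 // 100, max_rate * 90 // 100)   # 8000000000 9000000000
```

The results match maxInputRate, inputUtilization and the 80%/90% thresholds in the performance data, so the assumptions are at least consistent with this one sample.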

According to the source code, inputRate and outputRate are calculated like this: https://github.com/lausser/check_nwc_health/blob/master/plugins-scripts/Classes/IFMIB/Component/InterfaceSubsystem.pm#L750

    $self->{inputRate} = $self->{delta_ifInBits} / $self->{delta_timestamp};
    $self->{outputRate} = $self->{delta_ifOutBits} / $self->{delta_timestamp};
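
As a sanity check, the same arithmetic can be replayed outside the plugin with the delta values from the verbose output. This is a stand-alone Python sketch, not the plugin's Perl code, and the function name is hypothetical.

```python
def rate_bits_per_sec(delta_bits, delta_seconds):
    """Average bit rate over the sampling interval."""
    if delta_seconds <= 0:
        raise ValueError("delta_seconds must be positive")
    return delta_bits / delta_seconds


# delta_ifInBits, delta_ifOutBits and delta_timestamp from the -vvv output
print(rate_bits_per_sec(4505719000, 40))  # 112642975.0 (inputRate)
print(rate_bits_per_sec(4178992760, 40))  # 104474819.0 (outputRate)
```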

I ran check_nwc_health once a minute for 20 minutes and looked at the input/incoming traffic values. I used the following command to get the raw data:

while true; do date; ./check_nwc_health --hostname=switch --statefilesdir=/tmp --protocol=2c --community=public --mode=interface-usage --name TenGigabitEthernet0/0/1 -vvv | egrep "(ifHCInOctets|ifInOctets|inputRate)"; sleep 60; echo; done

After 20 minutes I stopped the loop and created a spreadsheet with the data. Columns G-K contain Mbit/s values calculated from the different raw values. The Mbit/s figure derived from inputRate is way off from the others, which suggests that inputRate itself is calculated incorrectly. I also tested a more recent version of check_nwc_health, with the same results.

image

Am I making a mistake or misunderstanding something, or can you confirm that the interface traffic is not correct?

Napsty commented 4 years ago

Sorry, my mistake. I misinterpreted an octet as a bit, but an octet is actually 8 bits. Hence the calculation in the source code:

    $self->{delta_ifInBits} = $self->{delta_ifInOctets} * 8;
    $self->{delta_ifOutBits} = $self->{delta_ifOutOctets} * 8;

    $self->{inputRate} = $self->{delta_ifInBits} / $self->{delta_timestamp};
    $self->{outputRate} = $self->{delta_ifOutBits} / $self->{delta_timestamp};

If I adjust my manual calculations in the spreadsheet and multiply all the octet-based values by 8, I arrive at the same traffic rate.
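
The factor of eight is easy to see with the per-second octet value from the verbose output. A small Python sketch (the helper name is made up for illustration) compares the mistaken reading with the corrected one:

```python
def octets_to_mbit(octets_per_sec):
    """Convert octets/s to Mbit/s (1 octet = 8 bits, 1 Mbit = 10**6 bits)."""
    return octets_per_sec * 8 / 1_000_000


in_octets_per_sec = 14080371.875  # ifHCInOctets_per_sec from the -vvv output

wrong = in_octets_per_sec / 1_000_000      # octets mistaken for bits
right = octets_to_mbit(in_octets_per_sec)  # agrees with inputRate (112642975 bit/s)

print(round(wrong, 2))  # 14.08
print(round(right, 2))  # 112.64
```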