That does seem strange. I suppose it's possible, yes, or the timing for the calculation could be getting messed up as well. Do you think it's feasible the LAN interface is really receiving that much traffic?
How it works is that it records the counter values for 2 samples and also records the time each sample was taken. Math is then performed to derive the rate. If something is substantially delaying those timestamps it could throw things off as well, I suppose.
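In rough form, a minimal sketch of that calculation (illustrative only; the function name and the 30s interval are assumptions, not the integration's actual code):

```python
import time

def sample_rate(read_counter):
    """Two counter samples plus their timestamps -> bytes per second."""
    t1, v1 = time.monotonic(), read_counter()  # first sample and its timestamp
    time.sleep(30)                             # one scan interval (assumed 30s)
    t2, v2 = time.monotonic(), read_counter()  # second sample and its timestamp
    # If a timestamp is recorded well before or after its sample is actually
    # read, the denominator stops matching the real elapsed time and the
    # derived rate is skewed.
    return (v2 - v1) / (t2 - t1)
```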
The LAN could see traffic like that at times; I run large rsyncs over the LAN fairly often.
Experimentally, I tried to see whether the LAN sensor was "stealing" the values I should have seen on the WAN sensor, and I don't see that happening, so perhaps the timing angle is more likely. It's possibly coincidental that the WAN sensor is topping out right at the LAN hardware's maximum.
Do the LAN stats go crazy at the same time periods as the WAN stats? If not, the timing seems unlikely, as the data for all interfaces is gathered at the same time…
I had to start tracking all the LAN sensors for a few days to be sure, but spikes in "LAN out" seem to line up very closely with "WAN in". My ISP's upload caps out at 40Mbps, so anything higher than that shouldn't be possible. Although the aforementioned rsyncs inside the LAN could hit peak traffic of 2.5Gbps, I would not expect this to happen only when the WAN sees its peak traffic.
All of the spikes are about 300,000 kBps, around 2.4Gbps.
Still something needed here?
I continue to see impossibly high download speeds from WAN.
I uninstalled and reinstalled hass-pfsense, and the WAN values are now usually within the capabilities of the WAN hardware, but I have seen one spike that suggests there are still occasional problems. The transition is quite apparent in the history data.
Perhaps the way the timing is being handled is causing it to have strange values. Currently, each "interval" a batch of API calls is executed together, and the timestamp is set at the beginning of the batch. The general assumption is that each batch takes roughly the same amount of time, and that on average each API call within the interval/batch takes roughly the same relative amount of time compared to the other calls in the batch.
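As a purely illustrative example of how that assumption breaking down could inflate a reading (hypothetical numbers, not measured values):

```python
# The two counter reads really spanned 60s (say the second batch's interface
# call was badly delayed), but the batch timestamps are only 30s apart.
delta_bytes = 150_000_000             # 150 MB transferred between the reads
true_rate = delta_bytes / 60.0        # 2.5 MB/s actually on the wire
computed_rate = delta_bytes / 30.0    # 5.0 MB/s reported -- exactly 2x
```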
If the interval is short, the results will generally be more erratic as well (more samples in less time).
What do you have the interval set to? Do you think a particular call is behaving erratically? With debug logging you can observe how long each batch takes.
Also, when you reinstalled, did you change the interval?
I have a scan interval of 30s (apparently the default) and a device tracker scan interval of 30s (half the default, but that seems unrelated to network speeds). Before I reinstalled, I was using a 10s scan interval and a 150s device tracker scan interval.
I don't know how you're interfacing with pfSense, but there is a shell command that outputs a text table that includes recent traffic of a given interface, e.g.

```
iftop -n -b -t -s 3 -L 0 -i mvneta2
```

This will scan for 3s on mvneta2 and output something like this:
```
interface: mvneta2
IP address is: 73.230.25.170
MAC address is: c0:56:27:4b:fb:92
Listening on mvneta2
   # Host name (port/service if enabled)            last 2s   last 10s   last 40s cumulative
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
Total send rate:                                     41.2Kb     24.3Kb     24.3Kb
Total receive rate:                                  38.3Kb     21.5Kb     21.5Kb
Total send and receive rate:                         79.5Kb     45.8Kb     45.8Kb
--------------------------------------------------------------------------------------------
Peak rate (sent/received/total):                     41.2Kb     38.3Kb     79.5Kb
Cumulative (sent/received/total):                    12.1KB     10.8KB     22.9KB
============================================================================================
```
It already does the traffic / interval calculation for you. You could potentially scrape that and expose the peak and cumulative values as sensors.
Just to clarify, `-s 3` specifies a 3s interval, so you could pass whatever value you wanted there and divide the cumulative totals by it to get a rate, along with the peak during that interval.
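For instance, a rough sketch of scraping that output (the function is made up, parsing is based on the sample above, and iftop typically needs root; other iftop versions may format things differently):

```python
import re
import subprocess

def read_iftop(interface: str, seconds: int = 3) -> dict:
    """Run iftop in text mode and pull out the peak/cumulative totals."""
    out = subprocess.run(
        ["iftop", "-n", "-b", "-t", "-s", str(seconds), "-L", "0", "-i", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = {}
    for line in out.splitlines():
        # e.g. "Peak rate (sent/received/total):  41.2Kb  38.3Kb  79.5Kb"
        if line.startswith("Peak rate"):
            keys = ("peak_sent", "peak_recv", "peak_total")
        elif line.startswith("Cumulative"):
            keys = ("cum_sent", "cum_recv", "cum_total")
        else:
            continue
        values = re.findall(r"[\d.]+\w+", line.split(":", 1)[1])
        stats.update(zip(keys, values))
    return stats
```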
> The general assumption is that each batch takes roughly the same amount of time, and that on average each API call within the interval/batch takes roughly the same relative amount of time compared to the other calls in the batch.
I think that's the issue: the real timing does not match the assumed timing. The fact that the anomalous readings are 2x the maximum normal reading backs this up. For now, I have applied a filter that divides any raw value by 2 if it exceeds the port's capability.
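As a hedged sketch of that workaround (the constant and function names are illustrative; a 1000baseT port tops out at 125,000 kB/s):

```python
WAN_MAX_KBPS = 125_000  # 1 Gbps (1000baseT) line rate in kilobytes per second

def filter_reading(raw_kbps: float) -> float:
    """Halve any raw value that exceeds what the port can physically carry."""
    return raw_kbps / 2 if raw_kbps > WAN_MAX_KBPS else raw_kbps
```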
I have a Netgate 3100. Its WAN interface lists as `1000baseT <full-duplex>`, but I regularly see values for `sensor.my_router_name_interface_wan_inbytes_kilobytes_per_second` around 300,000 kBps, which is about 2.4Gbps, exceeding what my WAN interface can handle. It is, however, within the limits of the LAN interface, listed as `2500Base-KX <full-duplex>`. Is it possible that the sensor is reading the LAN interface values instead of the WAN interface values?
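For reference, the unit conversion behind that figure (assuming decimal/SI units):

```python
rate_kBps = 300_000                      # kilobytes per second, as reported
rate_gbps = rate_kBps * 1_000 * 8 / 1e9  # = 2.4 Gbps: over the 1 Gbps WAN
                                         # port, under the 2.5 Gbps LAN port
```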