That does seem strange. I suppose it's possible, yes, or the timing for the calculation could be getting messed up as well. Do you think it's feasible the LAN interface is really receiving that much traffic?
How it works is that it records the counter values for 2 samples and also records the time each sample was taken. Math is then performed to derive the rate. If something is substantially delaying those timestamps it could throw things off as well, I suppose.
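In rough form, a minimal sketch of that calculation (illustrative only; the function name and the 30s interval are assumptions, not the integration's actual code):

```python
import time

def sample_rate(read_counter):
    """Two counter samples plus their timestamps -> bytes per second."""
    t1, v1 = time.monotonic(), read_counter()  # first sample and its timestamp
    time.sleep(30)                             # one scan interval (assumed 30s)
    t2, v2 = time.monotonic(), read_counter()  # second sample and its timestamp
    # If a timestamp is recorded well before or after its sample is actually
    # read, the denominator stops matching the real elapsed time and the
    # derived rate is skewed.
    return (v2 - v1) / (t2 - t1)
```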
The LAN could see traffic like that at times; I run large rsyncs over the LAN fairly often.
Experimentally, I tried to see whether the LAN sensor was "stealing" the values I should have seen on the WAN sensor, and I don't see that happening, so perhaps the timing angle is more likely. It's possibly coincidental that the WAN sensor is topping out right at the LAN hardware's maximum.
Do the LAN stats go crazy at the same time periods as the WAN stats? If not, the timing seems unlikely, as the data for all interfaces is gathered at the same time…
I had to start tracking all the LAN sensors for a few days to be sure, but spikes in "LAN out" seem to line up very closely with "WAN in". My ISP's upload caps out at 40Mbps, so anything higher than that shouldn't be possible. Although the aforementioned rsyncs inside the LAN could hit peak traffic of 2.5Gbps, I would not expect this to happen only when the WAN sees its peak traffic.
All of the spikes are about 300,000 kBps, around 2.4Gbps.
Still something needed here?
I continue to see impossibly high download speeds from WAN.
I uninstalled and reinstalled hass-pfsense, and the WAN values are now usually within the capabilities of the WAN hardware, but I have seen one spike that suggests there are still occasional problems. The transition is quite apparent in the history data.
Perhaps the way the timing is being handled is causing it to have strange values. Currently, each "interval" a batch of API calls is executed together, and the timestamp is set at the beginning of the batch. The general assumption is that each batch takes roughly the same amount of time, and that on average each API call within the interval/batch takes roughly the same relative amount of time compared to the other calls in the batch.
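As a purely illustrative example of how that assumption breaking down could inflate a reading (hypothetical numbers, not measured values):

```python
# The two counter reads really spanned 60s (say the second batch's interface
# call was badly delayed), but the batch timestamps are only 30s apart.
delta_bytes = 150_000_000             # 150 MB transferred between the reads
true_rate = delta_bytes / 60.0        # 2.5 MB/s actually on the wire
computed_rate = delta_bytes / 30.0    # 5.0 MB/s reported -- exactly 2x
```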
If the interval is short, the results will generally be more erratic as well (more samples in less time).
What do you have the interval set to? Do you think a particular call is behaving erratically? With debug logging you can observe how long each batch takes.
Also, when you reinstalled, did you change the interval?
I have a scan interval of 30s (apparently the default) and a device tracker scan interval of 30s (half the default, but that seems unrelated to network speeds). Before I reinstalled, I was using a 10s scan interval and a 150s device tracker scan interval.
I don't know how you're interfacing with pfSense, but there is a shell command that outputs a text table that includes recent traffic of a given interface, e.g.

```
iftop -n -b -t -s 3 -L 0 -i mvneta2
```

This will scan for 3s on mvneta2 and output something like this:
```
interface: mvneta2
IP address is: 73.230.25.170
MAC address is: c0:56:27:4b:fb:92
Listening on mvneta2
   # Host name (port/service if enabled)            last 2s   last 10s   last 40s cumulative
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
Total send rate:                                     41.2Kb     24.3Kb     24.3Kb
Total receive rate:                                  38.3Kb     21.5Kb     21.5Kb
Total send and receive rate:                         79.5Kb     45.8Kb     45.8Kb
--------------------------------------------------------------------------------------------
Peak rate (sent/received/total):                     41.2Kb     38.3Kb     79.5Kb
Cumulative (sent/received/total):                    12.1KB     10.8KB     22.9KB
============================================================================================
```
It already does the traffic / interval calculation for you. You could potentially scrape that and expose the peak and cumulative values as sensors.
Just to clarify, `-s 3` specifies a 3s interval, so you could pass whatever value you wanted there and divide the cumulative totals by it to get a rate, along with the peak during that interval.
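For instance, a rough sketch of scraping that output (the function is made up, parsing is based on the sample above, and iftop typically needs root; other iftop versions may format things differently):

```python
import re
import subprocess

def read_iftop(interface: str, seconds: int = 3) -> dict:
    """Run iftop in text mode and pull out the peak/cumulative totals."""
    out = subprocess.run(
        ["iftop", "-n", "-b", "-t", "-s", str(seconds), "-L", "0", "-i", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = {}
    for line in out.splitlines():
        # e.g. "Peak rate (sent/received/total):  41.2Kb  38.3Kb  79.5Kb"
        if line.startswith("Peak rate"):
            keys = ("peak_sent", "peak_recv", "peak_total")
        elif line.startswith("Cumulative"):
            keys = ("cum_sent", "cum_recv", "cum_total")
        else:
            continue
        values = re.findall(r"[\d.]+\w+", line.split(":", 1)[1])
        stats.update(zip(keys, values))
    return stats
```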
> The general assumption is that each batch takes roughly the same amount of time, and that on average each API call within the interval/batch takes roughly the same relative amount of time compared to the other calls in the batch.
I think that's the issue: the real timing does not match the assumed timing. The fact that the anomalous readings are 2x the maximum normal reading backs this up. For now, I have applied a filter that divides any raw value by 2 if it exceeds the port's capability.
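As a hedged sketch of that workaround (the constant and function names are illustrative; a 1000baseT port tops out at 125,000 kB/s):

```python
WAN_MAX_KBPS = 125_000  # 1 Gbps (1000baseT) line rate in kilobytes per second

def filter_reading(raw_kbps: float) -> float:
    """Halve any raw value that exceeds what the port can physically carry."""
    return raw_kbps / 2 if raw_kbps > WAN_MAX_KBPS else raw_kbps
```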
I have a Netgate 3100. Its WAN interface lists as `1000baseT <full-duplex>`, but I regularly see values for `sensor.my_router_name_interface_wan_inbytes_kilobytes_per_second` around 300,000 kBps, which is about 2.4Gbps, exceeding what my WAN interface can handle. It is, however, within the limits of the LAN interface, listed as `2500Base-KX <full-duplex>`. Is it possible that the sensor is reading the LAN interface values instead of the WAN interface values?
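For reference, the unit conversion behind that figure (assuming decimal/SI units):

```python
rate_kBps = 300_000                      # kilobytes per second, as reported
rate_gbps = rate_kBps * 1_000 * 8 / 1e9  # = 2.4 Gbps: over the 1 Gbps WAN
                                         # port, under the 2.5 Gbps LAN port
```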