upnix opened this issue 2 years ago.
Chris Cameron @.***> writes:
> The problem: In Mininet, when limiting link speed to 10Mbps (via TBF or NetEm) and adding any amount of delay with NetEm, Flent using Netperf+`TCP_STREAM` will return large gaps in upload data, in both the CSV output and the resulting charts. While Netperf acts strangely in this scenario (which I'll describe below), I believe it is Flent and the use of `apply_to` in the `DATA_SETS` data structure that causes this problem.
So you're kinda right that the problem is caused by an interaction between netperf's behaviour and the Flent series computation (for certain series). Specifically, this is what happens:
At really low bandwidths, netperf will miss its data point output deadline, which causes data points to be spread out (by a lot, as you've noticed). This is most pronounced for upstream (`TCP_STREAM`) netperf flows.
Flent can plot these "sparse" data points just fine; however, the synthetic (computed after the fact) data series, i.e., the "average" and "total" bandwidth series, suffer.
The reason for the latter is the way Flent computes the synthetic data: it will try to generate a synthetic data point at every 'step size' interval, by linearly interpolating the points on both sides. E.g., if netperf outputs data points at t=0.198 and t=0.398, it'll interpolate between those to generate a synthetic data point at t=0.2. This will happen for each series, and the sum or average computation is done on those synthetic data points that are all aligned to the step size intervals.
The problem you're seeing happens because there's a maximum interpolation distance (of five times the step size), and if the data points are further apart than this, no interpolation will be done and you'll get gaps in the synthetic series.
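To make the mechanism concrete, here is a minimal sketch of step-aligned resampling with a maximum interpolation distance. It is not Flent's actual code: the function names `interpolate_at` and `resample` are hypothetical, and it assumes the grid starts at t=0 and that the five-times-step limit applies to the distance between the two data points surrounding each grid point.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def interpolate_at(times: Sequence[float], values: Sequence[float],
                   t: float, max_dist: float) -> Optional[float]:
    """Linearly interpolate a value at time t from the points on either side.

    Returns None (a gap) if t lies outside the data, or if the two
    surrounding data points are more than max_dist apart.
    """
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return values[i]  # t falls exactly on a data point
    if i == 0 or i == len(times):
        return None  # before the first or after the last data point
    t0, t1 = times[i - 1], times[i]
    if t1 - t0 > max_dist:
        return None  # neighbours too far apart: leave a gap
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resample(times: Sequence[float], values: Sequence[float], step: float,
             duration: float, max_factor: float = 5.0) -> List[Optional[float]]:
    """Synthetic series aligned to multiples of `step`, with gaps as None."""
    max_dist = max_factor * step
    n = int(round(duration / step))
    return [interpolate_at(times, values, k * step, max_dist)
            for k in range(n + 1)]


# Reports roughly every 0.2 s: every grid point gets a value.
print(resample([0.0, 0.198, 0.398, 0.598, 0.798],
               [9.0, 9.5, 9.4, 9.6, 9.7], step=0.2, duration=0.6))

# Reports every 4 s with a 0.2 s step: most grid points become gaps.
sparse = resample([0.0, 4.0, 8.0], [9.0, 9.5, 9.4], step=0.2, duration=8.0)
print(sum(v is None for v in sparse), "gaps out of", len(sparse))
```

With a 0.2 s step the limit is 1 s, so reports 4 s apart leave every grid point between them empty, even though the raw points themselves are all still there.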
Now, as for what can be done about it, I'm afraid that (in my opinion) the answer turns out to be "not much". The fundamental problem here is that we're trying to compute a value that isn't really well-defined, because we're combining several time series whose data points don't line up in time.
For example, if there are two instances of netperf running, where series A outputs data points at t=1, 4, and 7 seconds and series B outputs data points at t=3, 6, and 9 seconds, how are you really going to tell what the average throughput at t=2 seconds was?
(That's a serious question, BTW: if you have an idea for a better algorithm for interpolating data points, or for computing the synthetic series in a different way, I'm all ears.)
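A small sketch of why the t=2 example has no single right answer: two plausible reconstructions give different results. The throughput values are made up, and the "interval mean" reading (each reported value is the average rate since the previous report, with each flow starting at t=0) is an assumption for illustration, not a description of how Flent or netperf define their output.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Series = Tuple[List[float], List[float]]  # (timestamps in s, throughput)

# Hypothetical throughput values (Mbit/s) at the timestamps from the example.
A: Series = ([1.0, 4.0, 7.0], [4.0, 6.0, 5.0])
B: Series = ([3.0, 6.0, 9.0], [5.0, 3.0, 4.0])


def linear(series: Series, t: float) -> Optional[float]:
    """Linear interpolation between the points on either side of t."""
    times, vals = series
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return vals[i]
    if i == 0 or i == len(times):
        return None  # nothing on one side: would need extrapolation
    t0, t1 = times[i - 1], times[i]
    return vals[i - 1] + (vals[i] - vals[i - 1]) * (t - t0) / (t1 - t0)


def interval_mean(series: Series, t: float) -> Optional[float]:
    """Treat each sample as the mean rate over the interval ending at its
    timestamp (assumes each flow's first interval starts at t=0)."""
    times, vals = series
    i = bisect_left(times, t)
    return vals[i] if i < len(times) else None


for name, method in [("linear", linear), ("interval mean", interval_mean)]:
    a, b = method(A, 2.0), method(B, 2.0)
    avg = None if a is None or b is None else (a + b) / 2
    print(f"{name}: A={a}, B={b}, average={avg}")
```

Linear interpolation cannot produce a value for B at t=2 at all (B has no earlier point), while the interval-mean reading produces one only by assuming when B's first interval began.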
As a workaround, you could try increasing the step size. This should make the timing errors in netperf's data output smaller relative to the step size (since they tend to stay roughly constant in absolute terms), so the data points are more likely to stay within the maximum interpolation distance, which may help get rid of the gaps...
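A back-of-the-envelope check of how large the step size would need to be, assuming the five-times-step interpolation limit described in the reply. The report intervals below are hypothetical values around the ~4 s that the issue reports for `TCP_STREAM`; `min_step_size` and `bridged` are illustrative helpers, not Flent functions.

```python
from typing import Sequence

MAX_FACTOR = 5.0  # max interpolation distance = 5 x step size, per the reply


def min_step_size(report_intervals: Sequence[float],
                  max_factor: float = MAX_FACTOR) -> float:
    """Smallest step size whose interpolation limit spans every observed
    gap between consecutive netperf reports."""
    return max(report_intervals) / max_factor


def bridged(report_interval: float, step: float,
            max_factor: float = MAX_FACTOR) -> bool:
    """True if two reports this far apart can still be interpolated."""
    return report_interval <= max_factor * step


# Hypothetical report intervals around the ~4 s seen for TCP_STREAM.
observed = [3.9, 4.1, 4.3]
print(min_step_size(observed))  # step needed to bridge all of them
for step in (0.2, 0.5, 1.0):
    print(step, all(bridged(gap, step) for gap in observed))
```

The trade-off is resolution: a step large enough to bridge 4 s gaps also means fewer points in every synthetic series.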
The problem: In Mininet, when limiting link speed to 10Mbps (via TBF or NetEm) and adding any amount of delay with NetEm, Flent using Netperf+`TCP_STREAM` will return large gaps in upload data, in both the CSV output and the resulting charts. While Netperf acts strangely in this scenario (which I'll describe below), I believe it is Flent and the use of `apply_to` in the `DATA_SETS` data structure that causes this problem.

The setup:
`pip3 install flent`

With a network configuration of 1 router, 2 subnets, and 2 hosts (`h1`, `h2`), I use TBF to rate limit all links to 10Mbit/s, and NetEm to add ~28ms of delay between hosts (7ms on each link, but any amount of delay will do). I run Netserver on host `h2` and the Flent test on `h1`, with traffic crossing the router. I'll attach my configuration files.

Commands:
The result: There are large gaps in the results reported by Flent.
![image](https://user-images.githubusercontent.com/6163553/164755575-ad769a42-83d2-40cd-820b-6247534b0548.png)
Narrowing the problem down

Above, I showed the problem with the Flent-included `tcp_2up` test, but because I believe the issue lies with the use of `apply_to`, I had to do some retooling of the test to exclude it. So I have two new test configurations:

- `tcp_nup_2.conf`: This is the Flent-included `tcp_nup.conf`, modified by commenting out the function `add_stream`, the call to `for_stream_config()`, and the `DATA_SETS` entry "TCP upload avg". I then hard-code what is essentially a single "TCP upload::1" test.
- `tcp_1up_from_nup_2.conf`: This is `tcp_2up.conf`, but it includes `tcp_nup_2.conf` instead of `tcp_nup.conf`.
Now, when running the Flent test `tcp_1up_from_nup_2.conf`, the upload data is shown as continuous, as you'd expect.

Why? I don't know. What I do know is that the Flent test `tcp_2down` has no problems, and when I run the related Netperf command directly, `TCP_MAERTS` returns results with the expected regularity (`NETPERF_INTERVAL[xx]=0.2`, more or less). However, the Netperf `TCP_STREAM` test, which `tcp_2up` uses, has gaps of about 4 seconds between results (`NETPERF_INTERVAL[xx]=4`, more or less). The results returned still seem accurate to me; there are just longer pauses between reports.

But this can't be the entire story, because Flent tests that don't use `apply_to` when building `DATA_SETS` use the exact same Netperf command, gaps and all, yet don't have this problem. So it would seem to me that somehow Flent isn't properly handling gaps in reporting when `apply_to` is used for `DATA_SETS`.

What else fixes the problem?

Note that these are probably things that just make Netperf return results every 0.2 seconds (I haven't checked, though), so they're probably not directly related to Flent.
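How often netperf reports is what decides whether a combined series has gaps. A raw per-flow series is drawn from its own data points, so a 4 s reporting interval only makes it coarser. A synthetic total or average (in the issue's analysis, the entries built with `apply_to` in `DATA_SETS`) needs every flow to be interpolable at each step-aligned point. The sketch below uses hypothetical helpers, not Flent code, and assumes perfectly regular reporting and the five-times-step limit from the reply. It measures what fraction of grid points a combined series would get.

```python
from bisect import bisect_left
from typing import List, Sequence


def can_interpolate(times: Sequence[float], t: float, max_dist: float) -> bool:
    """True if t lies on a data point, or between two points no more than
    max_dist apart."""
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return True
    if i == 0 or i == len(times):
        return False
    return times[i] - times[i - 1] <= max_dist


def synthetic_coverage(flows: Sequence[Sequence[float]], step: float,
                       duration: float, max_factor: float = 5.0) -> float:
    """Fraction of step-aligned grid points at which *every* flow can be
    interpolated, i.e. where a summed or averaged series has a value."""
    max_dist = max_factor * step
    grid = [k * step for k in range(int(round(duration / step)) + 1)]
    covered = sum(all(can_interpolate(f, t, max_dist) for f in flows)
                  for t in grid)
    return covered / len(grid)


def report_times(start: float, interval: float, count: int) -> List[float]:
    """Hypothetical, perfectly regular netperf reporting times."""
    return [start + k * interval for k in range(count)]


# Two upload flows over 20 s; raw points are plotted as-is either way.
fast = [report_times(-0.05, 0.2, 110), report_times(-0.1, 0.2, 110)]
slow = [report_times(-0.5, 4.0, 7), report_times(-1.0, 4.0, 7)]

print(synthetic_coverage(fast, step=0.2, duration=20.0))
print(synthetic_coverage(slow, step=0.2, duration=20.0))
print(synthetic_coverage(slow, step=1.0, duration=20.0))
```

With 0.2 s reports the combined series is complete. With 4 s reports and a 0.2 s step it is empty, and raising the step to 1 s fills it in again.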
Files of interest:

- Flent results when running the included `tcp_2up` test: tcp_2up-2022-04-22T095700.876743.TCP_2_Up.flent.gz
- My Flent test that avoids gaps in upload data: tcp_1up_from_nup_2.txt, tcp_nup_2.txt
- The Mininet network used: 1Router_2Networks_3Hosts.txt