testing the behavior of small queue building non-ecn'd flows

dtaht commented 6 years ago

@chromi @heistp @jg @richb-hanover

Our tests with typical sampling rates in the 200ms range are misleading. We (until the development of irtt) are basically pitting request/response traffic against heavy tcp traffic and I think it's been leading us to draw some conclusions that are untrue for many other kinds of traffic, particularly with ecn enabled and the collateral damage it might cause.

The kerfuffle over: https://github.com/systemd/systemd/issues/9748 and https://github.com/systemd/systemd/issues/9725 is a symptom of my uneasiness.

I'm probably the only one who regularly runs flent with a 20ms sampling interval. Queues do finally build in this case for voip-like traffic, and we end up in the "slow" queue; even the "fast" queue gets more than one packet to deliver.

Having to prioritize arp slightly, as cake does in diffserv mode, is one symptom; having to ecn mark babel packets on a congested router (as I've done for years now) is another. Other routing protocols that don't use IP will also always end up in a fixed queue.

In an ecn'd world, I've long thought a "special" "1025th" queue for things like arp was possibly needed. Right now that traffic maps to the "0th" queue and can collide. There are other protocols not handled by the flow dissector.

tracking packet loss better for the measurement flows would comfort me A LOT (having a graph mixin that could pull that data out?)
a rrul_v2 test that did the 20ms irtt thing always would be good
a test that tested ecn'd flows vs non-ecn'd flows would be good.
a fixed rate, non-ecned, but queue building flow mixin (sort of like what babel does to me now). Toke picks on me for using babel on workloads like this, I view it as a subtle reminder real networks are not like a lab.
syn repeats?
RTO tracking?
A heavy flows going and squarewave tests

In the latest string of extreme tests I was also regularly able to get some of the hundred flows started simultaneously - at 100mbit - to wind up in ecn fallback mode.

Using "flows 32" for fq_codel & ecn was often "not good" from the perspective of my (non-ecned) monitoring flow, things like "top" would have their output pause half screened.

dtaht commented 6 years ago

maybe we can flow-dissect arp?

heistp commented 6 years ago

Is there an argument for lowering the default interval for when flent calls irtt?

The default irtt packet length with no payload is 60 bytes, so here are bitrates at various intervals for IPv4+Ethernet (106 byte frames, and tripled for RRUL's three UDP flows):

200ms => 4.2 Kbit * 3 = 12.7 Kbit
100ms => 8.5 Kbit * 3 = 25.4 Kbit
50ms => 17.0 Kbit * 3 = 51 Kbit
20ms => 42.4 Kbit * 3 = 127.2 Kbit
10ms => 84.8 Kbit * 3 = 254.4 Kbit

50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.
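The figures above can be reproduced with a little arithmetic. This is a minimal sketch, assuming the 106-byte frame size quoted above and three irtt flows; the function name is illustrative and not part of flent or irtt.

```python
FRAME_BYTES = 106   # IPv4 + Ethernet frame size quoted above
FLOWS = 3           # RRUL runs three UDP measurement flows


def irtt_bitrate_kbit(interval_ms, frame_bytes=FRAME_BYTES, flows=FLOWS):
    """Return (per-flow, total) one-way bitrate in Kbit/s for a send interval."""
    packets_per_second = 1000.0 / interval_ms
    per_flow = frame_bytes * 8 * packets_per_second / 1000.0
    return per_flow, per_flow * flows


if __name__ == "__main__":
    for interval_ms in (200, 100, 50, 20, 10):
        per_flow, total = irtt_bitrate_kbit(interval_ms)
        print(f"{interval_ms}ms => {per_flow:.1f} Kbit * {FLOWS} = {total:.1f} Kbit")
```

At 50ms the three flows come to roughly 50.9 Kbit in total, which is just over 5% of a 1 Mbit link.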

Bitrates could also be lowered by ~15% (16 bytes per packet) by passing in --tstamp=midpoint and sacrificing the server processing time stat.

I'd also like to see packet loss (up vs down separately) shown by default, somehow. :)

flent-users commented 6 years ago

On Sat, Aug 25, 2018 at 10:54 AM Pete Heist notifications@github.com wrote:

Is there an argument for lowering the default interval for when flent calls irtt?

The default irtt packet length with no payload is 60 bytes, so here are bitrates at various intervals for IPv4+Ethernet (106 byte frames, and tripled for RRUL's three UDP flows):

200ms => 4.2 Kbit * 3 = 12.7 Kbit
100ms => 8.5 Kbit * 3 = 25.4 Kbit
50ms => 17.0 Kbit * 3 = 51 Kbit
20ms => 42.4 Kbit * 3 = 127.2 Kbit
10ms => 84.8 Kbit * 3 = 254.4 Kbit

50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.

Bitrates could also be lowered by ~15% (16 bytes per packet) by passing in --tstamp=midpoint and sacrificing the server processing time stat.

I'd also like to see packet loss (up vs down separately) shown by default, somehow. :)

What I'm proposing here is in part, "rrul_v2". My original rrul spec specified 20ms intervals for the isochronous flows. Originally, incidentally, voip ran at 10ms intervals, but that got relaxed due to then practical limits.

I wouldn't mind having an even more aggressive test that did 2.7ms intervals, which is as low as opus can go.

And lest you think I'm being extreme... when I was a kid... switched telephony was measured in lightspeed - a call across town felt and acted the same as whispering in your lover's ear - which I did.

20ms is the equivalent of having a conversation 20 feet across the room.

I've always thought 200ms sampling was wayyyyy too high (see nyquist) and 200usec, about right. :)

As for packet overhead... well, I'm pretty sure irtt is less than opus already

As for 1mbit overheads... well... I care more at wifi and speeds much greater than 5mbit nowadays.

—

You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tohojo/flent/issues/148#issuecomment-415985999, or mute the thread https://github.com/notifications/unsubscribe-auth/AerUv__DiCu3yI8_vq69oU0d1UljymgCks5uUY8UgaJpZM4WCoe6 .

Flent-users mailing list Flent-users@flent.org http://flent.org/mailman/listinfo/flent-users_flent.org

--

Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619

flent-users commented 6 years ago

and I'd make the new tests depend on irtt specifically, because they are different from ping.

we could, I guess, make the tighter irtt emission and sampling behavior global with a -v2 or -v1 parameter to flent, (-v2 becoming the default and breaking with an error message when irtt was not present) - and keeping the test names as is.
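One way such a hard dependency might look, as a rough sketch: the function name and error message are hypothetical, and this is not how flent currently behaves. It only checks whether an irtt binary can be found on the PATH.

```python
import shutil


def require_irtt(binary="irtt"):
    """Return the path to the irtt binary, or raise if it is not installed."""
    path = shutil.which(binary)
    if path is None:
        # Hypothetical wording: fail loudly instead of silently changing
        # what the test measures.
        raise RuntimeError(
            f"{binary} not found in PATH; v2 tests need irtt and will not "
            "fall back to another latency measurement tool"
        )
    return path
```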

heistp commented 6 years ago

Yeah, I haven't thought about what it would mean to change the semantics of the existing tests by changing the default interval. Although, as it is, there's still a fallback to UDP_RR for current tests, so results can change if irtt isn't installed or the server isn't reachable for some reason.

I'm fine with a 20ms default interval, but that could affect folks testing on lower rate ADSL.

Sub-10ms intervals mean -i needs to be passed to the server to reduce the minimum interval it will accept.

2.7ms intervals should be no problem. 200µs still functions on decent hardware, but below that isn't much good. There can be any number of things that could cause those kinds of latencies.

tron:~:% irtt client -q -i 400us -d 10s localhost
timer stats: 53/25000 (0.21%) missed, 2.63% error
tron:~:% irtt client -q -i 300us -d 10s localhost
timer stats: 44/33334 (0.13%) missed, 3.30% error
tron:~:% irtt client -q -i 200us -d 10s localhost
timer stats: 176/50000 (0.35%) missed, 7.40% error
tron:~:% irtt client -q -i 150us -d 10s localhost
timer stats: 2077/66666 (3.12%) missed, 12.43% error
tron:~:% irtt client -q -i 100us -d 10s localhost
timer stats: 22136/99999 (22.14%) missed, 17.79% error
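For comparing runs like these, the summary line can be pulled apart with a small parser. This is a sketch that assumes the "timer stats:" line format shown in the output above; the function name is made up for illustration.

```python
import re

TIMER_RE = re.compile(
    r"timer stats: (\d+)/(\d+) \(([\d.]+)%\) missed, ([\d.]+)% error"
)


def parse_timer_stats(line):
    """Parse one 'timer stats' line into counts and percentages."""
    m = TIMER_RE.search(line)
    if m is None:
        raise ValueError(f"not a timer stats line: {line!r}")
    missed, total = int(m.group(1)), int(m.group(2))
    return {
        "missed": missed,
        "total": total,
        "missed_pct": 100.0 * missed / total,  # recomputed from the counts
        "reported_missed_pct": float(m.group(3)),
        "error_pct": float(m.group(4)),
    }
```

Feeding it the five lines above shows the missed-timer percentage staying well under 1% down to 200us and climbing quickly below that.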

flent-users commented 6 years ago

Hi Pete,

On Aug 25, 2018, at 19:53, Pete Heist notifications@github.com wrote:

Is there an argument for lowering the default interval for when flent calls irtt?

The default irtt packet length with no payload is 60 bytes, so here are bitrates at various intervals for IPv4+Ethernet (106 byte frames, and tripled for RRUL's three UDP flows):

200ms => 4.2 Kbit * 3 = 12.7 Kbit
100ms => 8.5 Kbit * 3 = 25.4 Kbit
50ms => 17.0 Kbit * 3 = 51 Kbit
20ms => 42.4 Kbit * 3 = 127.2 Kbit
10ms => 84.8 Kbit * 3 = 254.4 Kbit

50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.

Would it be possible to simply also show this bandwidth use in the bandwidth plots (say as a single data series accumulated over all irtt probes), then the user could select the interval and still be able to easily assess the effects on other bandwidth flows? I believe that would also be great for the netperf UDP/ICMP streams, but probably harder to implement

Best Regards Sebastian

heistp commented 6 years ago

On Aug 27, 2018, at 1:40 PM, flent-users notifications@github.com wrote:

Hi Pete,

On Aug 25, 2018, at 19:53, Pete Heist notifications@github.com wrote:

50ms wouldn't be too disruptive in most cases. At 1 Mbit, the 5% of bandwidth threshold is crossed.

Would it be possible to simply also show this bandwidth use in the bandwidth plots (say as a single data series accumulated over all irtt probes), then the user could select the interval and still be able to easily assess the effects on other bandwidth flows? I believe that would also be great for the netperf UDP/ICMP streams, but probably harder to implement

We could, but I would think what might happen is that in most cases you’ll have a line that appears close to 0 Mbit, relative to other flows, and we probably wouldn’t want to change the scaling for the scaled bandwidth plots to accommodate that...

Pete

flent-users commented 6 years ago

Hi Pete,

On Aug 27, 2018, at 14:14, Pete Heist notifications@github.com wrote:

We could, but I would think what might happen is that in most cases you’ll have a line that appears close to 0 Mbit, relative to other flows, and we probably wouldn’t want to change the scaling for the scaled bandwidth plots to accommodate that...

Well we currently cap the max already (we do not show all individual sample values), we could do the same for the time measurement plots, so they are only revealed in non-scaled mode. I guess that is a slippery road, because the next thing to add would be the ACK traffic for each TCP stream....
I guess if it would be useful (and easy), Toke would have added it already ;)

Best Regards Sebastian

tohojo commented 6 years ago

Pete Heist notifications@github.com writes:

We could, but I would think what might happen is that in most cases you’ll have a line that appears close to 0 Mbit, relative to other flows, and we probably wouldn’t want to change the scaling for the scaled bandwidth plots to accommodate that...

I think the important part of this is that it would be reflected in the total bandwidth score. I've been meaning to implement this for a while for netperf UDP_RR, because otherwise you can get spurious bandwidth drops as latency decreases just because the latency measurement flows take up more bandwidth. But, well, irtt sort of made that less urgent ;)
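The effect is easy to see with a back-of-the-envelope model. In a request/response test with one transaction outstanding, each round trip carries one request and one response, so the packet rate is 1/RTT and the bandwidth used grows as latency shrinks. A minimal sketch, assuming a 64-byte on-wire frame per packet (an illustrative figure, not a measured netperf value):

```python
def udp_rr_bitrate_bps(rtt_ms, frame_bytes=64):
    """Per-direction bitrate of a single-transaction request/response flow."""
    transactions_per_second = 1000.0 / rtt_ms
    return frame_bytes * 8 * transactions_per_second


if __name__ == "__main__":
    for rtt_ms in (200, 50, 10, 1):
        kbit = udp_rr_bitrate_bps(rtt_ms) / 1000
        print(f"RTT {rtt_ms:>3} ms -> {kbit:.1f} Kbit/s per direction")
```

Under these assumptions a flow that is negligible at 100 ms RTT uses a hundred times more bandwidth at 1 ms, which is the kind of shift that would need to be counted in the totals.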

But I could revisit; I do believe I already capture the data from irtt (I think?).

-Toke

heistp commented 6 years ago

Ok, well if we do go for it, so far in irtt's JSON there's just an average send_rate and receive_rate under stats, both of which contain an integer bps and a string text representation. send_rate ignores lost packets and receive_rate takes them into account. Let me know if anything different would be expected...
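Reading those values out of a result could look like the following. This is only a sketch based on the description above: it assumes the JSON has a top-level stats object whose send_rate and receive_rate each carry an integer bps field. The helper names and the sample data are invented for illustration.

```python
import json


def irtt_rates_bps(result):
    """Return (send_bps, receive_bps) from a parsed irtt JSON result."""
    stats = result["stats"]
    return stats["send_rate"]["bps"], stats["receive_rate"]["bps"]


def total_measurement_bps(results):
    """Sum send and receive rates over several irtt flows."""
    return sum(sum(irtt_rates_bps(r)) for r in results)


# Made-up sample resembling the fields described above.
sample = json.loads(
    '{"stats": {"send_rate": {"bps": 42400}, "receive_rate": {"bps": 42000}}}'
)
```

Summing over the three RRUL measurement flows would give one number that could be folded into the total bandwidth.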

dtaht commented 6 years ago

I really do care about measuring packet loss and re-orders accurately.

I've also been fiddling with setting the tos field, to do ect(0,1) and CE. Doing that at a higher level would be good and noting the result. --ecn 1,2,3 ?

summary line of "Forward/backward path stripping dscp", "CE marks" reorders and loss
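For reference, the ECN codepoint occupies the low two bits of the former TOS byte and the DSCP the upper six (RFC 3168). A sketch of setting both on a UDP socket follows; the helper names are illustrative, IP_TOS behavior is platform dependent, and this is not a description of how irtt or flent set the field.

```python
import socket

# ECN codepoints from RFC 3168 (low two bits of the TOS byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def tos_byte(dscp, ecn):
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into one TOS byte."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("dscp must be 0..63 and ecn 0..3")
    return (dscp << 2) | ecn


def udp_socket_with_tos(dscp, ecn):
    """Create an IPv4 UDP socket whose outgoing packets carry the given TOS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp, ecn))
    return sock
```

If the 1, 2, 3 values suggested above are taken as raw codepoints, they would line up with ECT(1), ECT(0) and CE respectively.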

dtaht commented 6 years ago

on plotting stuff I could see adding a 4th graph much like TSDE's for loss and reorder.

dtaht commented 6 years ago

actually - and I can see pete running screaming from the room - we could add tcp-like behavior to irtt and obsolete netperf entirely except for referencing the main stack. The main reason we use netperf is that core linux devs trusted it, and the reason we only sample is that timestamping each packet and extracting stats from it is hard in light of mss and the complexity of the netperf codebase.

Implementing tcp-like behavior and tcp-like congestion controllers on top of irtt seems simpler in comparison, and we already have better timestamp facilities than tcp in irtt.

Who here likes playing the Zerg as much as I do?

heistp commented 6 years ago

As for packet loss and reorders, there's the lost property on each round_trip that could be plotted, but for re-orders there's so far just a global late_packets, which is the number of packets whose sequence number is lower than the previous one received. It would be possible to add a late flag to round_trip without breaking anything, so I added that to the list.
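Under that definition, a per-packet late flag falls out of the same comparison that produces the global counter. A sketch of the idea, with a hypothetical function name and made-up arrival data:

```python
def flag_late(seqnos):
    """Return one bool per received packet: True if its sequence number is
    lower than that of the packet received immediately before it."""
    flags = []
    previous = None
    for seq in seqnos:
        flags.append(previous is not None and seq < previous)
        previous = seq
    return flags


arrival_order = [0, 1, 3, 2, 4, 6, 5]   # made-up receive order
late = flag_late(arrival_order)
late_packets = sum(late)  # the global counter is the sum of the flags
```

Plotting the flags over time would show where in a run the reordering happened, which the single counter cannot.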

What's TSDE?

As for tcp'ish irtt, I think I need to go canicross the dog in the forest before I internalize that. :) Although I bet per-packet RTTs would be invaluable for investigating ecn?

pping gives per-packet rtt for tcp today, in case that's useful. Perhaps an integrated tool could combine traffic generation using the standard stack and passive analysis for gathering results...

heistp commented 6 years ago

Ah, I see TSDE is Pollere's work. I need to go through the talks referenced on pollere.net asap to get smarter on that. Will be on some roofs today though, p2p connection for the neighbors...

dtaht commented 6 years ago

this convo is (purposefully) all over the place, but I'm leaning towards a rrul_v2 test with 10ms irtt intervals. It's not clear to me if flent could deal with two different sample rates. Also perhaps an IRTT_REQUIRE flag, --te=irtt=1

dtaht commented 6 years ago

Another rrul_v2 issue would be to correctly end up in all the queues on wifi.

flent-users commented 6 years ago

Hi Dave,

On Sep 3, 2018, at 18:00, Dave Täht notifications@github.com wrote:

Another rrul_v2 issue would be to correctly end up in all the queues on wifi.

So in theory rrul_cs8 should do that... (it aims to just use one dscp-marked flow per class selector for a total of 8 tcp flows per direction...) In practice I believe the mapping from dscps to ACs is highly non-linear...

Best Regards Sebastian

tohojo commented 6 years ago

flent-users notifications@github.com writes:

So in theory rrul_cs8 should do that... (it aims to just use one dscp-marked flow per class selector for a total of 8 tcp flows per direction...) In practice I believe the mapping from dscps to ACs is highly non-linear...

Well, not that non-linear:

const int ieee802_1d_to_ac[8] = {
	IEEE80211_AC_BE,
	IEEE80211_AC_BK,
	IEEE80211_AC_BK,
	IEEE80211_AC_BE,
	IEEE80211_AC_VI,
	IEEE80211_AC_VI,
	IEEE80211_AC_VO,
	IEEE80211_AC_VO
};
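To see which access category each rrul_cs8 flow would land in, the table can be applied to the class selector codepoints. A sketch in Python, assuming the 802.1d user priority is simply the top three bits of the 6-bit DSCP (believed to be the default when no QoS map is configured); the names are illustrative.

```python
# Mirror of the ieee802_1d_to_ac table quoted above, indexed by user priority.
IEEE802_1D_TO_AC = ["BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO"]


def dscp_to_ac(dscp):
    """Map a 6-bit DSCP to a WMM access category via its top three bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("dscp must be 0..63")
    user_priority = dscp >> 3  # assumption: UP = class selector bits
    return IEEE802_1D_TO_AC[user_priority]


if __name__ == "__main__":
    for cs in range(8):
        print(f"CS{cs} (dscp {cs << 3:2d}) -> {dscp_to_ac(cs << 3)}")
```

Under this assumption the eight class selectors cover all four access categories, two each, while EF (dscp 46) lands in VI rather than VO.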

flent-users commented 6 years ago

Hi Toke,

On Sep 3, 2018, at 19:37, Toke Høiland-Jørgensen notifications@github.com wrote:

Well, not that non-linear:

const int ieee802_1d_to_ac[8] = {
	IEEE80211_AC_BE,
	IEEE80211_AC_BK,
	IEEE80211_AC_BK,
	IEEE80211_AC_BE,
	IEEE80211_AC_VI,
	IEEE80211_AC_VI,
	IEEE80211_AC_VO,
	IEEE80211_AC_VO
};

Well, aren't these values according to IEEE P802.1p (https://en.wikipedia.org/wiki/IEEE_P802.1p)?

PCP value  Priority     Acronym  Traffic types
1          0 (lowest)   BK       Background
0          1 (default)  BE       Best effort
2          2            EE       Excellent effort
3          3            CA       Critical applications
4          4            VI       Video, < 100 ms latency and jitter
5          5            VO       Voice, < 10 ms latency and jitter
6          6            IC       Internetwork control
7          7 (highest)  NC       Network control

These map the 3 bit priority PCP values from VLAN tags to ACs, but note the dance with PCP 1 being lower than PCP 0, and more importantly the different interpretations about PCP 2, is it "excellent effort" or another BK code point? I guess the point I wanted to make is that mapping down from the 6bit DSCP to ACs is not very intuitive (with linear mapping being the "most intuitive"). Anyway, I am totally fine with just using 3 bits, this is still plenty for priority hierarchies that I can still understand ;)

Best Regards Sebastian

tohojo commented 6 years ago

flent-users notifications@github.com writes:

These map the 3 bit priority PCP values from VLAN tags to ACs, but note the dance with PCP 1 being lower than PCP 0, and more importantly the different interpretations about PCP 2, is it "excellent effort" or another BK code point?

I guess the point I wanted to make is that mapping down from the 6bit DSCP to ACs is not very intuitive (with linear mapping being the "most intuitive").

Anyway, I am totally fine with just using 3 bits, this is still plenty for priority hierarchies that I can still understand ;)

Oh, it's absolutely a mess. So much so that the IETF had to write a whole RFC on it: https://tools.ietf.org/html/rfc8325

-Toke

tohojo / flent

testing the behavior of small queue building non-ecn'd flows #148