ravinet / mahimahi

Web performance measurement toolkit
GNU General Public License v3.0

locking up with netperf #53

Closed dtaht closed 10 years ago

dtaht commented 10 years ago

I can lock it up with netperf in a matter of seconds on 3.11.0-19-generic #33-Ubuntu SMP. Which is too bad as toke is making huge progress with netperf-wrapper and a new gui...

I set up two delay shells, start netserver in one, and try to connect to it from the other. Fire up netperf -H 100.64.0.2 and within a few thousand packets, it hangs...

[delay 10 ms] d@nuc:~/git/mahimahi$ ifconfig
ingress   Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:100.64.0.2  P-t-P:100.64.0.1  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:6627 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4376 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:9929172 (9.9 MB)  TX bytes:248836 (248.8 KB)

other window:

d@nuc:~/git/netperf-wrapper$ delayshell 10
[delay 10 ms] d@nuc:~/git/netperf-wrapper$ netperf -H 100.64.0.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 100.64.0.2 () port 0 AF_INET : demo

keithw commented 10 years ago

Can reproduce, thanks. We'll look into it!

dtaht commented 10 years ago

Running

sudo tc qdisc add dev ingress root fq_codel

in each window works for a lot longer with a single netperf, but still fails with netperf-wrapper.

(I don't see any drops or marks from tc however)

[delay 10 ms] d@nuc:~/git/mahimahi$ ifconfig
ingress   Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:100.64.0.2  P-t-P:100.64.0.1  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:279403 errors:0 dropped:0 overruns:0 frame:0
          TX packets:139149 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:416963167 (416.9 MB)  TX bytes:10435455 (10.4 MB)

https://github.com/tohojo/netperf-wrapper should thoroughly blow you up. :)

keithw commented 10 years ago

I think our crime here is just in assuming that writes to a datagram socket won't block.

https://github.com/ravinet/mahimahi/blob/master/delay_queue.cc#L17

We can fix this with an appropriate poll.

dtaht commented 10 years ago

Well, threads would be better... there are some useful ringbuffer implementations out there... and an extensive poll might be damaging at higher rates, but... ok. :)

/me ducks

Running a single 10ms delay shell over a slower path (14-15ms baseline, ~28 Mbit down, 4.4 up) worked. So did various other attempts. Sorry my first attempt went boom at IETF, and that it took me so long to try again.

see: http://snapon.lab.bufferbloat.net/~d/dshell/ for some data

example box plot

http://snapon.lab.bufferbloat.net/~d/dshell/nodelay_10ms_delay_30ms_delay.png

This is a pretty useful use case by itself.

(I usually use an entirely separate box for this sort of insanity with netem - and sat down to try writing something threadedly intense a while back and ran out of time to work on it)

Will fiddle some more.

keithw commented 10 years ago

I think that should do it -- thanks for testing and for the report. Please let us know if you find any more bugs. :-)