yubin00145865 / iperf

Automatically exported from code.google.com/p/iperf

TCP reverse mode locks up most times #111

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Recompile the 3.0 code as published on the website
2. on the server run: iperf3 --server
3. on the client run: iperf3 -c <server ip> -V -R

What is the expected output? What do you see instead?
Expected: 10 seconds of transfer, followed by detailed statistics.

Most of the time (about 90%) I get:

iperf version 3.0-RC5 (07 November 2013)
Linux intratest132.net.lan 3.4.51-1.i2n.i686.PAE #1 SMP Fri Jun 28 13:49:25 UTC 2013 i686 i686 i386 GNU/Linux
Time: Tue, 19 Nov 2013 23:31:55 GMT
Connecting to host 172.16.1.133, port 5201
Reverse mode, remote host 172.16.1.133 is sending
      Cookie: intratest132.net.lan.1384903915.1454
      TCP MSS: 1448 (default)
[  4] local 172.16.1.132 port 35112 connected to 172.16.1.133 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   110 MBytes   924 Mbits/sec              
[  4]   1.00-2.00   sec   110 MBytes   926 Mbits/sec              
[  4]   2.00-3.00   sec   111 MBytes   934 Mbits/sec              
[  4]   3.00-4.00   sec   111 MBytes   931 Mbits/sec              
[  4]   4.00-5.00   sec   110 MBytes   926 Mbits/sec              
[  4]   5.00-6.00   sec   110 MBytes   927 Mbits/sec              
[  4]   6.00-7.00   sec   110 MBytes   921 Mbits/sec              
[  4]   7.00-8.00   sec   110 MBytes   928 Mbits/sec              
[  4]   8.00-9.00   sec   110 MBytes   924 Mbits/sec              
(At this point iperf stops making progress; I have to abort with Ctrl+C.)

What version of the product are you using? On what operating system?
Recompiled from the published 3.0.tar.gz.
32-bit Linux without IPv6 on both machines.

Please provide any additional information below.

I'll attach a strace of the client.

Original issue reported on code.google.com by intra2...@googlemail.com on 19 Nov 2013 at 3:52


GoogleCodeExporter commented 8 years ago
I was not able to reproduce this. Does anyone else see this behavior?

Original comment by bltier...@es.net on 26 Nov 2013 at 4:07

GoogleCodeExporter commented 8 years ago
I reproduced it quite frequently on RHEL 6.x.

Try running the steps 5 or 6 times; it reproduces.

Server strace:
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072) = 131072
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072) = 131072
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072) = 131072
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072) = 131072
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072) = 131072
write(5, "\275E\207\265\326\36P\202)An\225x\243\356W\304f\362\370\336\342\206~\365\260\3\210\366}\325\263"..., 131072 <==================== Blocked

Client strace:
gettimeofday({1386164922, 754122}, NULL) = 0
select(5, [3 4], [], NULL, {0, 0})      = 1 (in [4], left {0, 0})
gettimeofday({1386164922, 754171}, NULL) = 0
select(5, [3 4], [], NULL, {0, 0})      = 1 (in [4], left {0, 0})
gettimeofday({1386164922, 754221}, NULL) = 0
select(5, [3 4], [], NULL, {0, 0})      = 1 (in [4], left {0, 0})
gettimeofday({1386164922, 754269}, NULL) = 0 
select(5, [3 4], [], NULL, {0, 0})      = 1 (in [4], left {0, 0}) <================== select() hitting the timeout

  iperf_run_client:
   After receiving all the test data:
    1. The client sends TEST_DONE to the server.
    2. The server socket is blocked on write, so it is never able to receive the TEST_DONE from the control channel.
    3. The client tries to read data and hits the timeout, while the server stays blocked on write.

    It's a deadlock.

     There should be some kind of timeout on the write and the receive.
 I wrote a patch which fixes this by setting a timeout on the socket:

 SO_RCVTIMEO
 SO_SNDTIMEO
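
For readers unfamiliar with these two socket options, here is a minimal sketch of how they could be applied with setsockopt(). It is not the attached patch: the helper name `set_io_timeout` and the choice of one shared timeout for both directions are assumptions made for illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not from the iperf source): apply the same timeout
 * to blocking receives and sends on a socket.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
int set_io_timeout(int fd, long seconds)
{
    assert(seconds >= 0);   /* a negative timeout is a caller bug */

    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    /* Bound how long a blocking read/recv may wait for data. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    /* Bound how long a blocking write/send may wait for buffer space. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return 0;
}
```

With a timeout set, a blocked write() that transfers nothing before the timeout expires returns -1 with errno set to EAGAIN or EWOULDBLOCK, which would give the server a chance to go back and check the control channel. A timeout of zero means "block indefinitely", which is the default.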

Original comment by susant.sahani on 4 Dec 2013 at 2:13


GoogleCodeExporter commented 8 years ago
I can't reproduce this either.

I don't see a TEST_DONE state anywhere in the source.  There's TEST_END and 
IPERF_DONE.  See http://code.google.com/p/iperf/wiki/IperfProtocolStates for 
details on the protocol and states.

Original comment by jef.posk...@gmail.com on 10 Dec 2013 at 2:03

GoogleCodeExporter commented 8 years ago
However, one idea to look at is whether this happens on lower-speed links.  Perhaps 
there, the pipe doesn't have time to empty before the receiver closes its read 
socket.  It's certainly possible this could only happen in reverse mode.

Original comment by jef.posk...@gmail.com on 10 Dec 2013 at 2:32

GoogleCodeExporter commented 8 years ago
Yes, it's TEST_END. I made a typo there.

#define TEST_END 4

This is no longer reproducible; I guess it got fixed by commit e4d782b488ed.
Log message:

Fixed bug where -R mode selected on a closed file.

Also added a debugging routine to dump an fd_set.
Affected files:
    Modify  /src/iperf_server_api.c
    Modify  /src/iperf_util.c
    Modify  /src/iperf_util.h

Original comment by susant.sahani on 10 Dec 2013 at 7:26

GoogleCodeExporter commented 8 years ago
Ok! Then let's close this one.

Original comment by jef.posk...@gmail.com on 10 Dec 2013 at 2:00