Open GoogleCodeExporter opened 9 years ago
This happened on OSX, but Linux seems OK.
Original comment by bltier...@es.net
on 20 Dec 2013 at 11:20
This seems to reliably reproduce the problem on linux:
#!/bin/sh
set -x
while [ 1 ]
do
./src/iperf3 -P 2 -c localhost -t 5
./src/iperf3 -P 2 -c localhost -t 5 -R
done
It works for 3-6 loops and then locks up. (Once, the server crashed instead.)
Hopefully that will help track it down.
Original comment by bltier...@es.net
on 22 Dec 2013 at 3:09
Running the server in gdb shows that the server is crashing on this line:
Program received signal SIGSEGV, Segmentation fault.
0x000000305784812c in vfprintf () from /lib64/libc.so.6
Which is called from here:
1808 iprintf(test, report_sum_bw_retrans_format, start_time, end_time,
ubuf, nbuf, retransmits, irp->omitted?report_omitted:"");
Maybe Susant's new patch will fix this?
Original comment by bltier...@es.net
on 24 Dec 2013 at 4:15
I am able to reproduce this too. With the reverse (-R) option, the server crashes:
getsockopt(5, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
getsockopt(7, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
write(1, "- - - - - - - - - - - - - - - - "..., 50- - - - - - - - - - - - - - -
) = 50
write(1, "[ 5] 8.02-9.00 sec 382 MB"..., 67[ 5] 8.02-9.00 sec 382 MBytes 3.27 Gbits/sec 5
) = 67
write(1, "[ 7] 8.02-9.00 sec 381 MB"..., 67[ 7] 8.02-9.00 sec 381 MBytes 3.26 Gbits/sec 0
) = 67
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x5} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
(gdb) bt
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
#1 0x000000000040542a in vprintf (__arg=0x7fffffffda08, __fmt=0x4110e0 <report_sum_bw_retrans_format> "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' <repeats 14 times>, "%s\n") at /usr/include/bits/stdio.h:38
#2 iprintf (test=test@entry=0x617010, format=0x4110e0 <report_sum_bw_retrans_format> "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' <repeats 14 times>, "%s\n") at iperf_api.c:2405
#3 0x000000000040618b in iperf_print_intermediate (test=test@entry=0x617010) at iperf_api.c:1808
#4 0x0000000000406468 in iperf_reporter_callback (test=0x617010) at iperf_api.c:2008
#5 0x000000000040c9ac in tmr_run (nowP=nowP@entry=0x7fffffffdd10) at timer.c:189
#6 0x0000000000409f43 in iperf_run_server (test=test@entry=0x617010) at iperf_server_api.c:586
#7 0x0000000000401e92 in run (test=0x617010) at main.c:116
#8 main (argc=<optimized out>, argv=0x7fffffffdf68) at main.c:91
(gdb) f 0
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
(gdb) list
43 __STDIO_INLINE int
44 getchar (void)
45 {
46 return _IO_getc (stdin);
47 }
48
49
50 # ifdef __USE_MISC
51 /* Faster version when locking is not necessary. */
52 __STDIO_INLINE int
It looks like the stack is getting corrupted somewhere, which leads to the crash.
I need to dig further to find out what is really causing it.
Original comment by susant%redhat.com@gtempaccount.com
on 24 Dec 2013 at 5:26
I've been doing some digging into this. The hang and the crash *might* have
two different causes, or might be two different manifestations of the same
problem. Below are notes from a private email on this subject, in which I
described what I saw with FreeBSD 10.0 and -R: there's a hang but no crash.
-----
A slightly lower level symptom of this problem is that at the end of the
test, the client tries to send a TEST_END state change message to the
server over the control connection. When in -R mode, the server doesn't
seem to get it or read it reliably. However if I kill the client
(because it seems hung) the server immediately gets the TEST_END and
tries to do the end-of-test processing (it can't do this successfully
because at this point the client has died and closed its side of the
control connection).
In non -R mode this part all works as expected (I see the client send
the TEST_END and the server receives it immediately, as we would expect).
This is all on FreeBSD 10.0, client and server on the same machine (so
far it looks like the configuration where client and server are on the
same machine is particularly vulnerable to this problem).
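For readers unfamiliar with the exchange described above, here is a minimal sketch of a one-byte state message travelling over a control socket and being picked up with select(). It is not iperf3's code and not the fix. The helper names `send_state` and `poll_state`, the constant `DEMO_TEST_END` and its value are assumptions made for illustration only.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Assumed value, for illustration only; check the real header before relying on it. */
#define DEMO_TEST_END 4

/* Hypothetical helper: write a one-byte state message to the control socket.
 * Returns 0 on success, -1 on failure. */
int send_state(int ctrl_fd, signed char state)
{
    ssize_t n = write(ctrl_fd, &state, sizeof state);
    return n == (ssize_t)sizeof state ? 0 : -1;
}

/* Hypothetical helper: wait up to timeout_ms for a state byte.
 * Returns the state value, -1 on error or EOF, or -2 if nothing arrived in time. */
int poll_state(int ctrl_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    signed char state;

    FD_ZERO(&rfds);
    FD_SET(ctrl_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(ctrl_fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;  /* timeout: the peer has not sent a state change yet */
    if (read(ctrl_fd, &state, sizeof state) != (ssize_t)sizeof state)
        return -1;  /* EOF or short read: the peer closed the connection */
    return state;
}
```

With a socketpair() standing in for the control connection, `send_state(sv[0], DEMO_TEST_END)` on one end makes `poll_state(sv[1], 1000)` on the other return `DEMO_TEST_END`. The symptom described above corresponds to the receiving side not getting back to this kind of read while the sender waits.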
Original comment by bmah@es.net
on 24 Dec 2013 at 7:12
Partial fix committed in c499d0008f7d. There was basically a deadlock between
the client and server in -R mode; see the commit log for more details.
Not closing this yet...need to do some more tests to get a warm fuzzy feeling
about the fix first. Also note that this doesn't address the server-side
crashes that have been reported (but which I have not personally witnessed).
Original comment by bmah@es.net
on 3 Jan 2014 at 6:09
Fixed the -P and -R server-side crash reported via Comments 2, 3, and 4, in
423166a54849. This only affected Linux; it was a mangled printf format string
that only got used on that platform (it would have been used on any other
platform with retransmit statistics, but there aren't currently any).
It's clear to me now that there were multiple issues being reported in this one
bug. :-p
Original comment by bmah@es.net
on 3 Jan 2014 at 6:38
If gcc isn't spitting out warnings on format strings as const char variables,
it'd probably make sense to turn the format strings into typedefs or something
to ensure that gcc spits out a warning if this kind of mismatch happens.
Original comment by AaronMat...@gmail.com
on 3 Jan 2014 at 6:43
Good point. I don't see any warning messages for the format string mismatch
(on a working copy rolled back to before my fix), but gcc isn't compiling with
any warnings enabled either, as far as I can tell:
gcc -DHAVE_CONFIG_H -I. -g -O2 -MT iperf_api.o -MD -MP -MF
.deps/iperf_api.Tpo -c -o iperf_api.o iperf_api.c
I'm not sure why this is...I'm used to living under -Wall and -Werror. Yet
another thing to investigate.
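One way to get the compiler to catch this class of bug is sketched below. The format in the backtrace has three %s conversions and no integer conversion, while the call at line 1808 passes an integer (retransmits) before the final string. A %s would then read that integer as a pointer, which would be consistent with the si_addr=0x5 fault right after a line reporting 5 retransmits. The sketch uses GCC's `format(printf, ...)` function attribute, which makes -Wformat (part of -Wall) check arguments against the format. The check only works when the format is visible as a string literal at the call site, so the sketch keeps the format in a macro. `checked_snprintf`, `format_sum_line`, `PRINTF_LIKE` and `REPORT_SUM_FMT` are hypothetical names, and the format text is illustrative rather than iperf3's actual string.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Expands to GCC/Clang's printf-format attribute; a no-op on other compilers. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Illustrative format (not iperf3's): two doubles, two strings, an int, a string.
 * As a macro it reaches every call site as a literal the compiler can inspect. */
#define REPORT_SUM_FMT "[SUM] %6.2f-%-6.2f sec  %s  %s/sec  %3d  %s\n"

/* Hypothetical stand-in for a printf-style reporting wrapper.
 * Argument 3 is the format; the variadic arguments start at position 4. */
static int checked_snprintf(char *buf, size_t len, const char *fmt, ...) PRINTF_LIKE(3, 4);

static int checked_snprintf(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Formats one summary line. Dropping %3d from REPORT_SUM_FMT, or passing the
 * arguments in the wrong order, now produces a -Wformat warning at this call. */
int format_sum_line(char *buf, size_t len, double start, double end,
                    const char *bytes, const char *rate,
                    int retransmits, const char *omitted)
{
    return checked_snprintf(buf, len, REPORT_SUM_FMT,
                            start, end, bytes, rate, retransmits, omitted);
}
```

Building with `gcc -Wall -Werror` then turns a format/argument mismatch into a compile error. A format that lives in another translation unit as an extern `const char` array gives the compiler nothing to check, which may explain why no warning showed up here.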
Original comment by bmah@es.net
on 3 Jan 2014 at 7:04
Update: Just one sub-issue remaining from this bug report...that's the hang
with -Z. I've been able to observe this on Mac OS, as reported in the initial
bug report. It doesn't happen every time, at least not on my MacBook;
sometimes the -Z test works just fine.
So far I have not been able to reproduce this problem on my other two
development platforms (FreeBSD 10 and CentOS 6).
It's not clear to me if there's something platform-specific lurking about or
not, although the sendfile(2) call used by the -Z option is slightly different
on the three platforms I've been using (therefore there are slightly different
codepaths being used).
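To illustrate how the call differs, here is a sketch of a thin wrapper over sendfile(2) on Linux, FreeBSD and Mac OS. It is not iperf3's code, and `portable_sendfile` is a hypothetical name. Linux takes the destination descriptor first and returns the byte count. FreeBSD and Mac OS take the file descriptor first and report the byte count through an output parameter, each with a different argument list.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#elif defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/uio.h>
#endif

/* Hypothetical wrapper: send up to `count` bytes of file_fd, starting at
 * `offset`, to sock_fd. Returns the number of bytes sent, or -1 on error.
 * Note: on FreeBSD and Mac OS a count of 0 means "send until end of file". */
ssize_t portable_sendfile(int sock_fd, int file_fd, off_t offset, size_t count)
{
#if defined(__linux__)
    /* Linux: (out_fd, in_fd, offset pointer, count); returns bytes written. */
    off_t off = offset;
    return sendfile(sock_fd, file_fd, &off, count);
#elif defined(__FreeBSD__)
    /* FreeBSD: (file, socket, offset, nbytes, hdtr, sbytes out-param, flags). */
    off_t sent = 0;
    int rc = sendfile(file_fd, sock_fd, offset, count, NULL, &sent, 0);
    if (rc < 0 && sent == 0)
        return -1;
    return (ssize_t)sent;  /* may be a partial count after EAGAIN or EINTR */
#elif defined(__APPLE__)
    /* Mac OS: (file, socket, offset, len in/out-param, hdtr, flags). */
    off_t len = (off_t)count;
    int rc = sendfile(file_fd, sock_fd, offset, &len, NULL, 0);
    if (rc < 0 && len == 0)
        return -1;
    return (ssize_t)len;   /* may be a partial count after EAGAIN or EINTR */
#else
    (void)sock_fd; (void)file_fd; (void)offset; (void)count;
    errno = ENOSYS;
    return -1;
#endif
}
```

Each branch also handles partial sends and errors differently, so a hang that shows up on only one platform could plausibly sit in one of these paths. That is a guess, not a diagnosis.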
Original comment by bmah@es.net
on 3 Jan 2014 at 10:52
In my tests, OSX hangs every time. Linux is now working fine.
Original comment by bltier...@gmail.com
on 4 Jan 2014 at 3:21
Update: I'm still seeing this issue (but not consistently) on MacOS 10.8 and
MacOS 10.9.
Original comment by bmah@es.net
on 21 Jan 2014 at 9:08
Original issue reported on code.google.com by
bltier...@es.net
on 20 Dec 2013 at 10:51