yubin00145865 / iperf

Automatically exported from code.google.com/p/iperf

intervals don't work as expected #125

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
In testing intervals on lossy networks, we get some ... interesting results:

[  4]   0.00-2.13   sec   256 KBytes   984 Kbits/sec
[  4]   2.13-5.13   sec   256 KBytes   700 Kbits/sec
[  4]   5.13-7.63   sec   128 KBytes   418 Kbits/sec
[  4]   7.63-9.53   sec   128 KBytes   552 Kbits/sec
[  4]   9.53-11.03  sec   128 KBytes   699 Kbits/sec
[  4]  11.03-13.23  sec   128 KBytes   477 Kbits/sec
[  4]  13.23-19.23  sec   128 KBytes   175 Kbits/sec
[  4]  19.23-19.23  sec  0.00 Bytes  0.00 bits/sec
[  4]  19.23-19.23  sec  0.00 Bytes  0.00 bits/sec
[  4]  19.23-21.23  sec   128 KBytes   524 Kbits/sec

Email from Aaron:

> i did some debugging on this, and I think the issue may be somewhat complex to
> fix. It looks like the logic was to have the sending functions iperf_send ->
> iperf_tcp_send -> Nwrite expect that when SIGALRM goes off, the syscall will
> be interrupted. That doesn't seem to be the case for read/write/etc. Adding a
> siginterrupt(SIGALRM, 1) will quasi-fix that. It will cause the syscall to
> return -1/EINTR if nothing has been transferred. If something has been
> transferred, however, it returns the amount transferred. Unfortunately,
> Nwrite then happily keeps writing, thinking that it should. This situation
> also occurs if the SIGALRM happens to hit between writes (which seems not
> uncommon).
>
> There is a way to work around this, but it requires changing the read/write
> functions. This model gets used for this same situation in places in bwctl.
> Basically, you pass an "interrupt" pointer into the functions. This pointer
> points to an integer that, if the alarm goes off, gets incremented. Then when
> a function returns, you check if *interrupt is non-zero, and if so, treat it
> the same way as if -1/EINTR had occurred. e.g.
>
> iperf_client_api.c (note I'm passing the pointer to sigalrm_triggered down into iperf_send)
>                if (iperf_send(test, concurrency_model == cm_itimer ? NULL : &write_set, &sigalrm_triggered) < 0)
>
> iperf_tcp.c:
>   iperf_tcp_send(struct iperf_stream *sp, int *interrupt)
>       ….
>          r = Nwrite(sp->socket, sp->buffer, sp->settings->blksize, Ptcp, interrupt);
>
> net.c
>   int Nwrite_interruptible(int fd, const char *buf, size_t count, int prot, int *interrupt)
>       ….
>        r = write(fd, buf, nleft);
>         if (interrupt && *interrupt)
>             return count - nleft;
>
>  I have a very hacky patch that does the above for just this code path
>  (i.e. it breaks literally every other code path), though other things
>  seem broken for reasons I've not bothered to look into. The intervals
>  look to be running more reliably (ignore the all-0's, as they're among
>  the "other things" that seem broken):
>
>  [  4]   0.00-2.03   sec  0.00 Bytes  0.00 bits/sec
>  [  4]   2.03-4.03   sec  0.00 Bytes  0.00 bits/sec
>  [  4]   4.03-6.03   sec  0.00 Bytes  0.00 bits/sec
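
The snippets above are fragments; a self-contained sketch of the
interrupt-pointer pattern might look like the following. The
Nwrite_interruptible signature comes from the email; the handler, the loop
body, and the plain-int flag are assumptions for illustration.

    #include <errno.h>
    #include <signal.h>
    #include <unistd.h>

    /* Flag bumped by the SIGALRM handler, installed with something like
       signal(SIGALRM, on_sigalrm) plus siginterrupt(SIGALRM, 1). Declared
       as a plain int so its address matches the int * parameter below;
       a volatile sig_atomic_t would be more robust in production code. */
    static int sigalrm_triggered = 0;

    static void on_sigalrm(int sig)
    {
        (void) sig;
        sigalrm_triggered++;
    }

    /* Write 'count' bytes, but return the partial count as soon as
       *interrupt becomes non-zero, exactly as if write() had failed with
       -1/EINTR. 'prot' only mirrors the signature from the email and is
       unused here. */
    int Nwrite_interruptible(int fd, const char *buf, size_t count, int prot, int *interrupt)
    {
        size_t nleft = count;
        ssize_t r;

        (void) prot;
        while (nleft > 0) {
            r = write(fd, buf, nleft);
            if (r < 0) {
                if (errno == EINTR)
                    return (int) (count - nleft);   /* interrupted mid-write */
                return -1;
            }
            nleft -= (size_t) r;
            buf += r;
            if (interrupt && *interrupt)
                return (int) (count - nleft);       /* alarm hit between writes */
        }
        return (int) count;
    }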

Email from Jef:

>> It looks like the logic was to have the sending functions
>> iperf_send -> iperf_tcp_send -> Nwrite expect that when SIGALRM
>> goes off, the syscall will be interrupted.
>
> My interpretation is rather that the expectation is that Nwrite will
> always complete fast enough that timer accuracy will not be noticeably
> affected.  Consider that the code was originally written with only
> the select mode, no SIGALRM at all.  In select mode there's nothing
> that can interrupt a write so it always has to complete.

> The default write size is 128k, and your example is transferring
> only one or two of those blocks per interval.  Try lowering the size
> a lot and see if that evens out the interval timing.  How about: -l 8k

> So if lowering the block size does help, then how about
> adding a little one-step auto-tuner for the block size?
> Something like this, in the stats callback:
>
>    if test->sender
>       if test->settings->blksize == DEFAULT_xxx_BLKSIZE
>           n = [estimate of number of blocks per interval, at current rate]
>           if n < 20   # or so
>               printf "default block size %d is too large, auto-lowering it to %d", DEFAULT_xxx_BLKSIZE, LOWER_xxx_BLKSIZE
>               settings->blksize = LOWER_xxx_BLKSIZE
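
A rough C rendering of that pseudocode, as a sketch: the struct definitions
are minimal stand-ins for iperf3's real ones, and the constant names, the
LOWER value, the threshold, and the interval_bytes argument are all
assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    /* Minimal stand-ins for the iperf3 structures the pseudocode touches;
       the real definitions live in iperf.h. */
    struct iperf_settings { int blksize; };
    struct iperf_test {
        int sender;
        struct iperf_settings *settings;
    };

    #define DEFAULT_TCP_BLKSIZE     (128 * 1024)
    #define LOWER_TCP_BLKSIZE       (8 * 1024)   /* assumed value */
    #define MIN_BLOCKS_PER_INTERVAL 20           /* "or so" */

    /* One-step auto-tuner, called from the stats callback with the number
       of bytes sent in the interval that just finished. */
    static void maybe_lower_blksize(struct iperf_test *test, uint64_t interval_bytes)
    {
        if (test->sender && test->settings->blksize == DEFAULT_TCP_BLKSIZE) {
            /* Estimate of blocks per interval at the current rate. */
            uint64_t n = interval_bytes / (uint64_t) test->settings->blksize;

            if (n < MIN_BLOCKS_PER_INTERVAL) {
                printf("default block size %d is too large, auto-lowering it to %d\n",
                       DEFAULT_TCP_BLKSIZE, LOWER_TCP_BLKSIZE);
                test->settings->blksize = LOWER_TCP_BLKSIZE;
            }
        }
    }

(For comparison, the manual workaround suggested above is just passing
-l 8k on the client command line.)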

Email from Aaron:

> If an interval timer goes off just after the client has started to write,
> won't this still create a delayed reaction even if the duration of this delay
> is decreased?

Email from Jef:

> Sure.  I don't think the timer mechanism is intended to be super precise.
> The stats are still accurate because they go by clock time actually elapsed,
> rather than time the interval was supposed to take.

Email from Aaron:

> If there's a fair amount of loss, the send buffer could completely fill, and
> that write could hang for an indeterminate amount of time. Since the issue
> crops up when the network is poor, and that's exactly when these intervals
> matter most, I think hitting the interval as closely as possible is
> important. Beyond that, iperf and nuttcp both hit the interval on the
> nose, which would make it weird if iperf3 is oddly off.

Original issue reported on code.google.com by AaronMat...@gmail.com on 18 Dec 2013 at 1:39

GoogleCodeExporter commented 8 years ago
A different possible fix: this sets non-blocking mode on the sockets and 
disables the SIGALRM mode. The numbers come out on the nose with this, 
though I'm not sure if it breaks anything else.

Original comment by AaronMat...@gmail.com on 18 Dec 2013 at 3:26

GoogleCodeExporter commented 8 years ago
setnonblocking() is already in net.c.

Original comment by susant%redhat.com@gtempaccount.com on 18 Dec 2013 at 4:05
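
For reference, a helper like that is typically a pair of fcntl(2) calls; a
sketch (the actual implementation in net.c may differ in signature and
details):

    #include <fcntl.h>

    /* Switch a socket into (or out of) non-blocking mode; returns the
       fcntl result, -1 on error. */
    int setnonblocking(int fd, int nonblocking)
    {
        int flags = fcntl(fd, F_GETFL, 0);

        if (flags < 0)
            return -1;
        if (nonblocking)
            flags |= O_NONBLOCK;
        else
            flags &= ~O_NONBLOCK;
        return fcntl(fd, F_SETFL, flags);
    }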

GoogleCodeExporter commented 8 years ago
Yeah, it's not currently called anywhere.

Going non-blocking and getting rid of SIGALRM mode is a not-obviously-bad 
idea. In addition to looking for things it breaks, we'd have to look 
carefully at the performance.

Original comment by jef.posk...@gmail.com on 18 Dec 2013 at 4:08

GoogleCodeExporter commented 8 years ago
Talking about the interrupted system call: the signal handlers here are 
installed with the signal() system call.

Looking at the sigaction(2) man page, wouldn't setting the SA_RESTART flag 
be simpler than handling system-call interruption ourselves? The 
documentation says that setting it will make certain system calls 
automatically restartable across signals. There is more information in man 
7 signal, which lists the syscalls that are automatically restarted.

It would be a simpler implementation.

Original comment by susant%redhat.com@gtempaccount.com on 18 Dec 2013 at 4:11
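
For context, the flag in question is set when the handler is installed; a
minimal sigaction(2) sketch, with illustrative names that are not iperf3's
actual code:

    #include <signal.h>
    #include <string.h>

    static void on_sigalrm(int sig)
    {
        (void) sig;   /* the real handler would set a flag or update stats */
    }

    /* Install a SIGALRM handler. With restart non-zero, SA_RESTART is set
       and blocked calls like write() are transparently restarted after the
       handler returns; with restart zero, they fail with -1/EINTR. */
    int install_alarm_handler(int restart)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_sigalrm;
        sa.sa_flags = restart ? SA_RESTART : 0;
        sigemptyset(&sa.sa_mask);
        return sigaction(SIGALRM, &sa, NULL);
    }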

GoogleCodeExporter commented 8 years ago
The problem is that the SA_RESTART flag apparently gets auto-set if you use 
signal(): in an strace, the calls are auto-restarting anyway. The 
restarting is actually the problem, because when a network is lossy the 
write can hang for an indeterminate period of time, which is problematic 
when you're trying to make sure you're printing out results at specific 
intervals.

Original comment by AaronMat...@gmail.com on 18 Dec 2013 at 4:13

GoogleCodeExporter commented 8 years ago
If I'm not wrong, the Nwrite function already calculates how much has been 
written and restarts the write if it's interrupted:

    switch (errno) {
        case EINTR:
            return count - nleft;

Currently the sockets are not non-blocking, so yes, write would hang.
Original comment by susant%redhat.com@gtempaccount.com on 18 Dec 2013 at 4:45

GoogleCodeExporter commented 8 years ago
The EINTR doesn't actually come out, though, because 'write' gets 
auto-restarted. Even if you do enable interrupting, there are 2 situations 
where it will still fail:

1) if any data has been written, the write returns the amount written, not 
-1/EINTR
2) if the signal occurs in between calls to write, the EINTR code path 
won't get called. I noticed this occur with some frequency even when I had 
it try to interrupt the calls to write. I'm not sure if this is a property 
of Linux delaying the signal until the syscall is finished, or what, but it 
would happen a number of times during every run I tried.

Original comment by AaronMat...@gmail.com on 18 Dec 2013 at 4:52

GoogleCodeExporter commented 8 years ago
Looking at the manual, man 7 signal:

"If a blocked call to one of the following interfaces is interrupted by a 
signal handler, then the call will be automatically restarted after the 
signal handler returns if the SA_RESTART flag was used; otherwise the call 
will fail with the error EINTR:"

There is no way to specify SA_RESTART in the signal() system call.

1. Correct. But we need to check the errno value if it's interrupted; 
however, I'm not sure. I will debug it.
Original comment by susant%redhat.com@gtempaccount.com on 18 Dec 2013 at 5:09

GoogleCodeExporter commented 8 years ago
Here's an updated patch that also rips out the SIGALRM code. In testing 
locally, things seem to work as expected (e.g. even on very lossy 
connections, the intervals happen as expected instead of quasi-randomly). 
Would it be possible to get this tested on the 40G testbed? I'm curious 
what impact the change from SIGALRM to select might have.

Original comment by AaronMat...@gmail.com on 21 Feb 2014 at 8:26
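
The core idea of the non-blocking/select approach, sketched under the
assumption that the patch looks roughly like this (it is not the patch
itself): with O_NONBLOCK set, write() can never hang, and select() bounds
how long the sender waits before control returns to the timer and
reporting loop.

    #include <errno.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Try to send one block on a non-blocking socket. Returns bytes
       written, 0 if the socket wasn't writable within the timeout (so the
       caller can go service interval reports), or -1 on error. */
    ssize_t try_send_block(int fd, const char *buf, size_t count)
    {
        fd_set wfds;
        struct timeval tv = { 0, 100000 };   /* wait at most 100 ms */
        ssize_t r;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
            return 0;            /* not writable (or EINTR): report stats */

        r = write(fd, buf, count);
        if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;            /* buffer filled between select and write */
        return r;
    }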
