Open GoogleCodeExporter opened 9 years ago
A different possible fix. This sets non-blocking mode on the sockets and
disables the sigalrm mode. The numbers come out on the nose with this, though
I'm not sure if it breaks anything else.
Original comment by AaronMat...@gmail.com
on 18 Dec 2013 at 3:26
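For readers following along, putting a socket into non-blocking mode usually comes down to two fcntl calls. The sketch below is illustrative only: `set_nonblocking` is a hypothetical helper name, not the function in net.c, and the attached patch may do this differently.

```c
#include <fcntl.h>

/* Hypothetical helper (not the iperf function): switch fd to
 * non-blocking mode when nonblocking is nonzero, back to blocking
 * mode when it is zero. Returns 0 on success, -1 on error (errno is
 * set by fcntl). */
int set_nonblocking(int fd, int nonblocking)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;

    if (nonblocking)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;

    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

With O_NONBLOCK set, a write that cannot make progress fails with EAGAIN (or EWOULDBLOCK) instead of hanging, so the caller decides how long to wait.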
Attachments:
setnonblocking is already in net.c.
Original comment by susant%redhat.com@gtempaccount.com
on 18 Dec 2013 at 4:05
Yeah - not currently called anywhere.
Going non-blocking and getting rid of sigalrm mode is a not-obviously-bad idea.
In addition to looking for things it breaks, we'd have to look carefully at
the performance.
Original comment by jef.posk...@gmail.com
on 18 Dec 2013 at 4:08
Regarding the interrupted system call: the signal handlers are installed with the signal system call.
Looking at the sigaction(2) man page, could setting the SA_RESTART flag be simpler than handling system call interruption ourselves? The documentation says that setting it makes certain system calls automatically restartable across signals.
More information is in man 7 signal, which lists the syscalls that are
automatically restarted.
It would be a simpler implementation.
Original comment by susant%redhat.com@gtempaccount.com
on 18 Dec 2013 at 4:11
The problem is that the SA_RESTART flag apparently gets auto-set if you use
'signal': in an strace, the calls are auto-restarting anyway. The restarting is
actually the problem, because when a network is lossy, the write can hang for
an indeterminate period of time, which is problematic when you're trying to make
sure you're printing out results at specific intervals.
Original comment by AaronMat...@gmail.com
on 18 Dec 2013 at 4:13
If I am not wrong, the Nwrite function actually calculates how much has been written and restarts the write if it is interrupted:
switch (errno) {
case EINTR:
return count - nleft;
Currently the sockets are not non-blocking, so yes, the write would hang.
Original comment by susant%redhat.com@gtempaccount.com
on 18 Dec 2013 at 4:45
The EINTR doesn't actually come out though because 'write' gets auto-restarted.
Even if you do enable interrupting, there are 2 situations where it will still
fail:
1) if any data has been written, the write returns the amount written, not
-1/EINTR
2) if the signal occurs in-between calls to write, the EINTR code-path won't
get called. I noticed this occur with some frequency even if I had it try to
interrupt the calls to write. I'm not sure if this is a property of Linux
delaying the signal until the syscall is finished, or what, but it'd happen a
number of times during every run I tried.
Original comment by AaronMat...@gmail.com
on 18 Dec 2013 at 4:52
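The two cases above can be illustrated with a small write loop. This is only a sketch under assumptions: `interval_expired` is a hypothetical flag that a signal handler (not shown) would set, and `write_until_expired` is an invented name, not iperf's Nwrite.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical flag; a SIGALRM handler (not shown) would set it. */
volatile sig_atomic_t interval_expired = 0;

/* Write up to count bytes from buf to fd. Stops early if the flag is
 * set or if write() is interrupted before transferring anything.
 * Returns the number of bytes written, or -1 on any other error. */
ssize_t write_until_expired(int fd, const char *buf, size_t count)
{
    size_t nleft = count;

    while (nleft > 0) {
        /* Case 2: the signal arrived between calls, so no write()
         * will report EINTR. The flag is the only evidence. */
        if (interval_expired)
            break;

        ssize_t r = write(fd, buf, nleft);
        if (r < 0) {
            if (errno == EINTR)
                break;      /* interrupted before any byte moved */
            return -1;
        }

        /* Case 1: an interrupted write that already moved data
         * returns a short count, not -1/EINTR. */
        buf += r;
        nleft -= (size_t)r;
    }
    return (ssize_t)(count - nleft);
}
```

Even this loop has a window: a signal that lands after the flag check but before write() blocks is missed until that write returns. That window is one reason a non-blocking socket with a bounded wait is easier to reason about than relying on interruption.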
Looking at the manual, man 7 signal:
"If a blocked call to one of the following interfaces is interrupted by a
signal handler, then the call will be automatically restarted after the signal
handler returns if the SA_RESTART flag was used; otherwise the call will fail
with the error EINTR:"
There is no way to specify SA_RESTART with the signal system call.
1. Correct. But we need to check the errno value if it is interrupted. However,
I'm not sure; I will debug it.
Original comment by susant%redhat.com@gtempaccount.com
on 18 Dec 2013 at 5:09
Original comment by bltier...@es.net
on 18 Dec 2013 at 9:59
Here's an updated patch that also rips out the sigalrm code. In testing
locally, things seem to work as expected (e.g. even on very lossy connections,
the intervals happen as expected instead of quasi-randomly). Would it be
possible to get this tested on the 40G testbed? I'm curious what impact the
change from sigalrm to select might have.
Original comment by AaronMat...@gmail.com
on 21 Feb 2014 at 8:26
Attachments:
Original issue reported on code.google.com by
AaronMat...@gmail.com
on 18 Dec 2013 at 1:39
Attachments: