flock test fails on lustrefs

Keno commented 7 years ago

> $HOME/rr-build/bin/rr record $HOME/rr-build/bin/flock
rr: Saving execution to trace directory `/global/homes/k/kfischer/.local/share/rr/flock-7'.
parent pid is 61556
sizeof(flock) = 32
before lock: type: 2, pid: 0
  after lock: type: 0, pid: 61556
sizeof(flock64) = 32
before lock: type: 2, pid: 0
  after GETLK: type: 1, pid: 61556
P: forcing child to block on LK, sleeping ...
FAILED: errno=4 (Interrupted system call)
flock: /global/homes/k/kfischer/rr/src/test/flock.c:98: main: Assertion `"FAILED: !" && check_cond(0 == err)' failed.
P: ... awake, releasing lock
FAILED: errno=0 (Success)
flock: /global/homes/k/kfischer/rr/src/test/flock.c:120: main: Assertion `"FAILED: !" && check_cond(((((__extension__ (((union { __typeof(status) __in; int __i; }) { .__in = (status) }).__i))) & 0x7f) == 0) && 0 == ((((__extension__ (((union { __typeof(status) __in; int __i; }) { .__in = (status) }).__i))) & 0xff00) >> 8))' failed.
Aborted
> stat --file-system --format=%T .
lustre

Works fine without rr as well as with -n. Mount options are rw,flock,lazystatfs in case it makes a difference.

rocallahan commented 7 years ago

Blocked by Lustre.

OK. Blocking signal delivery indefinitely when the signal wasn't blocked by user-space seems like a bug to me. If it's infeasible to fix, or just not worth fixing, then OK. Obviously for rr it's not important since we only care about the ptrace case.

I may keep investigating in the interests of fixing the ptrace/Lustre interaction, but I certainly don't have to do it here.

Feel free to do it here :-). Generally I'm keen to get upstream bugs reported and fixed. I just meant this is not an issue where we need to struggle mightily to reach a consensus, if we're having difficulty reaching one.

paf-49 commented 7 years ago

OK. Blocking signal delivery indefinitely when the signal wasn't blocked by user-space seems like a bug to me. If it's infeasible to fix, or just not worth fixing, then OK.

Well, Oleg (who's one of the core Lustre developers, I'm on the periphery) might have an opinion here, but basically, Lustre must wait, it shouldn't do it uninterruptibly (bad form, since some of our network timeouts are in minutes, and people should be able to kill programs that are in those waits), but it doesn't want to stop waiting for signals that aren't intended to be fatal. So they came up with a mask of signals that they think people will consider critical/fatal. The sort of signals people expect to interrupt and stop a program. And they block everything else. But... SIGKILL aside, that's inherently arbitrary.

It's ugly, but I don't see a better solution.

That's kind of neither here nor there, though. FWIW, when you have SIG_IGN set and are not ptracing, this code hangs on non-Lustre filesystems. The SIG_IGN seems to mean we don't get the expected state transition that the parent is waiting for, so it never unlocks. When ptracing, SIG_IGN is, well, ignored.

rocallahan commented 7 years ago

FWIW, when you have SIG_IGN set and are not ptracing, this code hangs on non-Lustre filesystems.

I don't see this. Testcase: https://gist.github.com/rocallahan/6b557e6f0a63bf00b586f58187af8927 Completes normally for me with btrfs.

Lustre must wait, it shouldn't do it uninterruptibly (bad form, since some of our network timeouts are in minutes, and people should be able to kill programs that are in those waits), but it doesn't want to stop waiting for signals that aren't intended to be fatal.

I believe the correct behavior here would be for signals to interrupt the system call, leaving it in a restartable state, so that if the signal is ignored or there is a signal handler which returns, the system call restarts and can complete normally. If you strace -f my testcase then you should see that's exactly what happens:

[pid 13994] fcntl(3, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=4096} <unfinished ...>
[pid 13993] <... nanosleep resumed> NULL) = 0
[pid 13993] kill(13994, SIGPWR)         = 0
[pid 13994] <... fcntl resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 13994] --- SIGPWR {si_signo=SIGPWR, si_code=SI_USER, si_pid=13993, si_uid=1000} ---
[pid 13993] nanosleep({0, 500000000},  <unfinished ...>
[pid 13994] fcntl(3, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=4096} <unfinished ...>

I understand that might be difficult to implement. Though, for flock it seems like all you'd have to do is abandon taking the lock and prepare the syscall for restart, so the restarted syscall just tries again from scratch to take the lock.

rr-debugger / rr

flock test fails on lustrefs #1908