rr-debugger / rr

Record and Replay Framework
http://rr-project.org/
Other
9.09k stars 580 forks source link

alarm, async_segv, async_usr1, and ignored_async_usr1 tests intermittently fail with "FAILED: token '...' not in output" #383

Closed joneschrisg closed 11 years ago

joneschrisg commented 11 years ago

Jeez, I guess I've just been assuming I've filed all this stuff. Blech.

Example failure

10: Test 'async_segv' FAILED: token 'caught segv, goodbye' not in output: 10: -------------------------------------------------- 10: -------------------------------------------------- 10: Test async_segv failed, leaving behind /tmp/rr-test-async_segv-d11uFSSVT

Not really sure what might be up here.

joneschrisg commented 11 years ago

Caught one from alarm that looks like

6: Test 'alarm' FAILED: token 'Signal' not in output:
6: --------------------------------------------------
6: ...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................--------------------------------------------------
6: Test alarm failed, leaving behind /tmp/rr-test-alarm-Yzz4rehNP

Seems related.

joneschrisg commented 11 years ago

For ignored_async_usr1, during recording this was seen

38: Successfully delivered signal 10 to 'ignored_async_usr1-c7ygfbdUb'
38: .[WARN] (rec_sched_deregister_thread: errno: None) 14626 still has pending events.  From top down:
38: [INFO] (log_event) SIGNAL_DELIVERY: SIGIO

and we get the failure mode

38: Test 'ignored_async_usr1' FAILED: token 'EXIT-SUCCESS' not in output:
38: --------------------------------------------------
38: --------------------------------------------------
joneschrisg commented 11 years ago

Caught one from alarm that looks like

This warning seen for that failure

6: .[WARN] (rec_sched_deregister_thread: errno: None) 11895 still has pending events.  From top down:
6: [INFO] (log_event) SIGNAL_DELIVERY: SIGIO
6: [INFO] (log_event) SIGNAL_HANDLER: SIGALRM

It seems that we're getting a time-slice interrupt (?) on top of another signal delivery/handler, but we're misunderstanding the time-slice interrupt to be a real signal. Or something.

joneschrisg commented 11 years ago

misunderstanding the time-slice interrupt to be a real signal

And since the default disposition of SIGIO is TERM, that would kill the tracees early and explain the symptom here.

joneschrisg commented 11 years ago

Hmm, one of these I caught had the failure mode

10: Output from recording:
10: --------------------------------------------------
10: --------------------------------------------------
10: Output from replay:
10: --------------------------------------------------
10: caught segv, goodbye
10: --------------------------------------------------

Sigh. Might be some pipe-y thing.

joneschrisg commented 11 years ago

Sigh. Might be some pipe-y thing.

Note to self: this symptom might be caused by replaying the write-to-stdout at syscall entry, but the syscall never completing during replay because the tracee is terminated by a signal (erroneously; the apparent "real bug" here).