gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

"Spam" test to stress exception handling flows #312

Open dimakuv opened 2 years ago

dimakuv commented 2 years ago

Description of the problem

We periodically find data races caused by subtle bugs in Gramine's exception-handling flows, especially in the Linux-SGX PAL.

An excerpt from #311:

This bug was found while I was experimenting with something different on Gramine. That workload increased the frequency of async signals; some of these signals got delivered during LibOS thread initialization. This manifested itself as a MEMFAULT in Gramine PAL terminology and led to an infinite loop of such MEMFAULTs. On my extreme workload, this bug happened in about 1 in 10 runs. I believe it's hard to catch this bug in normal Gramine workloads, so there is no (simple) way to test this PR.

Hmm, is it feasible to implement a "signal spam" test? I.e., a workload/test running, plus an external program spamming async signals at the Graminized process (on the host). We find such issues from time to time, so it would be good to have a test for this.

The "signal spam" test could have an external program that constantly spams the Graminized process with SIGCONT signals: they are benign and (more or less) side-effect-free, but they invoke the relevant exception-handling flows.
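A minimal sketch of such an external spammer, assuming it runs as a plain POSIX process on the host and already knows the PID of the Graminized process (for example, the PID of the `gramine-sgx` loader). `spam_signals` and its `interval_us` throttling parameter are hypothetical, not part of Gramine; the loop simply calls `kill()` repeatedly and stops as soon as the target disappears.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Gramine API): sends `count` copies of `sig`
 * to `pid`, pausing `interval_us` microseconds between sends (0 = no pause).
 * Returns how many signals were delivered to the kernel successfully; stops
 * early if kill() fails, e.g. with ESRCH once the target process has exited.
 */
unsigned long spam_signals(pid_t pid, int sig, unsigned long count, long interval_us) {
    struct timespec pause = {
        .tv_sec = interval_us / 1000000,
        .tv_nsec = (interval_us % 1000000) * 1000,
    };
    unsigned long sent = 0;

    for (unsigned long i = 0; i < count; i++) {
        if (kill(pid, sig) != 0)
            break; /* target gone (ESRCH) or not permitted (EPERM) */
        sent++;
        if (interval_us > 0)
            nanosleep(&pause, NULL); /* an early wake-up (EINTR) is harmless here */
    }
    return sent;
}
```

A tiny driver `main` could parse the target PID from its command line and call, e.g., `spam_signals(pid, SIGCONT, 1000000, 0)` for an unthrottled burst.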

boryspoplawski commented 2 years ago

There might be two issues:
1) we lack a way of spawning processes outside Gramine in our test suite;
2) constant signal spam might render the app unusable, i.e., it would spend all of its time in signal handling. At least for the Linux-SGX PAL this could be an issue. Some small sleeps between signals might solve this, but might also make such a test useless; this needs some empirical testing.
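One way both concerns could be approached, sketched here under assumptions: a small host-side harness (hypothetical, not an existing Gramine test utility) that itself runs outside Gramine, launches the workload (e.g. `{"gramine-sgx", "helloworld", NULL}`) as a child, and sends it SIGCONT until it exits. The `interval_us` parameter is the "small sleep" between signals, so its value can be tuned empirically. It assumes that signals sent to the loader's host PID reach Gramine's async-exception path, as described in the issue.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical harness: runs argv[] as a normal host process and sends it
 * SIGCONT every `interval_us` microseconds (0 = as fast as possible) until it
 * exits. Stores the number of signals sent in *sent. Returns the child's exit
 * code, 128 + signal number if it was killed, or -1 on a harness error.
 */
int run_with_signal_spam(char *const argv[], long interval_us, unsigned long *sent) {
    struct timespec pause = {
        .tv_sec = interval_us / 1000000,
        .tv_nsec = (interval_us % 1000000) * 1000,
    };
    *sent = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break; /* workload finished */
        if (r < 0 && errno != EINTR)
            return -1;
        /* SIGCONT's default action does not disturb a running process */
        if (kill(pid, SIGCONT) == 0)
            (*sent)++;
        if (interval_us > 0)
            nanosleep(&pause, NULL); /* throttle so the workload can make progress */
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Sweeping `interval_us` from 0 upward and checking whether the workload still completes (and how long it takes) would be one way to do the empirical testing mentioned above.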