input-output-hk / io-sim

Haskell's IO simulator which closely follows core packages (base, async, stm).
https://hackage.haskell.org/package/io-sim
Apache License 2.0
37 stars 15 forks source link

IOSimPOR fails to find a race under specific circumstances #183

Open jasagredo opened 1 week ago

jasagredo commented 1 week ago

Describe the bug This test case:

prop_race :: Property
prop_race = exploreSimTrace id action $ \_ trace ->
  case traceResult False trace of
    Left (FailureDeadlock err) -> counterexample (ppTrace trace) $ property False
    _ -> property True
  where
    action :: IOSim s ()
    action = do
      exploreRaces
      ref <- newMVar 0
      tq <- atomically newTQueue
      let f = do
            atomically $ writeTQueue tq ()
            bracketOnError
              (takeMVar ref)
              (tryPutMVar ref)
              (putMVar ref . (+1))
      t1 <- async f
      t2 <- async f
      async (cancel t1) >>= wait
      wait t2
      wait t1

will succeed. However removing the atomically $ writeTQueue tq () will make it fail because of a Deadlock. The deadlock is indeed present in the implementation: t1 takes, t1 puts, t1 receives exception, t2 takes, t1 tryPut succesfully, t2 blocks on put.

Writing to that tqueue before running the bracketOnError results in the race not being found by IOSimPOR. This looks like a direct bug.

Desktop (please complete the following information):

Additional context Add any other context about the problem here. Attach io-sim or io-sim-por trace if possible.

A trace with the deadlock if removing the mentioned line: fail.txt

coot commented 1 week ago

The attached trace contains:

      0s - Thread [].2      main - TxCommitted [] [TMVarId 2] Effect {  }

while the code snippet doesn't mentions any TMVars, is this a bug in the recent feature or have you used TMVars in some version?

coot commented 1 week ago

ok, the TMVars come from async calls.

coot commented 1 week ago

For me the property fails with a Deadlock when I comment out the writeTQueue action.

bolt12 commented 1 week ago

Yes, the problem is that the code with the writeTQueue still exhibits the race condition and IOSimPOR suddenly stops being able to find it

bolt12 commented 4 days ago

After chatting with @rjmh this might be a symptom of our implementation of MVars in terms of STM might be too deterministic and hides races.

Here's a direct quote:

Getting the race detection and reversal right between STM transactions is already pretty complex. Needing to handle MVars as primitives too would add a lot of complexity to IOSimPOR internals. If there is an implementation in terms of TVars--which has the same races--then using it will be simpler. I suppose there might be specific race reversals that needn't be explored with MVars... say a takeMVar which blocks, followed by a putMVar which unblocks it... but which the TVar implementation might classify as races anyway. If there are such cases, and they can't be fixed by rewriting the implementation-in-terms-of-TVars, then it might be worth making them atomic I suppose.

The present implementation keeps a list of blocked threads in the MVar (or rather, TVars which, when written, unblock a thread). But this means that even a takeMVar which blocks, or a putMVar which blocks, is implemented using a write to the TVar. That means that two blocking takeVars, for example, will be considered to race, because they write to the same TVar. Better would be for takeMVar to use STM retry to block the caller if the MVar is empty, thus turning blocking takeMVar calls (and similarly, blocking putMVar calls) into read-only transactions, that at least would not race with each other.

Although this situation is not ideal, i.e. IOSimPOR should still be able to find the races, this might be not a bug in IOSimPOR itself but rather a consequence of the MVar implementation