haskell / core-libraries-committee


Ensure that `IORef` operations are concurrency-safe #112

Closed bgamari closed 1 year ago

bgamari commented 1 year ago

Currently, readIORef lacks any memory order barriers; this renders the operation unsafe to use in any multithreaded program using GHC's threaded runtime system on architectures with weakly-ordered memory models. Specifically, while this doesn't affect x86, any use of these operations can result in undefined behavior on several AArch64 implementations. Judging by the fact that writeIORef has the needed write barrier, readIORef's behavior appears to be an oversight in the original implementation. While my initial inclination was to treat this as a bug and fix this in GHC, @Bodigrim explicitly requested that a CLC proposal be opened.

I would strongly argue that the status quo cannot be allowed to stand. Potential segmentation faults arising from programs which use only "safe" operations strike a serious blow to Haskell's claim of being a "safe" language. I propose that readIORef carry the missing barrier and that the documentation be modified to reflect the fact that these operations are thread-safe (albeit challenging to use correctly in a multithreaded setting).

As noted above, this barrier has no effect on x86 and other total-store-ordered architectures, while incurring only a small cost on more weakly-ordered architectures. For instance, on AArch64 this barrier would require the use of ldar instead of ldr. Unscientific measurements (namely, timing a tight loop of repeated loads) suggest that ldar has roughly 25% lower throughput than ldr on Apple's M1. This added cost will be dwarfed by the baseline cost of the access in more realistic workloads (since the benchmark here always hits L1 cache). In light of this, the cost of the barrier is, in my opinion, well-justified.

In addition to fixing this in future releases, I would strongly advocate for aggressively backporting this fix to our stable branches (namely 9.6, 9.4, and 9.2). After all, the missing read barrier is, in my opinion, a critical soundness bug which will become increasingly visible as weakly-ordered AArch64 implementations become more ubiquitous.

Implementation: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373

chessai commented 1 year ago

Yeah, sounds totally reasonable to me.

+1

gbaz commented 1 year ago

+1 from me, and also a vote that this is a bug that needs to be fixed and backported, not a CLC api decision. "do we prevent a segfault or not" is a ghc correctness fix.

dcoutts commented 1 year ago

I'd argue that the implementation change is indeed a bug fix and doesn't need CLC review. The lack of proper memory ordering can cause segfaults. As for performance, the existing implementation already uses half of the store release / load acquire pair, so it's already paying some of the cost but not getting the proper benefit.

On the other hand, I think it's reasonable to ask the CLC to sign off on a change to the IORef's officially documented memory model, i.e. the documentation change where we make certain promises about the behaviour in a multi-threaded setting.

The distinction I'm getting at is implementation vs specification. GHC changing its implementation to not segfault is just implementation details. Making promises about the memory model is strengthening the specification. The latter is a good topic for the CLC.

If the benchmarks show that we should be really worried about performance then I suggest two things:

  1. separate the implementation of STRefs from IORefs, and omit the memory ordering for STRefs
  2. provide unsafeReadIORef and unsafeWriteIORef, with documentation saying that they are really really unsafe in a threaded setting in that they can cause segfaults, not merely get weird results.

For the curious: the main way they can cause segfaults is like this:

  1. Thread A allocates and fills in some heap object
  2. Thread A writes this heap object pointer into an IORef
  3. Thread B reads from the IORef and then reads the heap object

If the memory reads and writes are not seen in the correct order then thread B may see the memory write to the IORef but not yet see the memory writes to the heap object, so when thread B reads the heap object it gets whatever garbage was there previously (and thus hello segfault).

The fix is that the write into the IORef in thread A uses a "store release" and the read in thread B uses a "load acquire"; together these ensure that the prior writes to the heap object are visible to thread B whenever the write to the IORef is visible to thread B. The existing GHC implementation has the "store release" in writeIORef but for some reason is missing the "load acquire" in readIORef, yet in the C11 memory model these operations are specified to work only in matched pairs. So the GHC fix is to include the missing "load acquire" in readIORef.
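The publish/consume pattern described above can be sketched directly with IORef. This is a minimal illustration, not code from the proposal; the Maybe wrapper and the yield in the spin loop are just scaffolding for the example:

```haskell
import Control.Concurrent (forkIO, yield)
import Data.IORef

-- Thread A builds a heap object and publishes it through an IORef;
-- the main thread spins until it sees the pointer, then consumes the
-- object.  The store-release in writeIORef and the load-acquire in
-- readIORef are what guarantee the consumer sees a fully initialised
-- object rather than stale memory.
main :: IO ()
main = do
  ref <- newIORef Nothing
  _ <- forkIO $ do
    let obj = (42 :: Int, "payload")  -- step 1: allocate and fill
    writeIORef ref (Just obj)         -- step 2: publish (store release)
  let spin = do
        m <- readIORef ref            -- step 3: read (load acquire)
        case m of
          Nothing       -> yield >> spin
          Just (n, str) -> print (n, length str)
  spin
```

On a weakly-ordered machine without the acquire barrier, the consumer could observe the Just pointer before the writes that filled in the pair, which is exactly the garbage-read scenario above.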

treeowl commented 1 year ago

@dcoutts I don't know enough about this stuff. If a thunk does some ST computation that internally uses STRefs, and that thunk is forced by multiple threads, can things go wrong? I'm a bit nervous about assuming that ST is single-threaded anyway (because things like unsafeSTToIO), but is even basic usage safe?

Bodigrim commented 1 year ago

@Bodigrim explicitly requested that a CLC proposal be opened.

For some reason I expected changes to GHC.IORef, but the patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373/diffs is entirely on the compiler's side and indeed does not need CLC approval. Sorry for the confusion, my fault.

dcoutts commented 1 year ago

@Bodigrim but documenting the memory model is in base (haddocks), and arguably it's almost a language definition thing. Certainly in other languages (C, C++, Java etc) the memory model is part of the language spec. For us it's in libs, because our mutable variables (IORef, MVar, STM) are all exposed via libs.

So that aspect is plausibly in scope for CLC. It'd be nice for some folk to carefully review memory model documentation.

dcoutts commented 1 year ago

@treeowl For the first issue, I think it's fine. An ST computation creates, mutates and then forgets the STRefs. So if an ST thunk (using runST or equivalent) is run by multiple capabilities then all those ST refs are going to be independent instances, not shared mutable vars. Otherwise ST would already be unsafe in a threaded setting simply because one would get non-deterministic results (due to the sharing and concurrency).

Yes it would be possible to cheat using unsafeSTToIO or indeed stToIO. You'd have to create STRefs in IO and share them between threads and then mutate them. That would provoke segfaults by mutating those STRefs in multiple capabilities without the memory barriers.

import Control.Monad.ST (RealWorld, stToIO)
import Data.STRef

-- IO wrappers over the STRef operations, via stToIO
newSTRefIO :: a -> IO (STRef RealWorld a)
newSTRefIO x = stToIO (newSTRef x)

readSTRefIO :: STRef RealWorld a -> IO a
readSTRefIO v = stToIO (readSTRef v)

writeSTRefIO :: STRef RealWorld a -> a -> IO ()
writeSTRefIO v x = stToIO (writeSTRef v x)

That might be an argument for not putting the memory barriers in the STRefs. Though it does seem a bit crazy to say that STRefs should be safely shareable between threads when they're not supposed to be shared between threads at all! But indeed, the types let you do it!
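Spelled out end to end, the stToIO escape hatch looks like this. This is an illustrative sketch: here an MVar handshake supplies the synchronisation, so the program is deterministic; drop the MVar and mutate concurrently, and you are in exactly the barrier-free territory described above.

```haskell
import Control.Concurrent
import Control.Monad.ST (stToIO)
import Data.STRef

-- An STRef created in IO and mutated from a second thread.  The MVar
-- operations provide the ordering guarantees here; bare concurrent
-- readSTRef/writeSTRef on a shared STRef would not.
main :: IO ()
main = do
  v    <- stToIO (newSTRef (41 :: Int))
  done <- newEmptyMVar
  _ <- forkIO $ do
    stToIO (writeSTRef v 42)   -- mutation from another thread
    putMVar done ()
  takeMVar done                -- handshake: write above happens-before this
  stToIO (readSTRef v) >>= print
```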

treeowl commented 1 year ago

@dcoutts I'm not sure my concern can be disposed of quite as easily as that. The interaction between runRW# and asynchronous exceptions is ... difficult. If thread A starts evaluating the thunk, and is then killed by an asynchronous exception, evaluation may continue in thread B. Now there may be enough barriers and such around that process to ensure everything goes as it should, but I have no idea.

The below code will only print "Starting" once.

import Control.Concurrent
import Control.Monad.ST
import Control.Monad.ST.Unsafe
import Data.STRef
import System.IO
import Control.Exception

thunk = runST $ do
  unsafeIOToST (putStrLn "Starting" >> hFlush stdout)
  ref <- newSTRef (10^9)
  let
    go = do
      v <- readSTRef ref
      if v > 0
        then (writeSTRef ref $! v-1) >> go
        else pure 3
  go

data Die = Die deriving Show
instance Exception Die

main = do
  tid <- forkIO (print thunk)
  threadDelay 1000
  throwTo tid Die
  print thunk

bgamari commented 1 year ago

@Bodigrim but documenting the memory model is in base (haddocks), and arguably it's almost a language definition thing. Certainly in other languages (C, C++, Java etc) the memory model is part of the language spec. For us it's in libs, because our mutable variables (IORef, MVar, STM) are all exposed via libs.

So that aspect is plausibly in scope for CLC. It'd be nice for some folk to carefully review memory model documentation.

In my opinion Haskell should expose a sequentially-consistent picture to the user. If a user has a boxed value (that is, pointer) in their hand then IMHO they shouldn't need to worry about how they came across that value (and, by extension, whether the correct barriers were used to ensure that the value is visible to them).

@Bodigrim explicitly requested that a CLC proposal be opened.

For some reason I expected changes to GHC.IORef, but the patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373/diffs is entirely on the compiler's side, and does not need CLC approval indeed. Sorry for confusion, my fault.

Quite alright. I'm happy to hear that we are in agreement. Of course, I would value the CLC's opinion on what sort of memory model we should guarantee to the user (see above).

bgamari commented 1 year ago

@dcoutts I'm not sure my concern can be disposed of quite as easily as that. The interaction between runRW# and asynchronous exceptions is ... difficult. If thread A starts evaluating the thunk, and is then killed by an asynchronous exception, evaluation may continue in thread B. Now there may be enough barriers and such around that process to ensure everything goes as it should, but I have no idea.

Indeed this is a tricky case but rest assured that the correct barriers are provided by the runtime in this case. In fact, all thunk entries necessarily involve an acquire fence.

Incidentally, I would invite anyone interested in this topic to read this (draft) Note, which describes GHC's current memory ordering soundness story. Naturally, do comment on !9372 if you have any questions or concerns.

treeowl commented 1 year ago

Thanks, @bgamari. I'm glad that aspect is okay.

chshersh commented 1 year ago

@bgamari Could you provide a status update on the proposal? I got a bit lost while reading the thread and following PRs 🥴

My understanding:

If my understanding is correct, then I have the following open questions:

  1. Was the documentation in Data.IORef updated again to reflect changes?
  2. I believe the CLC proposal process expects CLC members to vote first, and only then can the implementation be merged. But it was already merged, so it feels weird to vote on it now. How should we proceed?
  3. I have a separate CLC proposal open about introducing some new functions to Data.IORef. Do I understand correctly that my proposal becomes redundant and obsolete in the context of these changes as modifyIORef becomes atomic?

bgamari commented 1 year ago

Thanks for checking in on this, @chshersh .

To answer your questions in turn,

  1. The documentation of Data.IORef correctly states that "The implementation is required to ensure that reordering of memory operations cannot cause type-correct code to go wrong", which is what this proposal describes.
  2. As noted in https://github.com/haskell/core-libraries-committee/issues/112#issuecomment-1352318057, this proposal arose out of a mutual misunderstanding; the change in question merely fixes a compiler bug, and I believe we all agree it shouldn't require a CLC proposal.
  3. Even after this proposal, modifyIORef is not atomic in the traditional sense of the word. I will elaborate below.

This proposal addresses a soundness issue: previously, accessing an IORef from multiple threads of a concurrent program could result in memory unsafety (e.g. segmentation faults). It proposes that this be fixed by adding the appropriate memory barrier to the implementation of readIORef.

However, usually when users refer to an "atomic modification" operation they mean an operation which guarantees that another thread's accesses cannot come between the read and write of the modification, imposing a total ordering on all of the atomic operations accessing the variable. Consider this program, for instance:

import Control.Concurrent
import Data.IORef

main :: IO ()
main = do
    ref <- newIORef (0 :: Int)
    done <- newEmptyMVar

    _ <- forkIO $ do
        atomicModifyIORef ref (\n -> (n + 1, ()))
        putMVar done ()

    atomicModifyIORef ref (\n -> (n + 1, ()))

    takeMVar done
    readIORef ref >>= print

As written, this program is guaranteed to print the value 2, as the modifications are atomic. However, if you weakened these atomicModifyIORefs to modifyIORef, the program could print either 2 or 1. The latter arises when the two threads race to read ref, both read 0, and both therefore write 1.
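The lost update is easy to see from the shape of modifyIORef, which in base is essentially a plain read followed by a plain write, with nothing holding the two together:

```haskell
import Data.IORef

-- modifyIORef, essentially as defined in base: a read then a write.
-- Another thread's write can land between the two lines, and its
-- update is then silently overwritten.
modifyIORefSketch :: IORef a -> (a -> a) -> IO ()
modifyIORefSketch ref f = do
  x <- readIORef ref    -- both racing threads may read the same 0 here...
  writeIORef ref (f x)  -- ...and both then write 1

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  modifyIORefSketch ref (+ 1)
  readIORef ref >>= print  -- single-threaded here, so reliably prints 1
```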

This is one of the many reasons why I tend to discourage users from relying on IORef in concurrent settings, especially given how many better options we have in Haskell. Under the best of conditions concurrency is hard and it is that much harder with crude tools like IORef. MVars and STM are considerably easier to reason about and have negligible overhead for most applications.
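For comparison, here is a base-only sketch of the same two-thread counter using an MVar, where modifyMVar_ provides the read-modify-write atomicity that plain modifyIORef lacks:

```haskell
import Control.Concurrent

main :: IO ()
main = do
  ref  <- newMVar (0 :: Int)
  done <- newEmptyMVar
  _ <- forkIO $ do
    modifyMVar_ ref (pure . (+ 1))  -- holds the MVar for the whole
    putMVar done ()                 -- read-modify-write, so no other
  modifyMVar_ ref (pure . (+ 1))    -- thread can intervene mid-update
  takeMVar done
  readMVar ref >>= print
```

Unlike the modifyIORef version, this always prints 2, regardless of how the two threads interleave.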

bgamari commented 1 year ago

I am going to close this as this proposal has run its course; see https://github.com/haskell/core-libraries-committee/issues/112#issuecomment-1352318057 for the conclusion.

chshersh commented 1 year ago

@bgamari Thanks for the update on the issue! All looks good to me 🙂 I'm adding the out-of-scope label since this doesn't require a CLC proposal.