Yeah, sounds totally reasonable to me.
+1
+1 from me, and also a vote that this is a bug that needs to be fixed and backported, not a CLC API decision. "do we prevent a segfault or not" is a GHC correctness fix.
I'd argue that the implementation change is indeed a bug fix and doesn't need CLC review. The lack of proper memory ordering can cause segfaults. As for performance, the existing implementation already uses half of the store release / load acquire pair, so it's already paying some of the cost but not getting the proper benefit.
On the other hand, I think it's reasonable to ask the CLC to sign off on a change to `IORef`'s officially documented memory model, i.e. the documentation change where we make certain promises about the behaviour in a multi-threaded setting.
The distinction I'm getting at is implementation vs specification. GHC changing its implementation to not segfault is just implementation details. Making promises about the memory model is strengthening the specification. The latter is a good topic for the CLC.
If the benchmarks show that we should be really worried about performance then I suggest two things:

- do the memory ordering for `IORef`s, and omit the memory ordering for `STRef`s
- add `unsafeReadIORef` and `unsafeWriteIORef`, with documentation saying that they are really really unsafe in a threaded setting in that they can cause segfaults, not merely get weird results.

For the curious: the main way they can cause segfaults is like this:

- thread A allocates and initialises a heap object, then writes a pointer to it into an `IORef`
- thread B reads the pointer out of the `IORef` and then reads the heap object

If the memory reads and writes are not seen in the correct order then thread B may see the memory write to the `IORef` but not yet see the memory writes to the heap object, so when thread B reads the heap object it gets whatever garbage was there previously (and thus hello segfault).
The fix is that the write into the `IORef` in thread A uses "store release" and the read in thread B uses "load acquire", and together these ensure that the prior writes to the heap object are visible to thread B, if the write to the `IORef` is also visible to thread B. The existing GHC impl has the "store release" in the `writeIORef` but for some reason is missing the "load acquire" in `readIORef`, but these things are specified in the C11 memory model only to work in matched pairs. So the GHC fix is to include that missing "load acquire" in the `readIORef`.
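For concreteness, here is a minimal Haskell sketch of that publish/consume pattern (an illustration only, not code from the patch); the release/acquire pairing it relies on lives inside `writeIORef` and `readIORef` themselves:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Data.IORef

main :: IO ()
main = do
  box <- newIORef Nothing
  _ <- forkIO $ do
    -- Thread A: build a heap object, then publish a pointer to it.
    let obj = sum [1 .. 1000 :: Int]
    writeIORef box (Just $! obj)   -- relies on the "store release"
  threadDelay 1000
  -- Thread B (here, the main thread): read the pointer, then the object.
  -- Without the matching "load acquire" in readIORef, a weakly-ordered CPU
  -- could hand over the pointer before the object's contents are visible.
  readIORef box >>= print
```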
@dcoutts I don't know enough about this stuff. If a thunk does some `ST` computation that internally uses `STRef`s, and that thunk is forced by multiple threads, can things go wrong? I'm a bit nervous about assuming that `ST` is single-threaded anyway (because of things like `unsafeSTToIO`), but is even basic usage safe?
> @Bodigrim explicitly requested that a CLC proposal be opened.

For some reason I expected changes to `GHC.IORef`, but the patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373/diffs is entirely on the compiler's side, and indeed does not need CLC approval. Sorry for confusion, my fault.
@Bodigrim but documenting the memory model is in base (haddocks), and arguably it's almost a language definition thing. Certainly in other languages (C, C++, Java etc) the memory model is part of the language spec. For us it's in libs, because our mutable variables (IORef, MVar, STM) are all exposed via libs.
So that aspect is plausibly in scope for CLC. It'd be nice for some folk to carefully review memory model documentation.
@treeowl For the first issue, I think it's fine. An `ST` computation creates, mutates and then forgets the `STRef`s. So if an `ST` thunk (using `runST` or equivalent) is run by multiple capabilities then all those `ST` refs are going to be independent instances, not shared mutable vars. Otherwise `ST` would already be unsafe in a threaded setting simply because one would get non-deterministic results (due to the sharing and concurrency).
Yes it would be possible to cheat using `unsafeSTToIO` or indeed `stToIO`. You'd have to create `STRef`s in `IO` and share them between threads and then mutate them. That would provoke segfaults by mutating those `STRef`s in multiple capabilities without the memory barriers.
```haskell
import Control.Monad.ST (RealWorld, stToIO)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

newSTRefIO :: a -> IO (STRef RealWorld a)
newSTRefIO x = stToIO (newSTRef x)

readSTRefIO :: STRef RealWorld a -> IO a
readSTRefIO v = stToIO (readSTRef v)

writeSTRefIO :: STRef RealWorld a -> a -> IO ()
writeSTRefIO v x = stToIO (writeSTRef v x)
```
That might be an argument for not putting the memory barriers in the `STRef`s. Though it does seem a bit crazy to say that `STRef` should be safely sharable between threads when they're not supposed to be shareable between threads at all! But indeed, the types let you do it!
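As a hypothetical usage sketch of the helpers above (the name `sharedSTRefDemo` and the `forkIO` usage are mine, just to make the pattern concrete):

```haskell
import Control.Concurrent (forkIO)

-- Share one STRef between two capabilities via the IO wrappers above:
-- exactly the cross-capability mutation pattern described in this thread.
sharedSTRefDemo :: IO ()
sharedSTRefDemo = do
  ref <- newSTRefIO (0 :: Int)
  _ <- forkIO (writeSTRefIO ref 1)   -- mutate on one thread
  readSTRefIO ref >>= print          -- read concurrently on another
```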
@dcoutts I'm not sure my concern can be disposed of quite as easily as that. The interaction between `runRW#` and asynchronous exceptions is ... difficult. If thread A starts evaluating the thunk, and is then killed by an asynchronous exception, evaluation may continue in thread B. Now there may be enough barriers and such around that process to ensure everything goes as it should, but I have no idea.

The code below will only print "Starting" once.
```haskell
import Control.Concurrent
import Control.Monad.ST
import Control.Monad.ST.Unsafe
import Data.STRef
import System.IO
import Control.Exception

thunk = runST $ do
  unsafeIOToST (putStrLn "Starting" >> hFlush stdout)
  ref <- newSTRef (10^9)
  let
    go = do
      v <- readSTRef ref
      if v > 0
        then (writeSTRef ref $! v-1) >> go
        else pure 3
  go

data Die = Die deriving Show
instance Exception Die

main = do
  tid <- forkIO (print thunk)
  threadDelay 1000
  throwTo tid Die
  print thunk
```
> @Bodigrim but documenting the memory model is in base (haddocks), and arguably it's almost a language definition thing. Certainly in other languages (C, C++, Java etc) the memory model is part of the language spec. For us it's in libs, because our mutable variables (IORef, MVar, STM) are all exposed via libs.
>
> So that aspect is plausibly in scope for CLC. It'd be nice for some folk to carefully review memory model documentation.
In my opinion Haskell should expose a sequentially-consistent picture to the user. If a user has a boxed value (that is, pointer) in their hand then IMHO they shouldn't need to worry about how they came across that value (and, by extension, whether the correct barriers were used to ensure that the value is visible to them).
> @Bodigrim explicitly requested that a CLC proposal be opened.
>
> For some reason I expected changes to `GHC.IORef`, but the patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373/diffs is entirely on the compiler's side, and indeed does not need CLC approval. Sorry for confusion, my fault.
Quite alright. I'm happy to hear that we are in agreement. Of course, I would value the CLC's opinion on what sort of memory model we should guarantee to the user (see above).
> @dcoutts I'm not sure my concern can be disposed of quite as easily as that. The interaction between `runRW#` and asynchronous exceptions is ... difficult. If thread A starts evaluating the thunk, and is then killed by an asynchronous exception, evaluation may continue in thread B. Now there may be enough barriers and such around that process to ensure everything goes as it should, but I have no idea.

Indeed this is a tricky case but rest assured that the correct barriers are provided by the runtime in this case. In fact, all thunk entries necessarily involve an acquire fence.
Incidentally, I would invite anyone interested in this topic to read this (draft) Note, which describes GHC's current memory ordering soundness story. Naturally, do comment on !9372 if you have any questions or concerns.
Thanks, @bgamari. I'm glad that aspect is okay.
@bgamari Could you provide a status update on the proposal? I got a bit lost while reading the thread and following PRs 🥴
My understanding:

- This proposal updates the `IORef` documentation describing the current situation
- GHC adds the missing memory barriers to the `IORef` operations (and I believe, the patch is already merged)

If my understanding is correct, then I have the following open questions:

- Will the documentation of `Data.IORef` be updated again to reflect the changes?
- I have a separate proposal concerning `Data.IORef`. Do I understand correctly that my proposal becomes redundant and obsolete in the context of these changes, as `modifyIORef` becomes atomic?
Thanks for checking in on this, @chshersh.

To answer your questions in turn:

- `Data.IORef` correctly states that "The implementation is required to ensure that reordering of memory operations cannot cause type-correct code to go wrong", which is what this proposal describes.
- `modifyIORef` is not atomic in the traditional sense of the word. I will elaborate below.

This proposal addresses a soundness issue: previously, accessing an `IORef` from multiple threads of a concurrent program could result in memory unsafety (e.g. segmentation faults). It proposes that this be fixed with the addition of appropriate memory barriers in the implementation of `readIORef`.
However, usually when users refer to an "atomic modification" operation they mean an operation which guarantees that another thread's accesses cannot come between the read and write of the modification, imposing a total ordering on all of the atomic operations accessing the variable. Consider this program, for instance:
```haskell
import Control.Concurrent
import Data.IORef

main = do
  ref <- newIORef 0
  done <- newEmptyMVar
  _ <- forkIO $ do
    atomicModifyIORef ref (\x -> (x + 1, ()))
    putMVar done ()
  atomicModifyIORef ref (\x -> (x + 1, ()))
  takeMVar done
  readIORef ref >>= print
```
As written, this program is guaranteed to print the value `2`, as the modifications are atomic. However, if you were to weaken these `atomicModifyIORef`s to `modifyIORef` (i.e. `modifyIORef ref (+1)`) then you could see either `2` or `1` from this program. The latter would arise from the two threads racing to read `ref`, both reading `0`, and therefore both writing `1`.
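For intuition, a non-atomic modify is essentially a plain read followed by a plain write; the sketch below (`modifyIORefSketch` is a made-up name, and this is a simplification rather than the exact `Data.IORef` definition) shows why the interleaving above can lose an update:

```haskell
import Data.IORef

-- A simplified, non-atomic modify: another thread can run between the read
-- and the write, so one of two concurrent increments can be lost.
modifyIORefSketch :: IORef a -> (a -> a) -> IO ()
modifyIORefSketch ref f = do
  x <- readIORef ref
  writeIORef ref (f x)
```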
This is one of the many reasons why I tend to discourage users from relying on `IORef` in concurrent settings, especially given how many better options we have in Haskell. Under the best of conditions concurrency is hard and it is that much harder with crude tools like `IORef`. `MVar`s and STM are considerably easier to reason about and have negligible overhead for most applications.
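For instance, the counter above written against an `MVar` (a sketch using only `Control.Concurrent`; not code from this thread) needs no reasoning about memory ordering and always prints `2`:

```haskell
import Control.Concurrent

main :: IO ()
main = do
  ref <- newMVar (0 :: Int)
  done <- newEmptyMVar
  _ <- forkIO $ do
    modifyMVar_ ref (pure . (+ 1))   -- takeMVar/putMVar make the increment atomic
    putMVar done ()
  modifyMVar_ ref (pure . (+ 1))
  takeMVar done
  readMVar ref >>= print             -- prints 2
```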
I am going to close this as this proposal has run its course; see https://github.com/haskell/core-libraries-committee/issues/112#issuecomment-1352318057 for the conclusion.
@bgamari Thanks for the update on the issue! All looks good to me 🙂
I'm adding the `out-of-scope` label since this doesn't require a CLC proposal.
Currently, `readIORef` lacks any memory-order barriers; this renders the operation unsafe to use in any multithreaded program using GHC's threaded runtime system on architectures with weakly-ordered memory models. Specifically, while this doesn't affect x86, any use of these operations can result in undefined behavior on several AArch64 implementations. Judging by the fact that `writeIORef` has the needed write barrier, `readIORef`'s behavior appears to be an oversight in the original implementation. While my initial inclination was to treat this as a bug and fix this in GHC, @Bodigrim explicitly requested that a CLC proposal be opened.

I would strongly argue that the status quo cannot be allowed to stand. Potential segmentation faults arising from programs which use only "safe" operations strike a serious blow to Haskell's claim of being a "safe" language. I propose that `readIORef` carry the missing barrier and that the documentation be modified to reflect the fact that these operations are thread-safe (albeit challenging to use in a multithreaded setting).

As noted above, this barrier has no effect on x86 and other total-store-ordered architectures while incurring only a small cost on more weakly-ordered architectures. For instance, on AArch64 this barrier would require the use of `ldar` instead of `ldr`. Unscientific measurements (namely, timing a tight loop of repeated loads) demonstrate that `ldar` has roughly 25% lower throughput than `ldr` on Apple's M1. This added cost is going to be dwarfed by the baseline cost of the access in more realistic workloads (since the benchmark here is always hitting L1 cache). In light of this, the cost of the barrier is, in my opinion, well-justified.

In addition to fixing this in future releases, I would strongly advocate for aggressively backporting this fix to our stable branches (namely 9.6, 9.4, and 9.2). After all, the lacking read barrier is, in my opinion, a critical soundness bug which will become increasingly visible as weakly-ordered AArch64 implementations become more ubiquitous.
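For reference, the kind of crude measurement described (a tight loop of repeated loads) can be sketched in plain Haskell; the iteration count and the use of `getCPUTime` below are arbitrary choices for illustration, not the benchmark actually run:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.IORef
import System.CPUTime (getCPUTime)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  let iters = 10000000 :: Int
      loop 0 !acc = pure acc
      loop n !acc = do
        x <- readIORef ref
        loop (n - 1) (acc + x)
  start <- getCPUTime
  total <- loop iters 0
  end   <- getCPUTime
  -- getCPUTime is in picoseconds; this gives only a rough per-load figure.
  putStrLn $ "sum = " ++ show total
          ++ ", ps/load ~ " ++ show ((end - start) `div` fromIntegral iters)
```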
Implementation: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9373