Open TravisWhitaker opened 4 years ago
The `Atomic` values that we operate on with these C functions are stored behind a `ForeignPtr`:
https://github.com/tibbe/ekg-core/blob/f8d26b9a3806125694c25b9459ec0c4a0b2e87ce/Data/Atomic.hs#L22
Since an `IORef` is just a `MutVar#`, using these C functions might even be slower than an `IORef`. I'd be happy to take a crack at moving over to `IORef`, but perhaps there's something I'm missing (or I'm standing on the grave of some really tricky bug...).
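For reference, the `IORef`-based alternative being suggested might look something like this minimal sketch (the `Counter` type and function names here are hypothetical, not ekg's actual API):

```haskell
import Data.IORef

-- Hypothetical IORef-backed counter (not ekg's actual implementation).
newtype Counter = Counter (IORef Int)

newCounter :: IO Counter
newCounter = Counter <$> newIORef 0

-- atomicModifyIORef' forces the new value, avoiding thunk build-up,
-- but note each update still allocates a fresh boxed Int on the heap.
incCounter :: Counter -> IO ()
incCounter (Counter ref) = atomicModifyIORef' ref (\n -> (n + 1, ()))

readCounter :: Counter -> IO Int
readCounter (Counter ref) = readIORef ref

main :: IO ()
main = do
  c <- newCounter
  mapM_ (const (incCounter c)) [1 .. 100 :: Int]
  readCounter c >>= print  -- prints 100
```

The heap allocation per update noted in the comment is exactly the cost discussed in the replies below.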
IIRC I measured it and it was much faster. In particular, a `MutVar#` update implies allocation for the heap value it refers to.
Yikes, for some reason I thought that a `MutVar#`'s contents would be unboxed if the contained type was unboxed, but it seems that's not the case: https://gitlab.haskell.org/ghc/ghc/-/blob/master/includes/rts/storage/Closures.h#L181
`MutableByteArray#` has the required atomic operations, though. So we could sprinkle `__sync_synchronize()` around, or give `MutableByteArray#` a spin.
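A counter built on `MutableByteArray#` could be sketched with GHC's atomic primops, e.g. `fetchAddIntArray#` (which, since GHC 7.10, returns the old value). This is just an illustration of the approach, not a proposed patch; the `Counter` name is hypothetical:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch of an atomic Int counter backed by MutableByteArray#,
-- using GHC's sequentially consistent array primops.
import GHC.Exts
import GHC.IO (IO(..))

data Counter = Counter (MutableByteArray# RealWorld)

newCounter :: IO Counter
newCounter = IO $ \s ->
  case newByteArray# 8# s of          -- 8 bytes: one Int on a 64-bit machine
    (# s1, mba #) -> case atomicWriteIntArray# mba 0# 0# s1 of
      s2 -> (# s2, Counter mba #)

-- Atomically add one, returning the previous value.
incCounter :: Counter -> IO Int
incCounter (Counter mba) = IO $ \s ->
  case fetchAddIntArray# mba 0# 1# s of
    (# s1, old #) -> (# s1, I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s ->
  case atomicReadIntArray# mba 0# s of
    (# s1, v #) -> (# s1, I# v #)

main :: IO ()
main = do
  c <- newCounter
  mapM_ (const (incCounter c)) [1 .. 1000 :: Int]
  readCounter c >>= print  -- prints 1000
```

Unlike a `MutVar#`, the payload lives unboxed in the byte array, so an update is a fetch-and-add on that word with no per-update heap allocation.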
@tibbe is there any interest in fixing this bug (via #42 or otherwise)? As-is, ekg is totally broken on aarch64.
CC @23Skidoo
I'll look into cutting new releases during the weekend.
I’d definitely like to get more eyes on #42 before it lands in a release. I’ve been using it since I opened the PR, but my use cases might not hit certain issues.
`hs_atomic_read` and `hs_atomic_write` aren't atomic at all on machines without strong memory models (i.e. without total read and store ordering). To see an example of the kind of havoc this bad assumption can cause, see this (thankfully now fixed) GHC issue: https://gitlab.haskell.org/ghc/ghc/-/issues/15449
Here's the aarch64 code that GCC 8.2 yields for these functions: https://godbolt.org/z/giVdtx
No barriers or acquire/release (`ldar`/`stlr`) instructions emitted...
I'm curious why C functions are used for these atomic operations at all. Are these functions really that much faster than `atomicModifyIORef`?