haskell / primitive

This package provides various primitive memory-related operations.
Other
114 stars 58 forks source link

Add atomic stuff #295

Open treeowl opened 4 years ago

treeowl commented 4 years ago

For some reason, this package doesn't offer basic access to atomic CAS and such. I see no good reason for this. The atomic-primops package has some good ideas for this, but its implementation is flaky under optimization and its author does not seem responsive to suggestions for fixing that.

An interesting (?) thought: I think we can write a somewhat-inefficient atomicModifyIORef-like function for array elements using the same sort of implementation trickery but in user-land. Very roughly speaking:

atomicModifyArray
  :: MutableArray RealWorld a
  -> Int
  -> (a -> (a, b))
  -> IO (a, b)
atomicModifyArray mary i f = do
  old_ref <- newIORef undefined
  let
    res_thunk = unsafeDupablePerformIO $
          f <$> readIORef old_ref
    {-# NOINLINE res_thunk #-}
    (new_thunk, _) = res_thunk
    {-# NOINLINE new_thunk #-}
    loop = do
      old_val <- readArray mary i
      writeIORef old_ref old_val
      succeeded <-
            -- Perform a CAS expecting old_val and
            -- attempting to install new_thunk
      if succeeded
        then do
          -- The evaluate is optional, but advisable.
          -- The alternative is to return `lazy res_thunk`
          ret <- evaluate res_thunk
          pure ret
        else loop
andrewthad commented 4 years ago

A while ago, I wrote primitive-atomic, which just has wrapper for the atomic primops. It's not as sophisticated as atomic-primops. The obvious lifted wrapper for casArray# is just:

casArray :: PrimMonad m
  => MutableArray (PrimState m) a -- ^ prim array
  -> Int -- ^ index
  -> a -- ^ expected old value
  -> a -- ^ new value
  -> m (Bool,a)
{-# INLINE casArray #-}
casArray (MutableArray arr#) (I# i#) old new =
  primitive $ \s0 -> case casArray# arr# i# old new s0 of
    (# s1, n, r #) -> (# s1, (isTrue# (n ==# 0# ),r) #)

Although, as you point out, CAS on lifted objects is useless on types that have no notion of identity. If the elements in the array were MutVar# or MutableByteArray# though, then CAS is actually sound.

lehins commented 4 years ago

CAS primop casArray# works well with boxed arrays, except for when compiled with -fhpc ghc#18289

Here is a potential implementation:

atomicModifyArray# :: MonadPrim s m => MutableArray s e -> Int -> (e -> (# e, b #)) -> m b
atomicModifyArray# ma@(MutableArray ma#) i@(I# i#) f = do
  current0 <- readArray ma i
  primitive $
    let go expected s =
          case f expected of
            (# new, artifact #) ->
              case casArray# ma# i# expected new s of
                (# s', 0#, _ #)     -> (# s', artifact #)
                (# s', _, actual #) -> go actual s'
     in go current0

So, I wouldn't say it is useless.

treeowl commented 4 years ago

CAS on lifted objects is not useless. It can, however, be subtle, as the evaluate or lazy in my sample code above hints at. The Ticket approach of atomic-primops doesn't seem terrible if it's done with a bit more (typically erased) boxing and (especially) by using lazy instead of all that unsafe coercion nonsense. Something like

data Ticket a = Ticket a

peekTicket :: Ticket a -> a
peekTicket (Ticket a) = lazy a

peekTicket# :: Ticket a -> (# a #) 
peekTicket# (Ticket a) = (# lazy a #)
treeowl commented 4 years ago

CAS primop casArray# works well with boxed arrays, except for when compiled with -fhpc ghc#18289

Here is a potential implementation:

atomicModifyArray# :: MonadPrim s m => MutableArray s e -> Int -> (e -> (# e, b #)) -> m b
atomicModifyArray# ma@(MutableArray ma#) i@(I# i#) f = do
  current0 <- readArray ma i
  primitive $
    let go expected s =
          case f expected of
            (# new, artifact #) ->
              case casArray# ma# i# expected new s of
                (# s', 0#, _ #)     -> (# s', artifact #)
                (# s', _, actual #) -> go actual s'
     in go current0

So, I wouldn't say it is useless.

Yes, that's an alternative implementation, similar to one in atomic-primops. It's more efficient than the one I gave under low contention, but under high contention it allocates a bunch of thunks that get thrown away on CAS failure.

lehins commented 4 years ago

Yeah, I don't think atomic modification is a good use case with high contention, even atomicModifyIORef' performs rather poorly when too many threads try to modify it at the same time. On the other hand with low contention atomicity guarantees are quite valuable, so it is a bit surprising that primitive does not provide those out of the box.

treeowl commented 4 years ago

@lehins, what goes wrong with atomicModifyIORef' under contention? Are there a lot of CAS misses for some reason I can't think of? Or do thunks start to build up because multiple modifications succeed before the result of the first one gets forced?

lehins commented 4 years ago

@treeowl Remember this SO question, which I think is somehow related to this and also I've done some benchmarks at some point before and I found that with increasing number of threads performance drops pretty significantly. Not sure why that is as I did not investigate it much further. However I know I will come back to this at some point in the future because I have some interesting work-in-progress for an API that amongst other things provides a unified interface for atomic modification of various mutable types.