haskell / primitive

This package provides various primitive memory-related operations.

Use noDuplicate# where appropriate #294

Open treeowl opened 4 years ago

treeowl commented 4 years ago

runArray and similar should probably use noDuplicate# to avoid duplicated work and loss of sharing. <> should probably use noDuplicate# when the result is large, for some value of large.

chessai commented 3 years ago

What are the semantics of noDuplicate#? I've never used it before.

treeowl commented 3 years ago

@chessai, it's about lazy blackholing. Usually, it's okay to occasionally evaluate the same thunk twice in different threads, as long as it doesn't take too long. Sometimes, we might want to avoid that. There's always a stack of thunks currently under evaluation. When GHC hits noDuplicate#, it walks that stack and blackholes all the thunks, ensuring only one thread is working on them. noDuplicate# is precisely the difference between unsafePerformIO and unsafeDupablePerformIO.

noDuplicate# is not free, so when might it be worthwhile? That can be a bit hard to guess in general. Suppose you evaluate fmap f arr, for non-huge arr, in two threads at nearly the same time, and both of them actually perform the evaluation (i.e., neither had to GC in the middle). Often, this is fine. But if f is expensive, then that's bad: each thread will produce an f e thunk for each e, which (I believe) will never be de-duplicated.
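To make the unsafePerformIO relationship concrete, here is a minimal sketch using only base. The name unsafePerformIO' is hypothetical and chosen for illustration. The definition mirrors how base expresses unsafePerformIO in terms of unsafeDupablePerformIO and noDuplicate, the IO wrapper around noDuplicate#.

```haskell
import GHC.IO (noDuplicate, unsafeDupablePerformIO)

-- Hypothetical name, for illustration only.
-- The only difference from unsafeDupablePerformIO is the leading
-- noDuplicate, which blackholes the thunks currently under evaluation
-- before the action runs.
unsafePerformIO' :: IO a -> a
unsafePerformIO' m = unsafeDupablePerformIO (noDuplicate >> m)
```

Because the noDuplicate call comes first, its cost is paid once per evaluation of the enclosing thunk, no matter how cheap or expensive the action is.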

Here's my rough guess:

  1. Operations that are quite expensive compared to noDuplicate# should likely use it.
  2. Operations that can produce a substantial loss of sharing in case of duplication should perhaps have versions that call noDuplicate#.
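As a rough sketch of what a "version that calls noDuplicate#" might look like for ST-based builders such as runArray, the following uses only base. Both runSTNoDup and sumSquares are hypothetical names, not part of primitive, and this is one possible shape under those assumptions rather than a proposed implementation.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)
import GHC.IO (noDuplicate)

-- Hypothetical helper: like runST, but calls noDuplicate first so that
-- only one thread ends up running the ST action.
runSTNoDup :: (forall s. ST s a) -> a
runSTNoDup st = runST (unsafeIOToST noDuplicate >> st)

-- Hypothetical example standing in for an expensive builder.
sumSquares :: Int -> Int
sumSquares n = runSTNoDup (do
  ref <- newSTRef 0
  mapM_ (\i -> modifySTRef' ref (+ i * i)) [1 .. n]
  readSTRef ref)
```

A library could expose both the plain and the noDuplicate variants, leaving the choice to callers who know whether the work is expensive enough to justify the extra cost.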