Open treeowl opened 4 years ago
What are the semantics of `noDuplicate#`? I've never used it before.
@chessai, it's about lazy blackholing. Usually, it's okay to occasionally evaluate the same thunk twice in different threads, as long as it doesn't take too long. Sometimes, we might want to avoid that. There's always a stack of thunks currently under evaluation. When GHC hits `noDuplicate#`, it walks that stack and blackholes all the thunks, ensuring only one thread is working on them. `noDuplicate#` is precisely the difference between `unsafePerformIO` and `unsafeDupablePerformIO`.

`noDuplicate#` is not free, so when might it be worthwhile? That can be a bit hard to guess in general. Suppose you evaluate `fmap f arr`, for non-huge `arr`, in two threads at nearly the same time, and both of them actually perform the evaluation (i.e., neither had to GC in the middle). Often, this is fine. But if `f` is expensive, then that's bad: each thread will produce an `f e` thunk for each `e`, which (I believe) will never be de-duplicated.
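
To make the `unsafePerformIO` / `unsafeDupablePerformIO` relationship concrete, here is a minimal sketch using only `base`. The name `myUnsafePerformIO` is made up for illustration; its body follows the way `base` builds `unsafePerformIO`, by running `noDuplicate` (the `IO` wrapper around `noDuplicate#`) before the action.

```haskell
module Main (main) where

import GHC.IO (noDuplicate)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical name, for illustration only.
-- The "dupable" variant skips the noDuplicate call, so two threads that
-- enter the same thunk at once may both run the action. Running
-- noDuplicate first claims the thunks under evaluation for this thread.
myUnsafePerformIO :: IO a -> a
myUnsafePerformIO m = unsafeDupablePerformIO (noDuplicate >> m)
{-# NOINLINE myUnsafePerformIO #-}

main :: IO ()
main = do
  -- A pure result computed through the wrapper.
  let x = myUnsafePerformIO (pure (sum [1 .. 100 :: Int]))
  print x
```

The extra cost is the stack walk done by `noDuplicate`, which is why the dupable variant is cheaper when repeating the action is harmless.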
Here's my rough guess:
- `noDuplicate#` should likely use it.
- `noDuplicate#`.
- `runArray` and similar should probably use `noDuplicate#` to avoid duplicated work and loss of sharing.
- `<>` should probably use `noDuplicate#` when the result is large, for some value of large.
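
As a rough sketch of what the suggestion for `runArray`-style functions could look like, the following uses plain `ST` from `base` rather than the library's array types. `runSTNoDuplicate` and `sumTo` are hypothetical names, not existing API; the idea is only that the runner calls `noDuplicate` before the build action, so the build is not repeated by a second thread entering the same thunk.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main (main) where

import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)
import GHC.IO (noDuplicate)

-- Hypothetical runner: like runST, but calls noDuplicate first so the
-- enclosing thunk is blackholed before the (possibly expensive) action runs.
runSTNoDuplicate :: (forall s. ST s a) -> a
runSTNoDuplicate m = runST (unsafeIOToST noDuplicate >> m)

-- Stand-in for an expensive build step: sums 1..n in a mutable reference.
sumTo :: Int -> Int
sumTo n = runSTNoDuplicate (do
  ref <- newSTRef 0
  mapM_ (\i -> modifySTRef' ref (+ i)) [1 .. n]
  readSTRef ref)

main :: IO ()
main = print (sumTo 10)
```

Whether the extra stack walk pays off depends on how expensive the build is, which is the same trade-off raised for `<>` on large results.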