Open edsko opened 4 years ago
Thought: the most important difficulty with this idea is sharing: if I have, say, something like
data T = MkT { teeSet :: !(Set Int) , teeTotal :: Int }
where teeTotal
is meant to be a lazily computed total of the values in teeSet
, then the size of the thunk, if we just compute it naively, will necessarily include the size of teeSet
; after all, it points to it. Perhaps the answer here is to express this explicitly, something like
data RefersTo a b = RefersTo a b
and then
data T = MkT { teeSet :: !(Set Int) , teeTotal :: RefersTo (Set Int) Int }
then a special NoThunks
instance that computes sizes could first compute the size of a
and then not count it again when computing the size of b
(we already have this sharing detection in the heap size computation anyway, so that might work out nicely, we "just" need to make this sharing available Haskell-side).
There are two cases: first in which teeTotal = Set.size teeSet
, then the user wants to get that size of teeTotal
includes the size of teeSet
. If one tried to optimise Set
so size
is constant time, then teeTotal
doesn't contain anything that is referring to teeSet
.
In either way I think reporting the total size is the right thing to do, e.g. if one tried to do the latter and had a bug in which one actually implemented the former then nothunks
gives one a chance to discover such a bug.
See reddit discussion.