input-output-hk / nothunks

Apache License 2.0
47 stars 13 forks source link

Combine with heap size computation in order to prioritize thunks #4

Open edsko opened 4 years ago

edsko commented 4 years ago

See reddit discussion.

edsko commented 4 years ago

Thought: the most important difficulty with this idea is sharing: if I have, say, something like

data T = MkT { teeSet :: !(Set Int) , teeTotal :: Int }

where teeTotal is meant to be a lazily computed total of the values in teeSet, then the size of the thunk, if we just compute it naively, will necessarily include the size of teeSet; after all, it points to it. Perhaps the answer here is to express this explicitly, something like

data RefersTo a b = RefersTo a b

and then

data T = MkT { teeSet :: !(Set Int) , teeTotal :: RefersTo (Set Int) Int }

then a special NoThunks instance that computes sizes could first compute the size of a and then not count it again when computing the size of b (we already have this sharing detection in the heap size computation anyway, so that might work out nicely, we "just" need to make this sharing available Haskell-side).

coot commented 9 months ago

There are two cases: first in which teeTotal = Set.size teeSet, then the user wants to get that size of teeTotal includes the size of teeSet. If one tried to optimise Set so size is constant time, then teeTotal doesn't contain anything that is referring to teeSet.

In either way I think reporting the total size is the right thing to do, e.g. if one tried to do the latter and had a bug in which one actually implemented the former then nothunks gives one a chance to discover such a bug.