iu-parfunc / gibbon

A compiler for functional programs on serialized data
http://iu-parfunc.github.io/gibbon/
157 stars 13 forks source link

Arenas for non-packed data #118

Open vollmerm opened 5 years ago

vollmerm commented 5 years ago

We need a strategy for memory management of non-packed data (currently just dictionaries). One suggestion was to make the programmer deal with this by manually introducing arenas and adding an arena variable to operations that allocate non-packed data.

One complication here is that dictionaries may point to packed values, so they must participate at least somewhat in the current reference counting strategy. It might be sufficient for an arena to keep track of all regions it refers to, and on de-allocation decrement all their ref counts.

rrnewton commented 5 years ago

@vollmerm - I think you weren't on the Zoom when we went into the post-meeting discussion of this (me @ckoparkar and @mikerainey). We considered a few additional options that maybe we could discuss further here.

Option 1: first class linear arenas

If we want the user to explicitly manipulate arena's (as suggested at the top of this issue), it could look something like this:

withArena :: (forall s . Arena s -o a) -o a 
withChildArena :: Arena s1 -o (forall s2 . Arena s2 -o a) -o (Arena s1, a)
newHashMap :: Arena s -o (Arena s, Dict s k a)

Here an arena is basically just a cursor into a buffer, and linearity ensures each location is only written once (and when there's parallelism, only one thread can allocate to the region).

withChildArena is an optimization to use the arena like a stack. You push new stuff to the same buffer, and then pop it off which gives you back the same allocation pointer you started with. In fact, for many of our passes, it seems like a fully stack-alloc'd environment map would be a perfect fit.

WarninG: There's a delicate global invariant for the rank-2 types above to actually prevent escape of any value allocated in the arena. If we had existential packages in the language, then in fact you could smuggle out a Dict s2 _ _ past the forall s2.

rrnewton commented 5 years ago

Option 2: changing the "default" alloc pointer

@samth suggested that we could simply have all non-packed data go into a temp region during a certain dynamic extent:

withTempArena :: Packed b => (a -> b) -> a -> b

Here, the fact that b is a packed type guarantees that it can't create any pointers into the dynamically allocated non-packed (dictionary) data created during the execution of the function. (Which is actually an unfortunate restriction.)

This is similar to the trick that UrWeb uses automatically, and in fact we could apply the principle to all expressions with packed types.

rrnewton commented 5 years ago

Note: other memory-related issues include:

rrnewton commented 5 years ago

I think before we do something hacky, we need to convince ourselves that there isn't a more elegant way to do things that makes first class linear regions the same as other regions, instead of adding something new called "arenas":

Option 3: Explicit first class destination regions, interop with inferred regions

[TODO]

vollmerm commented 5 years ago

Forgot to tag this commit: 0c48d8864246bd54f101cd2ca075c4bc5d083362

vollmerm commented 5 years ago

Copied from slack discussion last week (@rrnewton)

I propose that there is in fact a gradual path from explicit-arenas-only-for-dictionaries, and some kind of grand unified explicit/implicit region story. I think supporting data constructors directly (not just fixed dictionary primops) would be step 1, but still requiring that their fields be pointers. Then that restriction in turn could be whittled away… Then eventually the language-interop story between inferred-land and explicit-land could be sorted out… Somewhere along the way the physical representations would need be unified in the runtime. We have explicit pointers (not tagged indirections) in Gibbon too, right? So there’s no reason that the representations inside the Arena’s can’t already be valid Gibbon binary data….

vollmerm commented 5 years ago

Forgot to tag this commit too: c64ae3b22172372639e43b9105a2933af04b648d

vollmerm commented 5 years ago

Arenas are nearly implemented now. Main thing left is tying them into the run-time system (which involves fixing some long-standing issues with our C code).

While the compiler treats arenas and (packed) regions separately, we could and maybe should use the phrase "region" to refer to both. Also, as discussed in slack, one other use for our arenas might be to let the programmer manually annotate certain data constructor calls with them, essentially taking control of how that one thing is allocated in memory.

Overall design so far: