roc-lang / roc

A fast, friendly, functional language.
https://roc-lang.org
Universal Permissive License v1.0
4.48k stars 315 forks source link

Non-Escaping Allocations #1244

Open rtfeldman opened 3 years ago

rtfeldman commented 3 years ago

Some allocations in Roc functions never escape to the host. For example, let's say I run this code in an update (which would use reference counting by default, not bump allocation):

decoded =
    Decode.oneOf [ foo, bar, baz ]
        |> Decode.decodeStr str

Here, we'd be allocating this list on the heap (because it's a list) and then freeing it again almost immediately. This means there's a considerable performance cost in Roc to designing an API which takes a list like this.

This is a case where a tracing GC like V8's could outperform Roc. V8 would bump-allocate this into the nursery, and then later on the collection pass would realize it's dead and not promote it out of the bump-allocated nursery.

We could do even better than V8 here if we could make the list be bump allocated. That way the allocation cost would be the same (just a pointer bump), but we wouldn't have to pay for tracing over it later. In fact, at this point Roc would be relatively close to what people do when optimizing C code: reference count only when sharing is necessary, and arena allocate everything else. (We could still reference count these just for the sake of knowing when it's safe to mutate them in-place, but we wouldn't need to bother with free once the last reference is gone.)

We could achieve this by statically determining that this list doesn't "escape" to the host, and is therefore eligible to be allocated into a "temporary arena" that the host provides for the duration of the Roc function call. That way, all "temporary allocations" like this list could be bump allocated, and then the host could reset the bump arena at the end of the call to the Roc function.

Note: We wouldn't necessarily need to do this analysis on all builds, just during optimized builds to improve performance - so it's okay if it takes longer. But maybe it's easy enough to do it always!

Whether or not a value escapes to the host can vary by call path. For example, suppose I have a function like this:

getColors = \customColor -> [ Red, Yellow, Green, customColor ]

Inlining might save the day in a small function like this, but inlining won't always be in effect.

So how can we get both? One way to think of this is that every function's return value is polymorphic in whether it gets returned to the host. (Same goes for each field of a record, for functions that return records.) If we think of it that way, this whole analysis can be done using the same strategies we already use for type inference and monomorphization.

Regardless, this very much seems worth doing - for the API design flexibility alone. With it, there's no substantial performance reason to avoid an API design like oneOf, whereas otherwise that's a very real consideration!

rtfeldman commented 3 years ago

Here's an idea for how this could work:

I think this would work, although it might be overly conservative (that is, miss out on some opportunities to perform Temporary Allocations) specifically around unifying a value that escapes with a function argument. Unless we handle that situation differently, the unification might cause the function argument itself to be considered escapes, which doesn't really make sense.

There might be scenarios or downsides I'm not thinking of here. Also, there might be a better algorithm for doing it!