Open jemc opened 3 years ago
Marking this with the BLOCKED
label for now because I don't want us to work on implementing this until the type checking/inference rewrite (tracked by #76) is complete.
This feature will add a lot of complexity to type checking and inference, so we want to add that complexity later after the refactor and thus spare ourselves the extra work of including it in the refactor.
What follows are my notes for an initial design proposal, which isn't totally complete or final, but contains a fair bit to chew on for the time being:
Yield blocks use a syntax of ->
to attach the yield block to a call site. Async yield blocks will use a similar syntax of ...->
, creating what is effectively a lambda as known in other languages.
The primary reason for different syntax between the two types of yield blocks isn't to aid the compiler, but rather to aid the human reader in quickly understanding that when they see a ...->
they will know that the effects inside the block will take place at some time later, whereas the ->
will continue to represent a block will execute immediately zero or more times within the call site. Here the added ...
implies "some time later".
Like a ->
block, a ...->
block can capture variables, but unlike a ->
block, it cannot rebind the variables in ways that affect the outer scope (in part because the block happens later, and in part because that would break ref cap guarantees). The captured variables of a ...->
block get quietly lifted into fields of an underlying heap-allocated runtime object which represents the block (whereas in a ->
block the code is executing in the immediate context against the same local variables on the stack).
As you may expect, a ...->
block is also not allowed to return early or raise an error into the outer context (because it is executing at a later time, when the outer context is already complete).
The target of a ...->
call will often be a behavior (rather than a synchronous method), and that behavior will usually declare :yields async
and/or have at least one yield async
statement inside of it. Alternatively it is possible to objectify the ...->
block as a lambda object that could be stored in a field and called later, but the syntax for that isn't designed yet, and it's better to avoid that pattern unless/until it is necessary (it will probably be necessary for things like Future
- see more notes later below).
With that said, let's look at an example below, which demonstrates a usage of the "access pattern" which I have previously catalogued for Pony. In Savi we will have a standard library trait called Accessible
which canonicalizes the name access
as a common convention (see more notes later below).
increase_amount U64 = 5
counter.access ...-> (counter |
counter.value += increase_amount
if (counter.value > 100) (
@counter_did_finish(counter, counter.value)
)
)
Notice also that the inner counter
yield parameter (of type Counter'ref
, as seen from inside the counter actor) shadows the outer counter
local variable (of type Counter'tag
, as seen from outside the counter actor).
Note also that the increase_amount
variable from the outer scope is captured into the block, which uses it when called later. More subtly, the @
is also captured from the outer scope as a tag
reference (this is only allowed when the @
is an actor), and the counter_did_finish
behavior is called on that actor as a kind of callback.
However, when we have many ...->
blocks nested inside one another the
increasing indentation and accumulated mental context could be troubling.
To help alleviate this, we will support an extra bit of syntax sugar in which
an "asynchronous tail call" (where a ...->
call is made at the end of an outer ...->
block) may have its syntax abbreviated, such that instead of seeing ...-> (
to begin a new block, you will see ...-> |
to begin a new pipe-delimited section in the existing block.
There is also a potential performance benefit to using the tail call pattern for an asynchronous sequence of events, because the heap-allocation needed for capturing lambda objects can be amortized to allocate once for the entire chain of events, with each step in the sequence of events corresponding to a different method of the same underlying runtime object.
Consider the following example of tail calls, which also demonstrates how conditional break
s can be used even when tail calls to ...->
are in play.
transaction = Transaction.new(bob_account, alice_account, 500)
clearance_service.check_if_allowed(transaction) ...-> (
| is_allowed |
break unless is_allowed
transaction.debit_account.access ...->
| debit_account |
if !debit_account.already_processed(transaction) (
if (debit_account.balance >= transaction.amount) (
debit_account.balance -= transaction.amount
|
notifications.failed_transaction(transaction, "insufficient funds")
break
)
debit_account.mark_as_processed(transaction)
)
transaction.credit_account.access ...->
| credit_account |
if !credit_account.already_processed(transaction) (
credit_account.balance += transaction.amount
credit_account.mark_as_processed(transaction)
notifications.completed_transaction(transaction)
)
)
Note that this is similar in ergonomics (though not in implementation details) to the JavaScript syntax for await
inside of an async
function. The initial/outermost ...-> (
block corresponds to the async
function, and the inner ...-> |
steps correspond to the await
keyword, as demonstrated with the above example rewritten in semantically equivalent JavaScript:
const transaction = new Transaction(bob_account, alice_account, 500)
clearance_service.check_if_allowed(transaction, async (is_allowed) => {
if (!is_allowed) return
const debit_account = await transaction.debit_account.access()
if (!debit_account.already_processed(transaction)) {
if (debit_account.balance >= transaction.amount) {
debit_account.balance -= transaction.amount
} else {
notifications.failed_transaction(transaction, "insufficient funds")
return
)
debit_account.mark_as_processed(transaction)
}
const credit_account = await transaction.credit_account.access()
if (!credit_account.already_processed(transaction)) {
credit_account.balance += transaction.amount
credit_account.mark_as_processed(transaction)
notifications.completed_transaction(transaction)
}
})
It's worth noting here as an aside that await
semantics have been avoided in Pony because we never want an actor to block in the middle of a behavior, waiting for some certain input and unable to respond to other messages. Such a situation is strongly discouraged by the language as much as possible.
However, we successfully avoid that problem in the above design because it isn't the actor which is waiting for fulfillment, but rather the async runtime object itself (created through the ...->
syntax) that gets passed in messages from one actor to the next.
Regarding the access
method, we will have a trait which is defined like this:
:trait Accessible(A)
:be access
:yields async A
For actors which wish to easily opt in to this trait, they can use the following declaration:
:auto Accessible
Which will generate the following code, allowing any caller to send them an async yield block which will be granted temporary access to the:
:is Accessible(@'ref)
:be access
:yields async @'ref
yield @
However, Accessible
need not always be used with the @'ref
type. For example, for the Future
type, it might be used to access the inner future value (of type A
) like this:
:actor Future(A)
:is Accessible(A)
:be access
:yields async A into(callback FnOnce'iso)
if @is_ready (
try (--callback).call_once(@value.as!(A))
|
@waiting_callbacks << (--callback)
)
Notice that in this case the async yield block is being reified into a local variable called callback
. This is akin to a "block parameter" in Ruby or Crystal. It may have any name chosen by the code author.
Reifying the async yield block into a variable is necessary because if the future instance doesn't have a value ready yet, it must save the callback into a list of callbacks to call later when the value becomes ready.
In this case it declares that it objectifies the async yield block as a FunOnce'iso
(more on this trait to be explained later), which acts as a guarantee that it cannot be called more than once, allowing the type system to use that guarantee on the caller side to allow the captured variables in the block to be mutable, and allowing the compiler to make certain performance optimizations based on the knowledge that it will only be called at most once.
Now let's talk about the different kinds of blocks/lambdas that seem possible and useful, what restrictions they have, and how they will be implemented under the hood.
After that we can talk about how they fit into the type system and language together.
Call count:
Inter-scope interaction:
Ref cap restrictions:
Mobility:
Theoretical "as if" modeling:
Internal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scope, provided that they aren't used any more in the outer scope (and thus the consume does not affect the outer scope)Ref cap restrictions:
Mobility:
iso
with :fun iso
so as to be single-useTheoretical "as if" modeling:
iso
):new iso
constructor:fun iso
that immediately lowers itself to ref
, so that the captured variable fields may be seen through the un-colored ref
lens.ref
is not recovered to an iso
, so the function is not callable againInternal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scopeRef cap restrictions:
Mobility:
Theoretical "as if" modeling:
iso
):new iso
constructor:fun ref
with an auto-recovered receiver, meaning that nothing non-sendable may pass into or out of the block function (via its arguments or return value aka the yielded values or yield result value).Internal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scopeRef cap restrictions:
Mobility:
Theoretical "as if" modeling:
ref
):new ref
constructor:fun ref
.Internal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scopeRef cap restrictions:
Mobility:
Theoretical "as if" modeling:
val
):new val
constructor:fun box
.Internal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scopeRef cap restrictions:
box
lens - no mutation of them is possibleMobility:
Theoretical "as if" modeling:
box
):new box
constructor:fun box
.Internal Implementation / Optimization:
Call count:
Inter-scope interaction:
iso
variables in the inner scopeRef cap restrictions:
Mobility:
Theoretical "as if" modeling:
:module
(with ref cap: non
):fun non
:module
type can adhere to function traits of any receiver cap because there are no fields to track - so it is possible for example to adhere to a trait with a :fun iso
, :fun ref
, or :fun val
even though the function declared on the module is a :fun non
.Internal Implementation / Optimization:
Now let's try to translate those "Kinds" into traits that fit into the type system:
// `->` (Kind A)
// (cannot be objectified, so does not need to have a corresponding trait)
// `...->` FunOnce'iso (Kind B)
:trait iso FunOnce(R, P1, P2)
:fun iso call_once(P1, P2) R
// `...->` Fun'iso (Kind C)
// `...->` Fun'ref (Kind D)
:trait ref Fun(R, P1, P2)
:fun ref call(p1 P1, p2 P2) R
// Fun'iso can be a subtype of FunOnce'iso thanks to this wrapper method:
:fun iso call_once(p1 P1, p2 P2): @call(p1, p2)
// `...->` FunRead'val (Kind E)
// `...->` FunRead'box (Kind F)
:trait box FunRead(R, P1, P2)
:fun box call(P1, P2) R
// FunRead'iso can be a subtype of FunOnce'iso thanks to this wrapper method:
:fun iso call_once(p1 P1, p2 P2): @call(p1, p2)
// `...->` FunRead'val (Kind G)
// (note that in Savi, `non` types can adhere to a `val` trait/fun,
// so it's okay to put stateless functions on `non` types here as `FunRead'val` too)
Notice that these traits (FunRead <: Fun <: FunOnce
, respectively) mostly mirror the hierarchy of function traits in Rust (Fn <: FnMut <: FnOnce
), except that we give naming primacy to the mutable trait (giving it no suffix) in order to encourage its use as the default where possible, to give the lambda supplier more flexibility. See the later decision tree for more info on this.
We mostly make up for deficiency of FunRead
not being a subtype of Fun
by relying on type anonymity to allow a FunRead
-compatible ...->
block literal to be objectified as one of the other trait types as long as it happens immediately at the creation site. That is, we can conceptually treat the immutable block as being a sub-object inside a mutable one, as long as we know this at the definition site so that we can define the anonymous type differently; and because the type is anonymous, it is not instantiable in any other place, so no other code can notice the discrepancy in how it was defined.
Here are the rules on how we know which kinds of block can be objectified as which trait:
A ...->
block adhering to Kind G or Kind E rules can be objectified as a:
FunRead'val
(default)FunRead'box
(normal subtyping: val
is a subtype of box
)Fun'ref
(relying on type anonymity to make this safe)Fun'iso
(relying on type anonymity to make this safe)FunOnce'iso
(relying on type anonymity to make this safe)A ...->
block adhering to Kind F rules can be objectified as a:
FunRead'box
(default)FunRead'val
if there are non-shareable captures)Fun'ref
(relying on type anonymity to make this safe)Fun'iso
if there are non-sendable captures)FunOnce'iso
if there are non-sendable captures)A ...->
block adhering to Kind D rules can be objectified as a:
Fun'ref
(default)FunRead'box
(relying on type anonymity to make this safe)FunRead'val
if there are non-shareable captures)FunRead'val
if there is rebinding of captures)Fun'iso
if there are non-sendable captures)Fun'iso
if there are non-sendable parameters)Fun'iso
if there is a non-sendable return)FunOnce'iso
if there are non-sendable captures)A ...->
block adhering to Kind C rules can be objectified as a:
Fun'iso
(default)Fun'ref
(normal subtyping: iso
is a subtype of ref
)FunOnce'iso
(normal subtyping: Fun'iso
is a subtype of FunOnce'iso
)FunRead'val
if there are non-shareable captures)FunRead'val
if there is rebinding of captures)FunRead'val
if there is mutation of captures)FunRead'box
if there is rebinding of captures)FunRead'box
if there is mutation of captures)A ...->
block adhering to Kind B rules can be objectified as a:
FunOnce'iso
(default)FunRead'val
if there are non-shareable captures)FunRead'val
if there is consuming of captures)FunRead'val
if there is rebinding of captures)FunRead'val
if there is mutation of captures)FunRead'box
if there is rebinding of captures)FunRead'box
if there is mutation of captures)Fun'ref
if there is consuming of captures)Fun'iso
if there is consuming of captures)Fun'iso
if there are non-sendable parameters)Fun'iso
if there is a non-sendable return)A ->
block must adhere to Kind A rules and cannot be objectified at all.
So now, knowing those type relationships, we can start to develop some best practices which will help people navigate which types of blocks/lambdas to use in their program architectures:
In general when designing code that accepts a block/lambda for use, you want to give maximum freedom to the calling code that is providing the block/lambda while still meeting your own needs for how you will use it.
With that in mind, here is a initial decision tree to use as an API design convention:
"I will only call the lambda immediately within the function that receives it"
yield
and let the compiler infer the optimal underlying type based on yield sites, and let it have the best performance optimizations"I can guarantee that I will only call this function once at most, and I want the lambda supplier to be able to use that guarantee to consume its captures (at the expense of requiring all captures to be sendable)"
FunOnce
(a.k.a FunOnce'iso
)"This lambda will never need to be taken across an actor/region boundary (i.e. it doesn't need to be sendable)"
"The lambda will never need to be called from a read-only context (i.e. retrieving it from a field via a box
viewpoint in order to call it)"
Fun
(a.k.a Fun'ref
)FunRead
(a.k.a FunRead'box
)"All of the arguments I need to pass into this lambda are sendable and the result I get out from it can also be sendable"
Fun'iso
FunRead'val
Or, as a more brief summary:
yield
when there is no need to turn the lambda into an objectFunOnce
when there is no need to call it multiple timesFun
or FunRead
when there is no need to be sendable, preferring the former where the type system allows itFun'iso
or FunRead'val
, preferring the former where the type system allows it@jasoncarr0 requested that I convert my proposal notes here into a pull request of documents for easier review and threading of review comments.
Jason and anyone else are welcome to review this in #131 and any changes I make in that PR will get later reflected in this ticket here when I edit the comments above to add that information.
Updated my comments here based on the review in #131, so I'm going to close #131 and we can finish the discussion here.
This looks pretty good.
We mostly make up for deficiency of FunRead not being a subtype of Fun by relying on type anonymity to allow a FunRead-compatible ...-> block literal to be objectified as one of the other trait types as long as it happens immediately at the creation site. That is, we can conceptually treat the immutable block as being a sub-object inside a mutable one, as long as we know this at the definition site so that we can define the anonymous type differently; and because the type is anonymous, it is not instantiable in any other place, so no other code can notice the discrepancy in how it was defined.
This may not be necessary, realistically. If you infer the strongest possible capability, and with the change from FunVal to FunRead, then you're fine, e.g. if you only read data, you can get a FunRead'ref
which is in fact a subtype of Fun
now (and if all your data was sendable, then you can get FunRead'iso
which is again a subtype). This is only necessary if you don't infer the capability correctly, or if you are coercing an existing fun type.
We have yield blocks (
->
) in Savi which do a good job at being both performant and ergonomic for use cases where the yield block is used immediately from within the calling function.However, they do not satisfy two very important non-immediate use cases that we need to support:
This ticket is here to track the design and implementation of this feature.