savi-lang / savi

A fast language for programmers who are passionate about their craft.
BSD 3-Clause "New" or "Revised" License
153 stars 12 forks source link

Add asynchronous yield blocks / objectifiable lambdas to the language. #130

Open jemc opened 2 years ago

jemc commented 2 years ago

We have yield blocks (->) in Savi which do a good job at being both performant and ergonomic for use cases where the yield block is used immediately from within the calling function.

However, they do not satisfy two very important non-immediate use cases that we need to support:

This ticket is here to track the design and implementation of this feature.

jemc commented 2 years ago

Marking this with the BLOCKED label for now because I don't want us to work on implementing this until the type checking/inference rewrite (tracked by #76) is complete.

This feature will add a lot of complexity to type checking and inference, so we want to add that complexity later after the refactor and thus spare ourselves the extra work of including it in the refactor.

jemc commented 2 years ago

What follows are my notes for an initial design proposal, which isn't totally complete or final, but contains a fair bit to chew on for the time being:

jemc commented 2 years ago

Yield blocks use a syntax of -> to attach the yield block to a call site. Async yield blocks will use a similar syntax of ...->, creating what is effectively a lambda as known in other languages.

The primary reason for different syntax between the two types of yield blocks isn't to aid the compiler, but rather to aid the human reader in quickly understanding that when they see a ...-> they will know that the effects inside the block will take place at some time later, whereas the -> will continue to represent a block will execute immediately zero or more times within the call site. Here the added ... implies "some time later".

Like a -> block, a ...-> block can capture variables, but unlike a -> block, it cannot rebind the variables in ways that affect the outer scope (in part because the block happens later, and in part because that would break ref cap guarantees). The captured variables of a ...-> block get quietly lifted into fields of an underlying heap-allocated runtime object which represents the block (whereas in a -> block the code is executing in the immediate context against the same local variables on the stack).

As you may expect, a ...-> block is also not allowed to return early or raise an error into the outer context (because it is executing at a later time, when the outer context is already complete).

The target of a ...-> call will often be a behavior (rather than a synchronous method), and that behavior will usually declare :yields async and/or have at least one yield async statement inside of it. Alternatively it is possible to objectify the ...-> block as a lambda object that could be stored in a field and called later, but the syntax for that isn't designed yet, and it's better to avoid that pattern unless/until it is necessary (it will probably be necessary for things like Future - see more notes later below).

With that said, let's look at an example below, which demonstrates a usage of the "access pattern" which I have previously catalogued for Pony. In Savi we will have a standard library trait called Accessible which canonicalizes the name access as a common convention (see more notes later below).

increase_amount U64 = 5

counter.access ...-> (counter |
  counter.value += increase_amount
  if (counter.value > 100) (
    @counter_did_finish(counter, counter.value)
  )
)

Notice also that the inner counter yield parameter (of type Counter'ref, as seen from inside the counter actor) shadows the outer counter local variable (of type Counter'tag, as seen from outside the counter actor).

Note also that the increase_amount variable from the outer scope is captured into the block, which uses it when called later. More subtly, the @ is also captured from the outer scope as a tag reference (this is only allowed when the @ is an actor), and the counter_did_finish behavior is called on that actor as a kind of callback.

jemc commented 2 years ago

However, when we have many ...-> blocks nested inside one another the increasing indentation and accumulated mental context could be troubling.

To help alleviate this, we will support an extra bit of syntax sugar in which an "asynchronous tail call" (where a ...-> call is made at the end of an outer ...-> block) may have its syntax abbreviated, such that instead of seeing ...-> ( to begin a new block, you will see ...-> | to begin a new pipe-delimited section in the existing block.

There is also a potential performance benefit to using the tail call pattern for an asynchronous sequence of events, because the heap-allocation needed for capturing lambda objects can be amortized to allocate once for the entire chain of events, with each step in the sequence of events corresponding to a different method of the same underlying runtime object.

Consider the following example of tail calls, which also demonstrates how conditional breaks can be used even when tail calls to ...-> are in play.

transaction = Transaction.new(bob_account, alice_account, 500)

clearance_service.check_if_allowed(transaction) ...-> (
| is_allowed |
  break unless is_allowed
  transaction.debit_account.access ...->

| debit_account |
  if !debit_account.already_processed(transaction) (
    if (debit_account.balance >= transaction.amount) (
      debit_account.balance -= transaction.amount
    |
      notifications.failed_transaction(transaction, "insufficient funds")
      break
    )
    debit_account.mark_as_processed(transaction)
  )
  transaction.credit_account.access ...->

| credit_account |
  if !credit_account.already_processed(transaction) (
    credit_account.balance += transaction.amount
    credit_account.mark_as_processed(transaction)
    notifications.completed_transaction(transaction)
  )
)

Note that this is similar in ergonomics (though not in implementation details) to the JavaScript syntax for await inside of an async function. The initial/outermost ...-> ( block corresponds to the async function, and the inner ...-> | steps correspond to the await keyword, as demonstrated with the above example rewritten in semantically equivalent JavaScript:

const transaction = new Transaction(bob_account, alice_account, 500)

clearance_service.check_if_allowed(transaction, async (is_allowed) => {
  if (!is_allowed) return

  const debit_account = await transaction.debit_account.access()
  if (!debit_account.already_processed(transaction)) {
    if (debit_account.balance >= transaction.amount) {
      debit_account.balance -= transaction.amount
    } else {
      notifications.failed_transaction(transaction, "insufficient funds")
      return
    )
    debit_account.mark_as_processed(transaction)
  }

  const credit_account = await transaction.credit_account.access()
  if (!credit_account.already_processed(transaction)) {
    credit_account.balance += transaction.amount
    credit_account.mark_as_processed(transaction)
    notifications.completed_transaction(transaction)
  }
})

It's worth noting here as an aside that await semantics have been avoided in Pony because we never want an actor to block in the middle of a behavior, waiting for some certain input and unable to respond to other messages. Such a situation is strongly discouraged by the language as much as possible.

However, we successfully avoid that problem in the above design because it isn't the actor which is waiting for fulfillment, but rather the async runtime object itself (created through the ...-> syntax) that gets passed in messages from one actor to the next.

jemc commented 2 years ago

Regarding the access method, we will have a trait which is defined like this:

:trait Accessible(A)
  :be access
    :yields async A

For actors which wish to easily opt in to this trait, they can use the following declaration:

  :auto Accessible

Which will generate the following code, allowing any caller to send them an async yield block which will be granted temporary access to the:

  :is Accessible(@'ref)

  :be access
    :yields async @'ref
    yield @

However, Accessible need not always be used with the @'ref type. For example, for the Future type, it might be used to access the inner future value (of type A) like this:

:actor Future(A)
  :is Accessible(A)

  :be access
    :yields async A into(callback FnOnce'iso)
    if @is_ready (
      try (--callback).call_once(@value.as!(A))
    |
      @waiting_callbacks << (--callback)
    )

Notice that in this case the async yield block is being reified into a local variable called callback. This is akin to a "block parameter" in Ruby or Crystal. It may have any name chosen by the code author.

Reifying the async yield block into a variable is necessary because if the future instance doesn't have a value ready yet, it must save the callback into a list of callbacks to call later when the value becomes ready.

In this case it declares that it objectifies the async yield block as a FunOnce'iso (more on this trait to be explained later), which acts as a guarantee that it cannot be called more than once, allowing the type system to use that guarantee on the caller side to allow the captured variables in the block to be mutable, and allowing the compiler to make certain performance optimizations based on the knowledge that it will only be called at most once.

jemc commented 2 years ago

Now let's talk about the different kinds of blocks/lambdas that seem possible and useful, what restrictions they have, and how they will be implemented under the hood.

After that we can talk about how they fit into the type system and language together.

Kind A (immediate yield block)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind B (single-use isolated lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind C (multi-use isolated lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind D (mutable lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind E (immutable lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind F (read-only lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

Kind G ("pure" lambda)

Call count:

Inter-scope interaction:

Ref cap restrictions:

Mobility:

Theoretical "as if" modeling:

Internal Implementation / Optimization:

jemc commented 2 years ago

Now let's try to translate those "Kinds" into traits that fit into the type system:

// `->` (Kind A)
// (cannot be objectified, so does not need to have a corresponding trait)

// `...->` FunOnce'iso (Kind B)
:trait iso FunOnce(R, P1, P2)
  :fun iso call_once(P1, P2) R

// `...->` Fun'iso (Kind C)
// `...->` Fun'ref (Kind D)
:trait ref Fun(R, P1, P2)
  :fun ref call(p1 P1, p2 P2) R

  // Fun'iso can be a subtype of FunOnce'iso thanks to this wrapper method:
  :fun iso call_once(p1 P1, p2 P2): @call(p1, p2)

// `...->` FunRead'val (Kind E)
// `...->` FunRead'box (Kind F)
:trait box FunRead(R, P1, P2)
  :fun box call(P1, P2) R

  // FunRead'iso can be a subtype of FunOnce'iso thanks to this wrapper method:
  :fun iso call_once(p1 P1, p2 P2): @call(p1, p2)

// `...->` FunRead'val (Kind G)
// (note that in Savi, `non` types can adhere to a `val` trait/fun,
// so it's okay to put stateless functions on `non` types here as `FunRead'val` too)

Notice that these traits (FunRead <: Fun <: FunOnce, respectively) mostly mirror the hierarchy of function traits in Rust (Fn <: FnMut <: FnOnce), except that we give naming primacy to the mutable trait (giving it no suffix) in order to encourage its use as the default where possible, to give the lambda supplier more flexibility. See the later decision tree for more info on this.

We mostly make up for deficiency of FunRead not being a subtype of Fun by relying on type anonymity to allow a FunRead-compatible ...-> block literal to be objectified as one of the other trait types as long as it happens immediately at the creation site. That is, we can conceptually treat the immutable block as being a sub-object inside a mutable one, as long as we know this at the definition site so that we can define the anonymous type differently; and because the type is anonymous, it is not instantiable in any other place, so no other code can notice the discrepancy in how it was defined.

Here are the rules on how we know which kinds of block can be objectified as which trait:

A ...-> block adhering to Kind G or Kind E rules can be objectified as a:

A ...-> block adhering to Kind F rules can be objectified as a:

A ...-> block adhering to Kind D rules can be objectified as a:

A ...-> block adhering to Kind C rules can be objectified as a:

A ...-> block adhering to Kind B rules can be objectified as a:

A -> block must adhere to Kind A rules and cannot be objectified at all.

jemc commented 2 years ago

So now, knowing those type relationships, we can start to develop some best practices which will help people navigate which types of blocks/lambdas to use in their program architectures:

In general when designing code that accepts a block/lambda for use, you want to give maximum freedom to the calling code that is providing the block/lambda while still meeting your own needs for how you will use it.

With that in mind, here is a initial decision tree to use as an API design convention:

  1. "I will only call the lambda immediately within the function that receives it"

    • if true: use yield and let the compiler infer the optimal underlying type based on yield sites, and let it have the best performance optimizations
    • if false: (continue to step 2)
  2. "I can guarantee that I will only call this function once at most, and I want the lambda supplier to be able to use that guarantee to consume its captures (at the expense of requiring all captures to be sendable)"

    • if true: use FunOnce (a.k.a FunOnce'iso)
    • if false: (continue to step 3)
  3. "This lambda will never need to be taken across an actor/region boundary (i.e. it doesn't need to be sendable)"

    • if true: (continue to step 4)
    • if false: (continue to step 5)
  4. "The lambda will never need to be called from a read-only context (i.e. retrieving it from a field via a box viewpoint in order to call it)"

    • if true: use Fun (a.k.a Fun'ref)
    • if false: use FunRead (a.k.a FunRead'box)
  5. "All of the arguments I need to pass into this lambda are sendable and the result I get out from it can also be sendable"

    • if true: use Fun'iso
    • if false: use FunRead'val

Or, as a more brief summary:

jemc commented 2 years ago

@jasoncarr0 requested that I convert my proposal notes here into a pull request of documents for easier review and threading of review comments.

Jason and anyone else are welcome to review this in #131 and any changes I make in that PR will get later reflected in this ticket here when I edit the comments above to add that information.

See https://github.com/savi-lang/savi/pull/131

jemc commented 2 years ago

Updated my comments here based on the review in #131, so I'm going to close #131 and we can finish the discussion here.

jasoncarr0 commented 2 years ago

This looks pretty good.

We mostly make up for deficiency of FunRead not being a subtype of Fun by relying on type anonymity to allow a FunRead-compatible ...-> block literal to be objectified as one of the other trait types as long as it happens immediately at the creation site. That is, we can conceptually treat the immutable block as being a sub-object inside a mutable one, as long as we know this at the definition site so that we can define the anonymous type differently; and because the type is anonymous, it is not instantiable in any other place, so no other code can notice the discrepancy in how it was defined.

This may not be necessary, realistically. If you infer the strongest possible capability, and with the change from FunVal to FunRead, then you're fine, e.g. if you only read data, you can get a FunRead'ref which is in fact a subtype of Fun now (and if all your data was sendable, then you can get FunRead'iso which is again a subtype). This is only necessary if you don't infer the capability correctly, or if you are coercing an existing fun type.