Syntax sugar for lambda with immediate partial application of arguments.

jemc commented 9 years ago

Many other languages and frameworks use anonymous function literals or lambdas to specify a function to be applied when an asynchronous operation completes. In the recent implementation of "promises" in Pony, a similar approach is used to supply a function to be applied when a promise is fulfilled or rejected.

In practice, it is usually useful to send some bit of state with the function in order to facilitate an appropriate response action. Some languages allow lambdas to "close" on variables in their surrounding lexical scope and essentially carry around a reference to the stack frame, so that local variables in that scope can be used in the response. Pony lambdas do not do this, but a similar effect can be achieved with partial application of arguments to a function. This is particularly useful when all applied arguments are sendable, as the function is most often sent to another actor for asynchronous execution. The term/readline package, which uses promises, demonstrates an example of this, though it does so by using out-of-line defined tag functions instead of in-line lambda literals.

However, there are many reasons why using in-line lambda literals may be preferable in certain cases, including visual localization of logic and a tidier namespace of defined functions in the surrounding class/actor/primitive. This is also possible in Pony, and works quite well for one-off "reaction" functions as shown in the following real-world example from a test suite for a package I'm working on:

class SocketTest is UnitTest
  let _env: Env
  new iso create(env: Env) => _env = env
  fun name(): String => "zmq.Socket"

  fun apply(h: TestHelper): TestResult =>
    let ra = _SocketReactor; let a = zmq.Socket(zmq.PAIR, ra.notify())
    let rb = _SocketReactor; let b = zmq.Socket(zmq.PAIR, rb.notify())

    a.bind("tcp://localhost:8899")
    b.connect("tcp://localhost:8899")
    a.send_string("foo")
    b.send_string("bar")

    ra.next(recover lambda(h: TestHelper, a: zmq.Socket, m: zmq.Message) =>
      h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
      a.dispose()
    end~apply(h,a) end)

    rb.next(recover lambda(h: TestHelper, b: zmq.Socket, m: zmq.Message) =>
      h.expect_eq[zmq.Message](m, recover zmq.Message.push("bar") end)
      b.dispose()
    end~apply(h,b) end)

    ra.when_closed(recover lambda(h: TestHelper, rb: _SocketReactor) =>
      rb.when_closed(recover lambda(h: TestHelper) =>
        h.complete(true)
      end~apply(h) end)
    end~apply(h,rb) end)

    LongTest

As mentioned above, this works well, but has a few aspects of its syntax that are less-than-ideal:

The variables for immediate partial application have to be specified both in the arguments to ~apply and in the parameters of the lambda literal. For long lambdas, these two locations become even more delocalized from one another.
The types of the variables for immediate partial application are already known to the compiler from the surrounding scope, but the syntax requires them to be written again in the parameter signature of the lambda literal.
The intent is to create a lambda of a specific "arity" that also has access to specific captures, but this intent is obscured and not obvious because one must mentally "subtract" the partial application arity from the parameter arity. For example, in each of the calls to next in the example snippet, the intent is to create and pass a lambda that accepts a single zmq.Message parameter and also has access to the h and a variables as immediate captures, but this intent is obscured by the three-parameter signature. To put it another way, the parameter list contains two very different kinds of entities - parameters and immediate captures - and this distinction is not obvious without also studying the partial application at the end of the lambda literal.

I would argue that as the Pony user base and available libraries expand, this usage pattern has the potential to become quite useful and popular, but without additional syntax sugar it is more cumbersome than it needs to be. In addition to syntactical benefits, the implementation of a lambda created with immediate captures has the potential to be more execution-efficient than the alternative of having a lambda literal, then applying some partial arguments as a second step (this is only speculation, as I'm not intimately familiar with the implementation of the current lambda sugar yet).

To that end, I propose that syntax sugar be added that allows for immediate partial application of local variables to be specified at the head of the lambda literal, and without having to re-specify them (or their types) in the parameter list. Within the body of lambda literal, the named variables would be available with the same names and types that they had in the body of the enclosing function. Other than those specifications, I'm not fixated on the particular details of the syntax, though I've prepared an example to get the conversation started, and I'm interested to hear other ideas and suggestions. Here is the above example, rewritten with one conception of how this syntax sugar could look:

class SocketTest is UnitTest
  let _env: Env
  new iso create(env: Env) => _env = env
  fun name(): String => "zmq.Socket"

  fun apply(h: TestHelper): TestResult =>
    let ra = _SocketReactor; let a = zmq.Socket(zmq.PAIR, ra.notify())
    let rb = _SocketReactor; let b = zmq.Socket(zmq.PAIR, rb.notify())

    a.bind("tcp://localhost:8899")
    b.connect("tcp://localhost:8899")
    a.send_string("foo")
    b.send_string("bar")

    ra.next(recover lambda~(h,a)(m: zmq.Message) =>
      h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
      a.dispose()
    end end)

    rb.next(recover lambda~(h,b)(m: zmq.Message) =>
      h.expect_eq[zmq.Message](m, recover zmq.Message.push("bar") end)
      b.dispose()
    end end)

    ra.when_closed(recover lambda~(h,rb)() =>
      rb.when_closed(recover lambda~(h)() =>
        h.complete(true)
      end end)
    end end)

    LongTest

Specifically, this proposal would amend lambda syntax sugar to accept an optional set of immediate captures preceding the parameter signature, marked by the ~ character to denote its distinction from the parameter signature syntax. To me, it makes conceptual sense that the immediate captures precede the parameter signature because in the partial application method they become the parameters that precede the "remaining" parameters.

As mentioned above, this is only one idea for how this might look, and I'm interested to hear other ideas for the details of the syntax and feedback from the maintainers about whether this kind of syntax is desirable as proposed. I hope you'll agree that it's helpful; I suspect such a feature would have a significant impact on the time and effort required to create and maintain a robust, comprehensive set of tests for this socket library. Also, I'm willing to put forth the development effort to implement the sugar myself if that makes a difference - I do have some experience working with other parsers, ASTs and compilers.

andymcn commented 9 years ago

Pony lambdas are a special case of object literals, intended as a more compact form for when you just need code, not state. If you also need state variables then an object literal can be used.

Using your example with object literals:

class SocketTest is UnitTest
  let _env: Env
  new iso create(env: Env) => _env = env
  fun name(): String => "zmq.Socket"

  fun apply(h: TestHelper): TestResult =>
    let ra = _SocketReactor; let a = zmq.Socket(zmq.PAIR, ra.notify())
    let rb = _SocketReactor; let b = zmq.Socket(zmq.PAIR, rb.notify())

    a.bind("tcp://localhost:8899")
    b.connect("tcp://localhost:8899")
    a.send_string("foo")
    b.send_string("bar")

    ra.next(object
      var _h: TestHelper = h
      var _a: zmq.Socket = a
      fun apply(m: zmq.Message) =>
        _h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
        _a.dispose()
    end)

    rb.next(object
      var _h: TestHelper = h
      var _b: zmq.Socket = b
      fun apply(m: zmq.Message) =>
        _h.expect_eq[zmq.Message](m, recover zmq.Message.push("bar") end)
        _b.dispose()
    end)

    ra.when_closed(object
      var _h: TestHelper = h
      var _rb: _SocketReactor = rb
      fun apply() =>
        _rb.when_closed(object
          var _h': TestHelper = _h
          fun apply() =>
            _h'.complete(true)
        end)
    end)

    LongTest

Does this solve the problem for you?

I'm not sure exactly how your API works, so I don't know if nesting object literals is the best way to handle the when_closed case.

jemc commented 9 years ago

@andymcn - Maybe you misunderstood. I'm not having a problem finding code that works - I'm trying to lobby for (and offer to implement) extra syntax sugar that makes the first code example I gave (which already works) as succinct and DRY as the second code example I gave. The example you gave looks like it should work (I haven't tried it yet, though it probably needs a recover around the object literals), but it is actually less succinct and more cumbersome than the first example I gave.

You're right that a lambda by itself does not carry any state, but when you partially apply some arguments, it's my understanding from the Pony tutorial that it becomes the equivalent of an object literal just like the ones you've typed explicitly in your example.

I think this is a very useful case to streamline the syntax for, and trivial enough for the compiler to work out everything it needs to know. So what I'm proposing is an additional sugar for the very same behavior you demonstrated in your example and I demonstrated in mine.

andymcn commented 9 years ago

Sorry @jemc, yes I did misunderstand what you were asking for, I thought you just didn't know about object literals.

Yes a wrapping recover round the object literal would currently be needed. However we'll soon be adding the ability to create them as isos, so I'll assume that here.

When we put in lambdas we discussed adding captures and decided there was no point, since you could just use an object literal. Maybe that decision was wrong.

It's worth noting that minimising the amount of typing required is very much a secondary concern in the design of Pony. We're far more concerned with having syntax that is very simple to parse and clear to the programmer.

I think that your proposed syntax does achieve those aims. I'm not sure ~() is the best choice, but it will do for this discussion.

One technical point: both lambdas and partial calls are implemented as sugar for object literals (which are implemented as sugar for anonymous classes). This means that implementing a lambda capture as a partial call on a lambda would result in 2 classes being defined, one of which is only ever used wrapped inside the other. Writing sugar to implement a lambda with captures directly as an object literal would be preferable.

There is one thing I don't like about your proposal that I think should be changed: captures should be arbitrary expressions, not just variables. My reasons are as follows:

Capturing an expression is clearly more general. Having to define an extra variable just to put a value into so you can then capture it seems a little annoying.
Captured iso or trn variables would need an implicit consume which would make that variable undefined from then on. We try to avoid implicit things like that in Pony because it makes it harder for the programmer to see what's going on. Also, due to generics, the compiler couldn't tell whether a consume was required or not in the general case. Therefore we'd have to consume all variables, which would give incorrect behaviour for variables that are not iso or trn.
Capturing a variable creates a new variable of the same name in an inner scope, ie this is variable shadowing. We don't have shadowing anywhere in Pony because it is a big source of bugs. Allowing expressions would require specifying new names for the captures, which would eliminate this shadowing.

I know that some other languages base capturing on variables, but they don't have consume to worry about and they allow shadowing.

So, taking a look at one section from your example. You've proposed (recovers replaced with isos):

ra.next(lambda iso~(h,a)(m: zmq.Message) =>
    h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
    a.dispose()
  end)

This is essentially equivalent to the object literal version I suggested, which is clearly far more verbose:

ra.next(object iso
  var _h: TestHelper = h
  var _a: zmq.Socket = a
  fun apply(m: zmq.Message) =>
    _h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
    _a.dispose()
  end)

If we allow expression captures we get something like:

ra.next(lambda iso~(_h = h, _a = a)(m: zmq.Message) =>
    _h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
    _a.dispose()
  end)

We could allow, or possibly even require, the type of the capture to be specified. This might be useful if you want the capture to be a more general type than the evaluated expression. On the other hand it might make the simple case too verbose.

ra.next(lambda iso~(_h: TestHelper = h, _a: zmq.Socket = a)(m: zmq.Message) =>
    _h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
    _a.dispose()
  end)

I think that allowing the capture type to be specified, but not requiring it, is probably the best approach.

jemc commented 9 years ago

@andymcn - thanks for the detailed response!

Yesterday I spent the better part of the day playing with this idea in the parser and compiler to get a little more familiar with Pony's internals so I could be more informed for this discussion. I actually did encounter some headaches related to some of your points here (like shadowing), so I'm more prepared now to accept some concessions in succinctness than I would have been before. Typically I'm a developer who prefers succinctness and implicitness where possible, but I'm coming to Pony because I'm interested in benefiting from the statically-analyzed reference capabilities, so I have to be prepared to be more explicit than I'd otherwise like :smile:. That said, I think the points you made here are spot-on.

Regarding the specifics of the syntax, I'm not particularly attached to the ~(...) - it's just something that came to mind.

Anyway, since this is a feature that I feel strongly about, I'm prepared to do whatever work I can to help bring it about. If you and the other language designers come up with a plan along these lines for how it should look and work, I'm happy to write the implementation if desired. I'd also not be offended if you decide you'd prefer to do it yourselves because you don't want someone unfamiliar with the project to be mucking about with internals that could endanger safety in subtle ways.

andymcn commented 9 years ago

Thanks for the offer of help. This shouldn't actually be too much work, but there are a few subtle things to watch out for, so I think I'll just do this one myself. I should get it in later this week.

Getting consensus on the exact syntax may be a little tricky right now as various people are on holiday etc. I'll use your suggested syntax for now and then we may change it in a few weeks if we come up with something nicer.

As a programmer I also prefer succinctness; a very simple piece of code, without extra warts, just looks better. However, making the simple case too succinct often causes more complex cases to be ugly, misleading or just plain broken. Also, over the years I have come to the conclusion that implicitness is bad. Any time the programmer looks at the code and sees something different from what the compiler sees, there's a problem. When the code doesn't do what the author thought it did, you have bugs.

sylvanc commented 9 years ago

@jemc, this is great stuff. @andymcn and I just went over this, and he'll be finishing it up soon. Some quick notes:

I think all 3 types of capture are needed, ie:
1. Just an id, since it's the nice simple case @jemc shows.
2. id = expr, since it allows inlining to avoid an unnecessary variable in the enclosing scope.
3. id: Type = expr, since it allows upcasting the result of the expression.
I really like lambda iso, it fits with recover iso, etc. @andymcn will be adding this for object as well.
For syntax, how about putting the lambda captures after the parameters? That way they are optional and require no introducing token:

lambda iso(m: zmq.Message)(h, a) =>
  h.expect_eq[zmq.Message](m, recover zmq.Message.push("foo") end)
  a.dispose()
end
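
As a rough illustration of the three capture forms listed above, here is a minimal sketch. It is written in the brace-delimited lambda form that later ponyc releases use in place of `lambda ... end`, on the assumption that the capture list keeps the same position (after the parameters) and the same three forms. The names `prefix`, `count`, `n`, `label` and `report` are made up for the example and have nothing to do with the zmq code in this thread.

```pony
actor Main
  new create(env: Env) =>
    // Hypothetical locals, used only to have something to capture.
    let prefix: String = "got: "
    let count: U32 = 41

    // Captures sit in a second pair of parentheses after the parameters:
    //   env, prefix              -> form 1: bare id, captured under its own name
    //   n = count + 1            -> form 2: id = expr, no extra local needed
    //   label: Stringable = count -> form 3: id: Type = expr, upcasts the value
    let report = {(msg: String)(env, prefix, n = count + 1, label: Stringable = count) =>
      env.out.print(prefix + msg + " n=" + n.string() + " label=" + label.string())
    }

    report("hello")
```

Only `msg` is a real parameter here, so the lambda's arity is visible at a glance, which was the point of the original request.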

jemc commented 9 years ago

@sylvanc Sounds great!

andymcn commented 9 years ago

This is now all done. Everything is as per @sylvanc's notes above.

Note that if a capture just specifies a variable there is no implicit consume, so if that variable is an iso or trn you'll only get an alias, ie tag or box. If you want to consume a variable while capturing then give it a name, capture an expression and use an explicit consume, eg _x = consume x.
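
A minimal sketch of the explicit-consume case, again assuming the brace-delimited lambda form of later ponyc releases rather than the `lambda ... end` form used in this thread. `buf`, `owned` and `show` are hypothetical names chosen for the example.

```pony
actor Main
  new create(env: Env) =>
    // An iso string that we want to hand over to the lambda.
    let buf: String iso = "payload".clone()

    // A bare capture `(buf)` would only alias the iso.
    // Naming the capture and consuming explicitly moves the value in;
    // `buf` must not be used in this scope after this point.
    let show = {()(env, owned: String = consume buf) =>
      env.out.print(owned)
    }

    show()
```

The capture is typed as `String` (a `val`) here so the body can read it freely; whether to keep it as an `iso` instead depends on what the lambda needs to do with it.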

jemc commented 9 years ago

> This is now all done.

Awesome! I can't wait to try it out later tonight. Thanks @andymcn and @sylvanc for your consideration on this matter, and for all your work to flesh out the design and implementation!

> Note that if a capture just specifies a variable there is no implicit consume.

Makes perfect sense, and it's what I would expect, but thanks for making that explicit.

jemc commented 9 years ago

So far, this is working great, aside from one snag. In my original example code, there is one part that doesn't transition fully smoothly. That is, I get a compiler error for something I think should be possible to do safely. Here is the original code (where ra, rb, and h are all local tag references to actors):

    ra.when_closed(recover lambda(h: TestHelper, rb: _SocketReactor) =>
      rb.when_closed(recover lambda(h: TestHelper) =>
        h.complete(true)
      end~apply(h) end)
    end~apply(h,rb) end)

With the new syntax, here is how I expect to be able to rewrite the code:

    ra.when_closed(lambda iso()(h,rb) =>
      rb.when_closed(lambda iso()(h) =>
        h.complete(true)
      end)
    end)

But I get the following compiler error:

/home/jemc/1/code/hg/pony-zmq/zmq/test/socket_transport_tests.pony:67:35: cannot capture "h", can only capture fields, parameters and local variables
      rb.when_closed(lambda iso()(h) =>

Obviously there is an issue here with directly nesting captures. It seems like this should work, as h here is a let field in the final formulation of the outer lambda's object literal representation, and it is a tag, so it should be safe to alias as such. My guess is that the outer lambda's object literal representation is not fully resolved or "in place" at the time when the compiler tries to find the definition of h for the inner capture.

Note that the following example works, where I use the h' = h form of capture (but is unnecessarily cumbersome unless there is a technical reason why the above example cannot be made to work as shown):

    ra.when_closed(lambda iso()(h,rb) =>
      rb.when_closed(lambda iso()(h' = h) =>
        h'.complete(true)
      end)
    end)

andymcn commented 9 years ago

Thanks for the feedback @jemc.

The nested lambdas look like they should work. I think this is actually a bug with checking whether a name is a field (since from the point of view of the inner lambda h is a field of the outer lambda). I'll investigate.

andymcn commented 9 years ago

That should be fixed now. @jemc please confirm your case now works.

jemc commented 9 years ago

Yep, it works now. Thanks again!

ponylang / ponyc

Syntax sugar for lambda with immediate partial application of arguments. #273