swarm-game / swarm

Resource gathering + programming game

Error when wrapping build #150

Closed jrvieira closed 3 years ago

jrvieira commented 3 years ago

Describe the bug

The build command fails after being passed through an identity function.

To Reproduce

def id = \a. a end

id build "" {} <- this errors
id ( build "" {} ) <- this works, however

Expected behavior

x and (id x) should always be equivalent.

jrvieira commented 3 years ago

After some experimenting, I noticed that this seems to be the fundamental error:

let b = build in b "" {}

This gives a "Bad application..." error.

byorgey commented 3 years ago

Ah, thanks for reporting this. It turns out it's because of the way we rewrite applications of build by wrapping the second argument in a delay, so the robot being built will be the one to evaluate everything, instead of the parent robot evaluating the program and then the child robot executing it. When build is hiding behind a definition, the rewrite doesn't happen, and the evaluator doesn't see the delay it is expecting.

Wrapping the program argument to build in a delay doesn't actually make a huge difference, because it only affects which robot evaluates the argument, not which one executes it. It might make a difference if e.g. the program is inside a let, and the let-bound thing uses some capabilities. For example, in

build "fred" (let x : int = 1 + 2 in move)

it is fred who needs the capability to do arithmetic, because evaluation of let x : int = 1 + 2 in move will be delayed. However, in

def b = build end; b "fred" (let x : int = 1 + 2 in move)

it is the base that will need arithmetic, because it will evaluate the (non-delayed) let-expression to the value move, which will then be passed to the build command.

I think the solution is to just allow build to work with a program argument that has an optional delay. It doesn't cause any problems, it just might possibly create some unexpected situations in terms of which robot needs what capabilities, as in the example above.

The fact that the arguments show up in reverse order in the error message is a red herring --- it's an unrelated bug.

byorgey commented 3 years ago

@polux mentioned in IRC that this seems similar to "multi-stage programming", a la Meta OCaml. We could imagine build having a type something like string -> code (cmd a) -> cmd string where code t represents the type of a (delayed) code fragment which will evaluate to something of type t once compiled and run. It might be worthwhile (later) to look into this more explicitly.

In a sense, we have a very poor man's version of this already, with exactly two stages: evaluation and execution. Evaluating something of type cmd a cannot have any effects other than possibly requiring some capabilities.

byorgey commented 3 years ago

Hmm, maybe a better solution would be to reflect the delay in the types. The simple way to do this would be to give build a type like string -> (() -> cmd a) -> cmd (). The more principled way would be to expose a delayed a type, with built-in syntax for creating delayed values (maybe some kind of special brackets; not a constant), and a built-in constant force : delayed a -> a. Then build could have the type string -> delayed (cmd a) -> cmd string.

We already have syntax for introducing delay and a force constant; it would just be a matter of exposing these in the surface syntax and adding a delayed type.

polux commented 3 years ago

The simple way to do this would be to give build a type like string -> (() -> cmd a) -> cmd ()

I don't think that would solve the problem you described above.

def b c = build (\_.c) end; b "fred" (let x : int = 1 + 2 in move)

In that case it's still the base who needs arithmetic.

The more principled way would be to expose a delayed a type, with built-in syntax for creating delayed values (maybe some kind of special brackets; not a constant)

Right! Having something like this instead of the full power of quasi-quotation may be enough. If I understand correctly, your proposed special brackets are akin to the "lazy" keyword of https://ocaml.org/api/Lazy.html, which must be weaker than Meta OCaml in ways I'm not sure I know how to completely characterize. I guess the key difference is splicing, which we may not need here indeed and adds a great deal of complexity.

I'm still wondering whether there are examples where the full power of staged programming would be needed to finely control who evaluates what, but I can't think of any right now. It would definitely be more user-friendly if we could get away with the simpler mechanism.

byorgey commented 3 years ago

I don't think that would solve the problem you described above.

Well, if you explicitly make a non-lazy version of build with (\_.c), then sure. But point taken, it doesn't solve the problem very well.

If I understand correctly, your proposed special brackets are akin to the "lazy" keyword of https://ocaml.org/api/Lazy.html,

Yes, that looks like exactly the same thing as what I propose.

I'm still wondering whether there are examples where the full power of staged programming would be needed to finely control who evaluates what, but I can't think of any right now.

Perhaps, but I think you can pretty easily control who evaluates what just by moving let-bindings in and out of delay brackets.

polux commented 3 years ago

After letting that sink in, and in the wake of #223, I realized a few things:

Regardless of whether or not we keep delay as a native construct, and assuming we change if to take explicit delayed values, I have what I expect to be a somewhat controversial syntax proposal: why not use {..} for delay? Today {..} is just an alias for (..) if I understand correctly, but we could change that. It would break some existing programs, but I expect not that many: it seems to me {..} is mostly used for arguments to build and if in the swarm-lang programs I've seen.

One cool application of this is that you can define native-looking control structures, as in Scala. Granted, there are only so many control structures you can define in a pure language, and the non-pure (monadic) fragment of the language doesn't really require any delay because monadic values are already suspended. But one nice application, for instance, would be user-friendly, user-defined short-circuiting boolean operators. You could define your own native-looking if with true = fst and false = snd, or even your own switch-case for a given enum.
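As a rough illustration of the kind of thing this enables, here is a sketch in Haskell, with () -> a thunks standing in for the proposed {a} blocks; the names are made up, and Haskell is lazy anyway, so the explicit thunks only mimic the force/delay discipline:

-- Short-circuiting 'and': the second operand is a thunk, evaluated only when
-- the first operand is True.
andShort :: Bool -> (() -> Bool) -> Bool
andShort False _ = False
andShort True  k = k ()

-- A user-defined 'if': True selects the first delayed branch, False the second.
cond :: Bool -> (() -> a) -> (() -> a) -> a
cond True  t _ = t ()
cond False _ f = f ()

-- andShort False (\_ -> error "never evaluated")  ==>  False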

What do you think?

xsebek commented 3 years ago

@polux I was thinking about using {...} for delay too :+1: If it is doable, I think it would be more intuitive and print nicely.

byorgey commented 3 years ago

First of all, I am definitely convinced re: using {...} for delay instead of {{...}}. I didn't need much convincing: that was actually the first thing I considered, but then for some reason I chickened out and decided to propose something that didn't break existing uses of {...}. But that was pretty silly. There's no good reason to have (...) and {...} be synonymous, and the more lightweight syntax will look nice. (It's not like we can avoid breaking existing programs when changing the type of build anyway.) I also very much like the idea of making the delay explicit in the type of if, so it is no longer magic, and I agree that having to always write {{...}} around both branches would be too heavyweight, but {...} seems natural. While we're at it, we should probably wrap the second argument of try inside a delay as well.

What do you think about punning on the {...} notation and using it to denote a delayed type, as well? So e.g. if we have t : ty then {t} : {ty}?

If I read the CEK machine correctly, unlike OCaml's Lazy, swarm-lang's delay does not do any memoization. I think we would need a CESK machine in order to achieve that.

Correct. So I think you're right that delay ty currently acts a lot like a synonym for () -> ty. However, after thinking about this a bunch in the shower this morning, it is leading me in a very different direction: why not upgrade to a CESK machine and have delay provide memoization? The downside is that it makes the runtime a bit more complex, and it would obviously be a much bigger change (it would require threading an extra memory (the "Store") through everywhere, adding some additional evaluation rules, and implementing a garbage collector of some sort). However, I have implemented systems like it before, so I am confident I could make the updates relatively quickly. The benefits would be:

polux commented 3 years ago

Great! Looks like we all agree on using {...} as a notation for delay! We still need to settle the syntax for forcing. In #223 you propose ! but I am wondering if it is worth "wasting" a very common and overloaded symbol on something that should be pretty rare. Why not simply a force function?

What do you think about punning on the {...} notation and using it to denote a delayed type, as well? So e.g. if we have t : ty then {t} : {ty}?

I think I like it: if : bool -> {a} -> {a} -> a nicely reads as "takes a bool and two blocks of type a and returns an a". But it seems to be a departure from using ML-like (vs Haskell-like) syntax for pairs: * (vs (,)). I still think that is fine as long as we don't later use [a] for lists, in which case I fear {a} could be understood as "a set of as". These are all minor concerns though; I don't feel very strongly about it.

why not upgrade to a CESK machine

If you're up for it then yes by all means! I agree that this is a prerequisite for any serious use of lazy data structures. The only thing that makes me a bit nervous is the cost of garbage collection.

Please note that I haven't given much thought to whether a CESK machine is really needed for implementing that memoization. That is a hunch I have and you seem to share it but maybe there is a clever way to do that with a CEK machine?

byorgey commented 3 years ago

I am wondering if it is worth "wasting" a very common and overloaded symbol on something that should be pretty rare.

Good point. In fact I think at one point there was a discussion in IRC of using prefix ! syntax to give a kind of Applicative notation. I guess it will be pretty rare for most players; the times you would need to use it are (1) if you start defining your own custom control structures or build variants, or (2) if you start defining some lazy data structures. I am fine with just using force.

I still think that is fine as long as we don't later use [a] for lists, in which case I fear {a} could be understood as "a set of as".

Agreed, I do not intend to use [a] for lists, partly just because I don't want the list type itself to be built-in, but definable using sum and product types and recursive types. More importantly, however, I also just plain don't like it. {t} : {ty} is a good pun because the syntax for values mirrors the syntax for types. Bracket notation is a bad pun because [x] means something very different at the value and type levels (singleton list vs. type of all lists). I really think Haskell should have just defined a data type called List.

The only thing that makes me a bit nervous is the cost of garbage collection.

Yes, it's worth thinking about this carefully, and we could quite possibly just leave garbage collection out entirely in a first pass. Probably in most cases no garbage collection will be needed since robots will allocate a small, constant number of memory cells (i.e. one memory cell per recursive definition).

I haven't given much thought to whether a CESK machine is really needed for implementing that memoization. That is a hunch I have and you seem to share it but maybe there is a clever way to do that with a CEK machine?

I really don't think so. CEK machine environments are lexically scoped, but the store for memoized lazy values has to be dynamically scoped, because values can be returned out of the scope in which they started life. (e.g. f : int -> int = \x. force {x+1} returns a reference to a newly created reference cell). CESK machines are an entirely standard way to implement this. See e.g. PFPL chapter 36 (Harper doesn't define the semantics in that chapter in terms of a CESK machine explicitly, but the semantics threads through both a lexically scoped environment and a dynamically scoped store, and it's not hard to see how to put it together with e.g. chapter 28 to end up with exactly a CESK machine).
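For concreteness, here is a minimal Haskell sketch of the shape this could take; all types and names below are illustrative placeholders, not the actual Swarm implementation:

import qualified Data.IntMap as IM

type Addr  = Int
data Term  = TDelay Term | TForce Term | TInt Int   -- tiny placeholder syntax
data Value = VRef Addr | VInt Int                   -- a VRef points into the store
type Env   = [(String, Value)]

data Cell  = E Term Env   -- suspended: a term plus the environment it closed over
           | V Value      -- forced: the memoized result

data Store = Store Addr (IM.IntMap Cell)   -- next free address + memory

-- Machine states would be the CEK states with the store threaded through, e.g.
--   data CESK = In Term Env Store Cont | Out Value Store Cont

-- Evaluating a memoized delay allocates a fresh cell and returns a reference to it.
allocate :: Term -> Env -> Store -> (Value, Store)
allocate t e (Store next m) =
  (VRef next, Store (next + 1) (IM.insert next (E t e) m))

-- Forcing a VRef for the first time evaluates the suspended term and then
-- overwrites its cell, so later forces of the same reference reuse the value.
memoize :: Addr -> Value -> Store -> Store
memoize a v (Store next m) = Store next (IM.insert a (V v) m)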

polux commented 3 years ago

SGTM on everything you wrote!

byorgey commented 3 years ago

So I think we have another choice to make. In a previous comment I said

Probably in most cases no garbage collection will be needed since robots will allocate a small, constant number of memory cells (i.e. one memory cell per recursive definition).

However, as I have been working on implementing this, I realize that if we implement this in the most straightforward way, that will not be true at all. The straightforward idea would be that evaluating a delayed expression would always allocate a memory cell to hold that expression, and later to hold whatever value it evaluates to once it is forced. However, if we now have things like if, build, and try taking delayed arguments, that is going to be a lot of allocated memory cells, most of which will be pointless. For example, imagine if b {3} {5} allocating two memory cells, one to hold the expression 3 and one to hold 5. There is no reason this should allocate any memory cells at all. The places where allocating memory cells in the store will actually be helpful are (1) recursive definitions, and (2) lazy data structures (and maybe some other scenarios we haven't thought of). In cases where we plan to use a value exactly once and just want to control when it is evaluated (e.g. most uses of if, build, and try), allocating a memory cell is pointless and even harmful (because it's extra bookkeeping and because we'll probably have to garbage collect it later).

So I'd like to propose something slightly more nuanced. We can have two different kinds of delayed expressions, memoized and non-memoized. Both will have the same type {ty}, and both can be forced with force; the only difference will be the runtime behavior (which can affect the efficiency of a program but not its semantics).
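At the value level, the distinction might look roughly like this (hypothetical constructor names, not the actual implementation):

type Addr = Int
data Term = TInt Int   -- placeholder term syntax
type Env  = [(String, Value)]

data Value
  = VDelay Term Env   -- non-memoized: a suspended term and environment,
                      -- re-evaluated every time it is forced; no store cell
  | VRef Addr         -- memoized: a pointer to a store cell, which the first
                      -- force overwrites with the computed value
  | VInt Int          -- placeholder for the other value forms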

polux commented 3 years ago

As discussed on IRC this weekend: as much as I find it disappointing that we need to expose this difference to the user, I agree this is the most reasonable thing to do, for now at least.

Once/if we implement garbage collection, we could investigate the hit taken by storing every delayed value in the store.

byorgey commented 3 years ago

I briefly thought that having a syntax for explicitly requesting memoization was unnecessary, since any place it would be needed would already use recursion or a definition anyway. But this is actually not true. As a counterexample, consider this future pseudo-swarm code:

type stream a = rec s. a * {s}

def mapstream : (a -> b) -> stream a -> stream b = \f. \s.
  (f (fst s), {{ mapstream f (force (snd s)) }} )

Although the expression mapstream f (force (snd s)) is part of a recursive definition, it is not itself recursive, and we would need to explicitly request a memoized delay (which we almost certainly want in this situation).

In IRC, @fryguybob mentioned the possibility of using weak references. The idea, as I understand it, would be to implement Store using weak references to key-value pairs, in such a way that a key-value pair would be removed from the mapping once there are no longer any other references to the key. In other words, this is a way to co-opt GHC's garbage collector into doing our garbage collection for us! This is definitely worth looking into, though probably as a separate PR. I think the ideal looks something like

But again, I'm not sure we can/should implement all this right away.

polux commented 3 years ago

I like the idea of using weak refs very much, but I have a hard time imagining how this would work exactly. Would we still need the S in CESK? Or would it be like having IORefs in our machine state? What makes it a bit scary is that the machine state would then not be an inhabitant of an algebraic datatype that we can easily inspect/traverse/serialize, etc. But I agree that co-opting the GC is appealing and may make it worth it.

byorgey commented 3 years ago

We definitely still need to be able to serialize the machine state for #50! Here's how I imagine it would work. We still need the S in CESK, which would be a map from keys to Cells. The keys would not necessarily be Int, and a VRef would contain a key. Key/Cell pairs would be stored via a Weak reference, with a finalizer that will delete the key/Cell pair from the map when there are no longer any references to the key (meaning there are no more VRefs anywhere which contain it).

There are a bunch of details to be worked out here which I don't quite understand, but intuitively I believe it should be possible.
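To make the intuition concrete, here is a rough Haskell sketch (hypothetical types and names, not the actual implementation): each VRef carries a small heap object that serves as the weak pointer's key, and the finalizer removes the map entry once the last VRef for that cell has been garbage collected.

import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import Data.Unique (Unique, newUnique)
import System.Mem.Weak (Weak, mkWeak, deRefWeak)
import qualified Data.Map as M

data Cell = Cell                      -- placeholder for suspended/forced cells
data VRef = VRef Unique (IORef ())    -- the IORef gives the key heap identity

type Store = IORef (M.Map Unique (Weak (IORef Cell)))

-- The weak pointer keeps the cell alive only as long as the key (and hence
-- some VRef) is alive; the finalizer then drops the map entry.
allocate :: Store -> Cell -> IO VRef
allocate store c = do
  u    <- newUnique
  key  <- newIORef ()
  cell <- newIORef c
  w    <- mkWeak key cell (Just (modifyIORef' store (M.delete u)))
  modifyIORef' store (M.insert u w)
  pure (VRef u key)

-- Nothing here means the cell has already been collected.
lookupCell :: Store -> VRef -> IO (Maybe (IORef Cell))
lookupCell store (VRef u _) = do
  m <- readIORef store
  maybe (pure Nothing) deRefWeak (M.lookup u m)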

byorgey commented 3 years ago

I suppose having IORefs in our machine state could work too, and would be simpler in some sense, but it would indeed make the state much harder / impossible to serialize, because it would be impossible to observe any sharing between IORefs.

byorgey commented 3 years ago

What about just making IORefs that contain a unique integer + a cell? And we just assume that any IORefs with the same integer are the same. Can we get away without needing weak references or any Store mapping at all?

fryguybob commented 3 years ago

The original intent of weak references in Haskell was to support fancy memoization. It may be helpful to read that paper:

https://www.microsoft.com/en-us/research/wp-content/uploads/1999/09/stretching.pdf

byorgey commented 3 years ago

After some discussion on IRC, we think just having IORefs containing a Data.Unique + a Cell would work (the Data.Unique is so we can "observe" sharing of IORefs; serializing and deserializing would both involve building a temporary map from Data.Unique values to Cell values). The downside is that the game state would no longer be persistent, which could limit our options in the future, e.g. if we wanted to provide a way to travel back in time... hmm, and now that I think of it, how does having a CESK machine interact with continuations (#133)?
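Roughly, the representation and the serialization trick might look like this (hypothetical names; hashUnique could serve as the serialized identifier):

import Data.IORef (IORef, newIORef, readIORef)
import Data.Unique (Unique, newUnique)
import qualified Data.Map as M

data Cell = Cell                       -- placeholder
data VRef = VRef Unique (IORef Cell)   -- VRefs with equal Uniques are "the same" cell

newCell :: Cell -> IO VRef
newCell c = VRef <$> newUnique <*> newIORef c

-- Serializing walks the machine state, building a temporary map from Unique to
-- Cell; each VRef is then written out as (an encoding of) its Unique, so the
-- sharing between references is preserved.
snapshot :: [VRef] -> IO (M.Map Unique Cell)
snapshot refs = M.fromList <$> mapM (\(VRef u r) -> (,) u <$> readIORef r) refs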

One nice idea would be to abstract over both the Store representation (as a type parameter to CESK) and operations (as a custom effect, once we have #228). Then we could easily play with different implementations and swap one out for another.
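For instance, a hedged sketch of the idea (here just a plain record of operations; the real version would presumably become an effect once we have #228):

-- Hypothetical: if the CESK rules only ever touch the store through these
-- operations, the representation s (IntMap, weak refs, IORef-backed, ...) and
-- even the address type can be swapped without changing the evaluator.
data Cell = Cell   -- placeholder

data StoreOps m s addr = StoreOps
  { allocate :: Cell -> s -> m (addr, s)
  , peek     :: addr -> s -> m (Maybe Cell)
  , poke     :: addr -> Cell -> s -> m s
  }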