swarm-game / swarm

Resource gathering + programming game
Other
835 stars 53 forks source link

Variables in a local monadic binder escape to outer scopes #681

Closed byorgey closed 3 months ago

byorgey commented 2 years ago

Figured out what's causing the bug. Basically the b <- ... in the definition of until is overriding the definition of b in hanoi, which absolutely should not happen.

Originally posted by @byorgey in https://github.com/swarm-game/swarm/issues/669#issuecomment-1243051295

To reproduce, put the following in a .sw file and --run it in Creative mode:

def f = a <- scan down end;
let a = 2 in f; return (a+1)

The binding of the variable a from the definition of f is escaping into the outer scope from which f is called, so that a is now bound to inl () instead of 2, which obviously causes a runtime crash when the + operator is applied to a non-number.

Edited to add: we should keep the below discussion around for reference, but see #1087 for my latest thinking on the issue.

byorgey commented 2 years ago

I was trying to do a git bisect but it seems like this bug has been around for a long, long time. For example it seems to be present already in https://github.com/swarm-game/swarm/commit/008d4000a542734055be13456b6252d915af6817 , which is the commit that introduced --run (it gets more annoying to test before that).

xsebek commented 2 years ago

Yikes, good job figuring this one out @byorgey. πŸ‘

Maybe we could make the example a unit test, to make it easier to debug/fix? πŸ™‚

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

For example the inner variable could be named f::a[0] and the outer one would be _::a[0]. πŸ€”

byorgey commented 2 years ago

Maybe we could make the example a unit test, to make it easier to debug/fix? slightly_smiling_face

Yes, good idea.

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

I mean, that's a reasonable idea, but I think it would just be a band-aid, and would probably lead to other bugs. In theory, we should be able to fix it simply by managing environments in such a way that the dynamic semantics (i.e. which values flow where) matches the static semantics (the scopes etc. enforced by the type checker).

xsebek commented 2 years ago

The worst thing about this is that it has horrible error messages: image

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )
byorgey commented 2 years ago

It's actually a pretty good error, in the sense that it really is a fatal error and a bug in swarm, and the machine state at least gives us a bit of info about what's going wrong.

byorgey commented 2 years ago

Current progress: so when a def is executed, any environment generated by evaluating the definition body definitely is not added to the current environment. However, I think the problem is that evaluating the definition body actually doesn't really do anything: it simply allocates a delayed expression in the store. The delayed value in the store is actually evaluated/executed the first time the definition is referenced. I guess the problem is that this ends up acting too much as if we had just spliced the definition body into the location of the first reference to the defined name, which doesn't respect scoping of bound names.

Evaluating a delayed expression should never cause names to get bound, however, so maybe we can just change the way that happens?

Tried wrapping evaluation of delayed things in a new FDiscardEnv frame which causes any generated environment to be discarded, but that wasn't in the right place: evaluating the body of a definition (if the body is a bind expression) won't actually generate the environment, executing it will.

I guess the real issue is that whenever we're executing something and we look up the value of a variable, we should discard any environment generated by whatever the variable resolves to, because it came from some nested or non-local scope. Even that is tricky though... we have to know when such a value looked up as the value of a variable meets an FExec frame, which might be at some later time. Maybe the solution is to make a new kind of value, VNested :: Value -> Value; when this value meets an FExec frame, it generates a FDiscardEnv frame so we discard whatever environment is generated from executing the wrapped value. I wouldn't want to wrap every single variable lookup inside VNested; we only need to do it for variables with a cmd type. So maybe we can wrap def and let bodies for things of cmd type at typechecking/elaboration time. However, one small problem is that currently the elaborator doesn't have access to the types of terms. Hmm, and also maybe we need to also wrap commands that end up getting passed as arguments to a function. Simply wrapping all variables of type cmd seems much more robust --- it will still work even if we end up later with additional binding forms for variables (e.g. pattern matching).

byorgey commented 1 year ago

Wrapping all variables, period, is problematic, because then we have a VNested wrapper in a bunch of places where we need to ignore it but there's no nice, uniform way to do that. I am increasingly coming to the conclusion that we need to carry more type information through somehow so that we can make sure to only wrap things of type cmd a --- not just as an optimization, but as an issue of correctness. At a minimum, when we look up a variable in an environment at runtime, we need to know whether it has a cmd type or not.

xsebek commented 1 year ago

Just wanted to share that the error messages have improved:

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )

image

def f = a <- scan down end;
let a = 2 in f; return (a+1)

image

Thanks, @byorgey! :+1:

byorgey commented 1 year ago

Optimistic we can actually get this fixed soon after we clean up and merge #991 , which I hope to get to soon. This past week was the first week of classes so not much extra time for me, but things should soon settle into more normal patterns...

byorgey commented 1 year ago

This bug is so interesting and so irritating. I am learning a lot. My latest realization (thanks to #1033) is that it's not enough to just wrap variables with a cmd type, because that does not take into account functions which return a cmd type. My whole "wrapping variables" strategy seems less viable now.

I am beginning to think the real problem is that we treat names defined via a bind the same as those created with def, i.e. as globally exported names (see https://github.com/swarm-game/swarm/pull/303/commits/609e9b01f715ee34b915b2113215faac36ebc1a6 which was part of #303 ). Really, we only want that for top-level binds written at the REPL. We don't have this issue with def because it can only occur at the top level in the first place. I wonder if there is some way we can just make top-level binds special case, perhaps by automatically rewriting them to include a def to export their name. That is, at the top level, we could rewrite x <- c1; c2 to x <- c1; def x = x end; c2. Or, since def introduces extra machinery in terms of delay, maybe we could just add a special export construct and rewrite x <- c1; c2 to something like x <- c1; export x; c2.

xsebek commented 1 year ago

@byorgey the export strategy sounds promising. πŸ™‚

I am interested what your super secret final solution will be. πŸ˜„

byorgey commented 1 year ago

The export strategy is a promising way to solve the wrong problem. :smile: