Closed byorgey closed 3 months ago
I was trying to do a git bisect
but it seems like this bug has been around for a long, long time. For example it seems to be present already in https://github.com/swarm-game/swarm/commit/008d4000a542734055be13456b6252d915af6817 , which is the commit that introduced --run
(it gets more annoying to test before that).
Yikes, good job figuring this one out @byorgey. π
Maybe we could make the example a unit test, to make it easier to debug/fix? π
I donβt know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?
For example the inner variable could be named f::a[0]
and the outer one would be _::a[0]
. π€
Maybe we could make the example a unit test, to make it easier to debug/fix? slightly_smiling_face
Yes, good idea.
I donβt know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?
I mean, that's a reasonable idea, but I think it would just be a band-aid, and would probably lead to other bugs. In theory, we should be able to fix it simply by managing environments in such a way that the dynamic semantics (i.e. which values flow where) matches the static semantics (the scopes etc. enforced by the type checker).
The worst thing about this is that it has horrible error messages:
def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )
It's actually a pretty good error, in the sense that it really is a fatal error and a bug in swarm, and the machine state at least gives us a bit of info about what's going wrong.
Current progress: so when a def
is executed, any environment generated by evaluating the definition body definitely is not added to the current environment. However, I think the problem is that evaluating the definition body actually doesn't really do anything: it simply allocates a delayed expression in the store. The delayed value in the store is actually evaluated/executed the first time the definition is referenced. I guess the problem is that this ends up acting too much as if we had just spliced the definition body into the location of the first reference to the defined name, which doesn't respect scoping of bound names.
Evaluating a delayed expression should never cause names to get bound, however, so maybe we can just change the way that happens?
Tried wrapping evaluation of delayed things in a new FDiscardEnv
frame which causes any generated environment to be discarded, but that wasn't in the right place: evaluating the body of a definition (if the body is a bind expression) won't actually generate the environment, executing it will.
I guess the real issue is that whenever we're executing something and we look up the value of a variable, we should discard any environment generated by whatever the variable resolves to, because it came from some nested or non-local scope. Even that is tricky though... we have to know when such a value looked up as the value of a variable meets an FExec
frame, which might be at some later time. Maybe the solution is to make a new kind of value, VNested :: Value -> Value
; when this value meets an FExec
frame, it generates a FDiscardEnv
frame so we discard whatever environment is generated from executing the wrapped value. I wouldn't want to wrap every single variable lookup inside VNested
; we only need to do it for variables with a cmd
type. So maybe we can wrap def
and let
bodies for things of cmd
type at typechecking/elaboration time. However, one small problem is that currently the elaborator doesn't have access to the types of terms. Hmm, and also maybe we need to also wrap commands that end up getting passed as arguments to a function. Simply wrapping all variables of type cmd
seems much more robust --- it will still work even if we end up later with additional binding forms for variables (e.g. pattern matching).
Wrapping all variables, period, is problematic, because then we have a VNested
wrapper in a bunch of places where we need to ignore it but there's no nice, uniform way to do that. I am increasingly coming to the conclusion that we need to carry more type information through somehow so that we can make sure to only wrap things of type cmd a
--- not just as an optimization, but as an issue of correctness. At a minimum, when we look up a variable in an environment at runtime, we need to know whether it has a cmd
type or not.
Just wanted to share that the error messages have improved:
def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )
def f = a <- scan down end;
let a = 2 in f; return (a+1)
Thanks, @byorgey! :+1:
Optimistic we can actually get this fixed soon after we clean up and merge #991 , which I hope to get to soon. This past week was the first week of classes so not much extra time for me, but things should soon settle into more normal patterns...
This bug is so interesting and so irritating. I am learning a lot. My latest realization (thanks to #1033) is that it's not enough to just wrap variables with a cmd
type, because that does not take into account functions which return a cmd
type. My whole "wrapping variables" strategy seems less viable now.
I am beginning to think the real problem is that we treat names defined via a bind
the same as those created with def
, i.e. as globally exported names (see https://github.com/swarm-game/swarm/pull/303/commits/609e9b01f715ee34b915b2113215faac36ebc1a6 which was part of #303 ). Really, we only want that for top-level binds written at the REPL. We don't have this issue with def
because it can only occur at the top level in the first place. I wonder if there is some way we can just make top-level binds special case, perhaps by automatically rewriting them to include a def
to export their name. That is, at the top level, we could rewrite x <- c1; c2
to x <- c1; def x = x end; c2
. Or, since def
introduces extra machinery in terms of delay
, maybe we could just add a special export
construct and rewrite x <- c1; c2
to something like x <- c1; export x; c2
.
@byorgey the export
strategy sounds promising. π
I am interested what your super secret final solution will be. π
The export strategy is a promising way to solve the wrong problem. :smile:
Figured out what's causing the bug. Basically the
b <- ...
in the definition ofuntil
is overriding the definition ofb
inhanoi
, which absolutely should not happen.Originally posted by @byorgey in https://github.com/swarm-game/swarm/issues/669#issuecomment-1243051295
To reproduce, put the following in a
.sw
file and--run
it in Creative mode:The binding of the variable
a
from the definition off
is escaping into the outer scope from whichf
is called, so thata
is now bound toinl ()
instead of2
, which obviously causes a runtime crash when the+
operator is applied to a non-number.Edited to add: we should keep the below discussion around for reference, but see #1087 for my latest thinking on the issue.