swarm-game / swarm

Resource gathering + programming game

Show name of currently executing function in the robot panel #2133

Open kostmo opened 2 months ago

kostmo commented 2 months ago

While developing a scenario, it would be useful for debugging to be able to see which function a system robot is currently "in" as it executes its program.

Sometimes I have placed a say "function_name"; at the top of each function in a robot's program (i.e. "printf debugging") to accomplish this. But it would be nice to have an automatic mechanism.

I imagine it is feasible:

byorgey commented 2 months ago

Annotating each Term with the closest function definition sounds tedious. It should be easier than that, without requiring any changes to terms: in the CESK machine, every time we look up a variable in order to evaluate it, we can set the "current function" to the name of the variable, and push a special stack frame that remembers what the previous "current function" was; when we get back to one of those stack frames, restore the old value.

kostmo commented 2 months ago

> every time we look up a variable in order to evaluate it

Is that done on this line?

> we can set the "current function" to the name of the variable

I'd want to distinguish between variables that refer to "toplevel function definitions" vs. variables that refer to local definitions. Is that distinction tracked in the Term type?

byorgey commented 2 months ago

> Is that done on this line?

Hmm, yes, that's the line I had in mind, but now that I think about it, this won't work. When we look up a variable in the environment, its value is already evaluated; we may later execute it (if it is a command) or reduce it (if it is a lambda that ends up getting applied to an argument), but either way that happens at some later point, unrelated to when the variable lookup happened.

Maybe annotating all terms would actually be the right way to accomplish this...

Regardless of how we accomplish it, I do worry that this feature might be difficult to maintain if we start doing more code optimization, see e.g. #1563.

> I'd want to distinguish between variables that refer to "toplevel function definitions" vs. variables that refer to local definitions. Is that distinction tracked in the Term type?

It is not, but I don't think it would be too hard to enhance Env so it tracks additional information about each variable.

kostmo commented 2 months ago

IIRC, the Syntax' type is a wrapper of the Term type, and is discarded once we are running the CESK machine? If that were not the case, then at dialog display time we could infer the toplevel function name from the SrcLoc of the term.

byorgey commented 2 months ago

Terms have their type annotations erased before putting them in the CESK machine, but I think they still have SrcLocs. Or if they don't, we could easily change it so the SrcLocs are left in, I think.

> at dialog display time we could infer the toplevel function name from the SrcLoc of the term.

Not unless you keep the original source code around somewhere. And keep in mind that function definitions in scope could also come from some file that was run/imported. We could extend SrcLocs to contain the name of the file they came from (which we should probably do anyway), but you would need to keep all the original file contents in a map or something so you could figure out the toplevel function name corresponding to a given SrcLoc. In any case that seems like a kind of expensive and redundant way to do it: why bother re-parsing the original source code to find the nearest enclosing def when we already parsed it into an AST in the first place?

kostmo commented 2 months ago

Thinking about this a bit more, as a user of this functionality there should be clarity around what it means to be "in a function", or what is the "currently executing function".

Potentially one way to render the information would be as a "call stack". However, given TCO and infinite recursion as standard practice, a "stack" representation is less accurate/useful.

I think, as a developer, the most useful thing we could get out of this is a trail of "breadcrumbs" that indicate when a function of cmd type has been invoked, leaving aside any notion of when the function has "exited". This helps obtain insight into operation of a "state machine" where each toplevel function represents a different state.

To implement this, we could have a Label primitive (of type Text -> Cmd Unit) that is automatically injected into the code as the first child of an effectful function, similar to Noop. Then we would maintain a rolling buffer of the last N encountered Labels for each robot.
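For illustration only, the "rolling buffer of the last N Labels per robot" could look roughly like the sketch below. It is written in Python rather than Swarm's implementation language, and `LabelTrail`, `record`, and `breadcrumbs` are hypothetical names that do not exist in Swarm; it assumes robots are identified by an integer ID.

```python
from collections import defaultdict, deque


class LabelTrail:
    """Hypothetical per-robot breadcrumb store (not part of Swarm)."""

    def __init__(self, capacity=5):
        self.capacity = capacity
        # One bounded deque per robot ID; once a deque is full,
        # appending a new label silently drops the oldest one.
        self._trails = defaultdict(lambda: deque(maxlen=capacity))

    def record(self, robot_id, label):
        """Called whenever a robot executes an injected Label."""
        self._trails[robot_id].append(label)

    def breadcrumbs(self, robot_id):
        """Return the retained labels for a robot, oldest first."""
        return list(self._trails[robot_id])
```

Because entries are only ever appended, the trail records when a function was entered and says nothing about when it exited, which matches the "breadcrumbs" reading rather than a call stack.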

byorgey commented 2 months ago

> To implement this, we could have a Label primitive (of type Text -> Cmd Unit) that is automatically injected into the code as the first child of an effectful function, similar to Noop. Then we would maintain a rolling buffer of the last N encountered Labels for each robot.

That sounds like a nice idea in theory. I agree the notion of a "call stack" or "currently evaluating/executing function" might be a bit ill-defined. I note, however, that inserting a label primitive "as the first child" of an effectful function is not necessarily trivial, especially when higher order functions are involved. For example:

def on : Int -> Cmd Unit -> (Int -> Cmd Unit) = \x. \c. \y. if (x == y) {c; c; c} {} end
def f : Int -> Cmd Unit = on 3 move end

f is clearly an "effectful function" (i.e. it has a type of the form ... -> ... -> Cmd a) but it is unclear where we could insert a label primitive in the definition of f. We can't insert it before the call to on since the result of on has a function type, not a command type. We can't really insert it before move, like on 3 (label "f"; move), because then the label would be executed either 3 times or no times at all.
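The "3 times or no times at all" behaviour can be reproduced with a small model. This is only a sketch in Python, not Swarm code: it assumes a command can be modelled as a zero-argument callable, and `events`, `label`, and `seq` are hypothetical helpers introduced for the illustration.

```python
events = []  # log of everything the modelled robot does


def label(name):
    """Model of a label command: running it records the label."""
    return lambda: events.append(("label", name))


def move():
    """Model of the move command."""
    events.append(("move",))


def seq(*cmds):
    """Model of `c1; c2; ...`: a command that runs each part in order."""
    def run():
        for cmd in cmds:
            cmd()
    return run


def on(x, c):
    """Model of `on`: takes x and a command, returns a function of y."""
    def waiting(y):
        def run():
            if x == y:
                c()
                c()
                c()
        return run
    return waiting


# Model of `on 3 (label "f"; move)`.
f = on(3, seq(label("f"), move))
```

Running `f(3)()` logs the label three times, once per repetition of `c`, while `f(4)()` logs nothing, so in neither case does the label mark a single entry into `f`.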

xsebek commented 2 months ago

> Sometimes I have placed a say "function_name"; at the top of each function in a robot's program (i.e. "printf debugging") to accomplish this. But it would be nice to have an automatic mechanism.

How about adding a __func__ instead?

def trace : Cmd a -> Cmd Unit = \c.
  try {
    log ("STARTED " ++ __func__); c; log ("FINISHED " ++ __func__)
  } { log ("FAILED " ++ __func__) }
end

def on : Int -> Cmd Unit -> (Int -> Cmd Unit) = \x. \c. \y. if (x == y) {c; c; c} {} end
def f : Int -> Cmd Unit = \i. trace (on 3 move i) end

EDIT: We could add label as a command to show text in the robot panel and make it usable for other things as well.

> Regardless of how we accomplish it, I do worry that this feature might be difficult to maintain if we start doing more code optimization, see e.g. https://github.com/swarm-game/swarm/issues/1563.

Bonus points for if (__debug__). 😄

byorgey commented 2 months ago

But implementing __func__ would be just as difficult.

xsebek commented 2 months ago

I meant it to be set to the name of the definition where it is used, but I did not realize that __func__ would always be trace in my code. 😄

If the trace was inlined (like a macro) before replacing __func__, it would work as intended. 🙂

byorgey commented 2 months ago

> If the trace was inlined (like a macro) before replacing __func__, it would work as intended. 🙂

Ugh, that sounds horrid, because the meaning of __func__ then cannot be determined locally; it depends on the specifics of how inlining happens. I really want to stick with the nice, elegant semantic model we have and not go mucking about with macros, preprocessors or anything like that. (To be clear, I have no problems with inlining itself as an optimization technique, as long as it does not change the meaning of the program!)

xsebek commented 2 months ago

Even without macros, a snippet that uses __func__ would be easy to copy-paste and would not break when the definition is renamed. This would automate part of @kostmo's proposal and allow the user to decide what to do with that name.

@byorgey I guess we could replace this identifier after parsing so that it would be comparably easy to implement.
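As a rough illustration of what "replace this identifier after parsing" might involve, here is a sketch over a toy AST. It is written in Python and is not Swarm's actual Term type: `Var`, `Text`, `App`, `Lam`, `Def`, and `resolve_func` are all hypothetical names, and the pass assumes `__func__` should resolve to the name of the nearest enclosing def.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class Def:
    name: str
    body: object


def resolve_func(term, enclosing=None):
    """Replace each `__func__` variable with the enclosing def's name."""
    if isinstance(term, Def):
        # Entering a definition: its name becomes the enclosing name.
        return Def(term.name, resolve_func(term.body, term.name))
    if isinstance(term, Var):
        if term.name == "__func__" and enclosing is not None:
            return Text(enclosing)
        return term
    if isinstance(term, App):
        return App(resolve_func(term.fn, enclosing),
                   resolve_func(term.arg, enclosing))
    if isinstance(term, Lam):
        return Lam(term.param, resolve_func(term.body, enclosing))
    return term  # literals and anything else are left unchanged
```

Applied to a toy version of the earlier example, `Def("trace", Lam("c", App(Var("log"), Var("__func__"))))` becomes a term that logs the text "trace", which is the purely local meaning discussed above: the name is fixed by where `__func__` is written, not by who calls `trace`.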