openllb / hlb

A developer-first language to build and test any software efficiently
https://openllb.github.io/hlb/
Apache License 2.0
108 stars 12 forks source link

Exit early on ErrDebugExit after Solve #328

Closed yun-wang closed 2 years ago

yun-wang commented 2 years ago

This PR fixes an issue where the debugger would hang at d.mode = <-d.control if d.mode is already DebugTerminate, and there is no more control value from the channel.

hinshun commented 2 years ago

Hmm.. so if I'm remembering this right, this part of the code attempts to cast the return value as a Request and then evaluate it, so that the debugger won't continue past execution that would otherwise error out in a non-debugging hlb run.

We capture the solve error as a yieldError so that it displays in the TUI and you can remain in the debugger in order to reverse, re-solve, do other debugging tasks.

There seems to be a deadlock when the debugger enters DebugTerminate between lines: https://github.com/openllb/hlb/blob/83da43256207303ec69ced2a8a7eda18c25723d0/codegen/debugger.go#L315-L321

Is that right? It's not clear to me yet this is the right fix. Perhaps there should be some kind of concurrency control around giving clients the opportunity to terminate vs continuing on to evaluating the request.

yun-wang commented 2 years ago

Correct, there is a deadlock when the debugger enters DebugTerminate in the req.Solve method.

I bumped into the issue with our ControlDebugger implementation for shell-on-error. Once the debug shell exits, we call Terminate() to end the debugging session. The ErrDebugExit error is returned from the req.Solve method (from the error handler). The solve error is actually ErrDebugExit. It doesn't make sense to remain in the debugger in this case.

yun-wang commented 2 years ago

Regarding the current fix, is there a scenario where d.err != nil after req.Solve returns, and we'd still want to remain in the debugger?

hinshun commented 2 years ago

Okay, I'm happy to move forward with this PR. Though would be nice to write a test that hangs without this fix. It sounds like it can be reduced to a simple reproduction.