pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org

Ability to easily test if a particular process is being debugged #16087

Open daniels220 opened 7 months ago

daniels220 commented 7 months ago

Is your feature request related to a problem? Please describe.
Sometimes it is useful to know whether a process is being debugged. At the most basic level, the process monitor could display this differently from another suspended process; but also, if a timeout (BlockClosure>>valueWithin:onTimeout:) expires for a process that is being debugged, this can unwind the stack out from under the debugger and break the ability to debug. There may not be any one-size-fits-all solution to this (it really depends on why the timeout exists and what the appropriate response might be), but being able to recognize debug state seems like a prerequisite to any solution.

Describe the solution you'd like
Add a sixth possible process state (per #14904/#16072)—in Dolphin it is #debug, but the testing method is #isDebuggee, because #isDebug doesn't really work. The state could just be #debuggee for consistency, or whatever, not that important.

Describe alternatives you've considered
TBH I got as far as going, "this exists elsewhere and seems useful, let's see how hard it would be" and haven't thought about other approaches. It's kinda hard to get my mind off what I already know...

Expected development cost
The main issue here is determining how to "tag" a process as being under debug in a way that is performant (the test is cheap) and reliable (hard to end up with processes that think they're being debugged, but no debugger is actually in control of them). I just had the thought that maybe the effectiveProcess mechanism could help, but that seems fragile and I don't understand it well enough to be confident. At minimum I think there would be conflicts with doTerminationFromAnotherProcess. An explicit tag assigned and removed by the debugger might be a better choice.

isCzech commented 7 months ago

Hi @daniels220, thanks for opening this thread. I was asking myself the same question: How do I know a process is in simulation? (i.e. being debugged; sometimes I use "simulation" instead of "debugging"). Even worse, I'd like to know if a context belongs to a currently debugged/simulated process (my use case is when unwinding a context I'd like to know if it's in simulation or in "normal" execution and deal with the unwinding accordingly).

I understand Dolphin uses an instance variable 'debugger'. I experimented with using instance variable 'env' (in Squeak it's a dictionary with custom values, haven't tried in Pharo). However, this approach won't help with determining whether a particular context belongs to a simulated process. For this I tried two approaches:

(1) Mark the process's bottom context with a tag (the execution stack is generally a tree with a common root/bottom). Problems:

(2) Insert a special bottom context as a tag.

There is, however, another problem with both approaches: in general, a process can have more than one execution stack, which is the case when you terminate a process: the unwinding of the original sender chain is controlled from a parallel stack (see #unwindAndStop:; that parallel stack is the one that controls termination unwinding).

Using effectiveProcess might be interesting in the case of contexts in simulation but generally it doesn't identify a debugged process because only parts of the simulation are executed "onBehalfOf". Thanks for this idea.

At minimum I think there would be conflicts with #doTerminationFromAnotherProcess

Why's that? I'm interested in what you had in mind. Thanks!

daniels220 commented 7 months ago

Even worse, I'd like to know if a context belongs to a currently debugged/simulated process (my use case is when unwinding a context I'd like to know if it's in simulation or in "normal" execution and deal with the unwinding accordingly).

Ooooh, good point—although in practice Contexts mostly belong to a particular process, they don't directly hold a reference to the process, and can sometimes be involved with more than one.

Now, for your mentioned use-case, aren't you asking from within the potentially-simulated process itself? In that case, isn't it sufficient to know if the active process is under debug? There are a bunch of ways to implement this—the main question I think is how to make it very clear that they only really work for the active process. Perhaps just a `ProcessorScheduler>>isActiveProcessSimulated`, avoiding the notion of asking an arbitrary process altogether? All you actually have to ask is the equivalent of `(Processor realActiveProcess instVarNamed: 'effectiveProcess') notNil`—that is, is the real active process simulating _anyone_, because if so, by definition that process will also be the answer to `Processor activeProcess`. But obviously with the necessary methods to avoid using `instVarNamed:`.
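For illustration, here is a minimal sketch of that scheduler-level test, assuming the `realActiveProcess` accessor used above; a real version would add a proper accessor to Process instead of going through `instVarNamed:`:

    ProcessorScheduler >> isActiveProcessSimulated
        "Answer whether the process the VM is really running is currently simulating
        another process, i.e. whether Processor activeProcess would answer a different
        process than the real one. Sketch only: instVarNamed: is used for illustration."
        ^ (self realActiveProcess instVarNamed: 'effectiveProcess') notNil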

But this doesn't help my use-case, which is for the helper process of valueWithin:onTimeout: to take a different action if the process that has "timed out" is being debugged. At this point, (a) probably there won't be any simulation going on at all—probably the debugger is idle—and (b) certainly the debugged process won't be active, the whole point is that a separate process that is not being debugged is interfering. But in that case I do have the process, not just the context, so a simple tag on the process would be sufficient. Perhaps the combination of the two approaches could cover all our use-cases?

Regarding implementation by tagging the whole process, it looks like `env` in Pharo is a spiritual descendant of what it is in Squeak—specifically, it is the backing storage for ProcessSpecificVariable, which provides a more object-oriented interface with an optimized implementation using preallocated array indices rather than a hashtable. It honestly seems almost perfect to just `ProcessLocalVariable subclass: #ActiveDebugger` ...etc...—it fits in so well with the existing examples IMO, and the behavior should be correct (we just need to make sure it does not inherit, but that's easy, the hooks are there). The one thing is that, since it would need to be accessed from outside the process, we'd need to add an API like:

    ^ aProcess psValueAt: index

But this too seems like a reasonable extension of the API—other types of variables need to be told where to read from, e.g. LocalVariable>>readInContext: or Slot>>read:; this would be the equivalent here. (Names all subject to revision; this is off the top of my head.) At this point an accessor on Process falls out nicely as:

    ^ ActiveDebugger valueIn: self

(I could go either way on #debugger or #activeDebugger—the ProcessLocalVariable can't be called Debugger (or perhaps it technically could but obviously shouldn't), and the selector could then derive from it, but, well, Dolphin calling it simply #debugger makes sense, just because it's not actually an ivar here doesn't mean the name is wrong. :shrug:)
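Pulling those fragments together, a rough sketch of how the pieces might fit (every name here, ActiveDebugger, valueIn:, the debugger accessor, is a placeholder from this discussion, while psValueAt: and the class-side soleInstance should, as far as I know, already exist):

    "Sketch only: class and selector names are placeholders from this thread."
    ProcessLocalVariable subclass: #ActiveDebugger
        instanceVariableNames: ''
        classVariableNames: ''
        package: 'Kernel-Processes'.

    "Proposed API: read the variable for an arbitrary process, not just the active one"
    ProcessSpecificVariable >> valueIn: aProcess
        ^ aProcess psValueAt: index

    ProcessSpecificVariable class >> valueIn: aProcess
        ^ self soleInstance valueIn: aProcess

    "Accessor on Process; the debugger would set and clear the underlying value"
    Process >> debugger
        ^ ActiveDebugger valueIn: self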


If some more robust Context-aware approach is needed, this is the point at which I get out of my depth—every time I look at this stuff, I realize the extent to which I've never really grokked what makes a context machine, a context machine, and how it can be used. (Dolphin, it's worth noting, is fundamentally a stack machine—it provides a reflective API to access the stack, but the class involved is called StackFrame and is just a facade—the actual values are stored in indexed slots of Process, contiguous in memory the way you might expect of a traditional stack.) If I want to understand it I bet I just need to bang my head against it for a few hours, but to do that I need a clear goal/reason to do so that justifies spending the time...

One thing I can say for sure is that I had a misconception about #doTerminationFromAnotherProcess—I thought it simulated the process-to-terminate, rather than what it (much more sensibly) does, of manipulating that process' stack, resuming it, and waiting on a semaphore that it has ensure:'d will be signaled on termination. So there should be no conflict with doTerminationFromAnotherProcess.

guillep commented 7 months ago

Hi, I'd like to know a bit more about what kind of use cases you have in mind.

Personally, I'm not yet convinced that this is a great idea in the long run, because the dynamic extent of an "in-debugger" property is very special, mostly driven by user clicking caprice :) For once,

So what if

  • we take a decision due to the debugger state and suddenly the process is proceeded?
  • I do 3 steps then proceed the process will behave differently than if I do 2 steps then proceed?

To me this is a nest of bugs :)

Just FTR, in Pharo there is also the ExecutionEnvironment & co (which I'm not too fond of), which also captures dynamic properties such as running-in-test. That makes more sense to me because (1) it lasts longer (from a program point of view) and (2) the dynamic scope is clear: until the return of the test.

isCzech commented 7 months ago

Hi, I'd like to know a bit more about what kind of use cases you have in mind.

I have one concrete use case: a while ago I "fixed" a situation when debugging `[^2] ensure: []` - if you step through until you stand at ^2 and then step over the ^2, you expect to get to the sender context, but instead you get an error/freeze, because during debugging the stepping mechanism inserts "guard" contexts that make sure control returns to the debugger process, and in this case the non-local return jumps over the guard contexts and chaos ensues. The "fix" was to send nil instead of the firstUnwindContext supplied by the VM, which forces a fresh search for the simulation's extra firstUnwindContext in #resume:through:. Unfortunately, sending nil is just a simple hack that prevents the error but breaks the basic simulation requirement that simulation should mimic the VM behavior (the VM never sends nil as the firstUnwindContext, though).
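For reference, the expression in question, the way you would debug it:

    "Debug this, step through until standing on ^2, then step over it."
    [ ^ 2 ] ensure: [ ]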

This long intro was supposed to show that the behavior in #resume:through: may depend on whether the active process is in simulation or not. If in simulation, we take care of the simulation guard contexts inserted by #runUntilErrorOrReturnFrom:, while in runtime no action is required because no simulation guard contexts exist during normal runtime.

So, if I can easily identify that the active process is being simulated/debugged, the fix is easy (e.g. move the guard contexts to the correct position in the sender chain). In this particular case I can test whether effectiveProcess isNil, but this generally doesn't cover all cases. Besides, using effectiveProcess isNil depends on the implementation of #stepOver.
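In other words, something along these lines inside the unwind code, where isActiveProcessSimulated stands for whatever test we end up with and the branch bodies are just placeholders:

    "Sketch of the idea: branch on simulation inside #resume:through: (not real code)."
    Processor isActiveProcessSimulated
        ifTrue: [ "take care of the guard contexts inserted by #runUntilErrorOrReturnFrom:" ]
        ifFalse: [ "normal runtime: no guard contexts exist, nothing special to do" ]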

So what if

  • I do 3 steps then proceed the process will behave differently than if I do 2 steps then proceed?

What do you mean exactly? The process will kind of behave differently - it's simulated vs run - but the "results" should of course be the same. In my case described above, stepping over will cause the process to do things that stepping through wouldn't, but that's ok if the end result is the same, I think.

To me this is a nest of bugs :)

Too bad; at any rate I wanted to share a use case I'm accidentally working on :)

Ducasse commented 7 months ago

I like your scenario :)

guillep commented 7 months ago

Hi, I'd like to know a bit more about what kind of use cases you have in mind.

I have one concrete use case: a while ago I "fixed" a situation when debugging `[^2] ensure: []` - if you step through until you stand at ^2 and then step over the ^2, you expect to get to the sender context, but instead you get an error/freeze, because during debugging the stepping mechanism inserts "guard" contexts that make sure control returns to the debugger process, and in this case the non-local return jumps over the guard contexts and chaos ensues. The "fix" was to send nil instead of the firstUnwindContext supplied by the VM, which forces a fresh search for the simulation's extra firstUnwindContext in #resume:through:. Unfortunately, sending nil is just a simple hack that prevents the error but breaks the basic simulation requirement that simulation should mimic the VM behavior (the VM never sends nil as the firstUnwindContext, though).

This long intro was supposed to show that the behavior in #resume:through: may depend on whether the active process is in simulation or not. If in simulation, we take care of the simulation guard contexts inserted by #runUntilErrorOrReturnFrom:, while in runtime no action is required because no simulation guard contexts exist during normal runtime.

So, if I can easily identify that the active process is being simulated/debugged, the fix is easy (e.g. move the guard contexts to the correct position in the sender chain). In this particular case I can test whether effectiveProcess isNil, but this generally doesn't cover all cases. Besides, using effectiveProcess isNil depends on the implementation of #stepOver.

Oh, this looks like a fun bug :) Actually, having more knowledge about the simulation is something for which your case seems legit.

So what if

  • I do 3 steps then proceed the process will behave differently than if I do 2 steps then proceed?

What do you mean exactly? The process will kind of behave differently - it's simulated vs run - but the "results" should of course be the same. In my case described above, stepping over will cause the process to do things that stepping through wouldn't, but that's ok if the end result is the same, I think.

What I meant is that if we expose this debug process state, I'm afraid that suddenly people will start to write domain code that reads:

    Process isDebug ifTrue: [...]

I'm ok with having this debug state as an implementation detail of the simulation itself, but I'm not sure of its value exposed to users... But in the end, maybe this new state will not be observable to users even if it's there? Maybe it will depend on how its detection is implemented... I'm now just brainfarting...

isCzech commented 7 months ago

What I meant is that if we expose this debug process state, I'm afraid that suddenly people will start to write domain code that reads:

    Process isDebug ifTrue: [...]

I'm ok with having this debug state as an implementation detail of the simulation itself, but I'm not sure of its value exposed to users... But in the end, maybe this new state will not be observable to users even if it's there?

Hmm, I get your point... but I'm afraid there are loads of dangerous tools in Smalltalk already. Perhaps a comment such as in #recreateSpecialObjectsArray: "don't even think of playing here unless you know what you are doing" could convey the message ;)

Add a sixth possible process state (per https://github.com/pharo-project/pharo/issues/14904 and https://github.com/pharo-project/pharo/pull/16072)—in Dolphin it is #debug, but the testing method is #isDebuggee, because #isDebug doesn't really work. The state could just be #debuggee for consistency, or whatever, not that important.

just an idea: would anyone consider #simulated and #isSimulated?

guillep commented 7 months ago

just an idea: would anyone consider #simulated and #isSimulated?

I like it

StevenCostiou commented 6 months ago

I do not really have time right now but this topic interests me a lot. So just some information from the debugger point of view, just in case it could help:

Finally, and still if I understood correctly, this is a debugging concern and therefore I am not fond of writing debugging-related APIs in the kernel objects (like Process). I would improve the debugging infrastructure instead (or perhaps you meant this API to be in the form of extension methods).

daniels220 commented 6 months ago

Regarding concerns about encouraging users to write code that depends on debug state, I wholeheartedly agree with @isCzech:

I'm afraid there are loads of dangerous tools in Smalltalk already. Perhaps a comment such as in #recreateSpecialObjectsArray: "don't even think of playing here unless you know what you are doing" could convey the message ;)

The whole effectiveProcess mechanism is already something we absolutely don't want most users fiddling around with, too...

Hi, I'd like to know a bit more about what kind of use cases you have in mind.

I stated the one that always gets me in the issue description—did that make sense? Lemme rephrase/elaborate a bit: I have a process that attempts some operation, and times out (valueWithin:onTimeout:) if it takes too long. If I insert a breakpoint in that operation, or an error occurs, I get a debugger until the timeout expires, then it is "snatched out from under me" by the signalException: from the high-priority process forked by valueWithin:onTimeout:. I would like to be able to implement a version of valueWithin:onTimeout: whose watchdog process, when the timeout expires, checks if the watched process is being debugged, and if so, goes back to sleep for another timeout period. I know this is imprecise (in theory it would be ideal for the timeout to stop counting down for exactly as long as the process is being debugged), but it's more than good enough for what is fundamentally a development-time occurrence—if something weird happens down the line because the timeout took so long to expire, that's on me, I just want it to stop rugpulling my debuggers dammit.
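To make that concrete, a rough sketch of the watchdog loop I have in mind; isUnderDebug is the hypothetical test discussed here, and watchedProcess, aDuration and timeoutError stand in for the real temporaries of valueWithin:onTimeout: (this is not the actual implementation):

    "Sketch only: re-arm the timeout for as long as the watched process is sitting
    in a debugger when the delay expires."
    watchdog := [
        [ (Delay forDuration: aDuration) wait.
          watchedProcess isUnderDebug ] whileTrue.
        watchedProcess signalException: timeoutError ] forkAt: Processor timingPriority - 1.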

And I need this to be a pretty generic mechanism, because what if, for instance, the process being debugged and the process with the timeout are not the same, but the latter is waiting on output produced by the former—e.g. waiting for a response over a socket or items to post to a SharedQueue? This comes up when debugging a Seaside web application—there's a process waiting (with a timeout) for an HTTP response from a local webserver, and it's the webserver worker process that's stopped in the debugger. But the test harness knows about this relationship, so with a little finagling it should be possible to get it to notice and not abort the whole test with an error just because the web server "isn't responding" (because I'm debugging it ugh!).

Another one that occurs to me: I work with a system using ReStore, which has a global Mutex to prevent some delicate state from getting messed up (become: is involved, it's a fun piece of software). Normally the mutex is acquired and released extremely rapidly, but if an error occurs with it locked, the UI process operating the debugger now cannot access...anything...and will deadlock trying to e.g. produce printStrings for the inspector pane. There's a workaround (which I helped write) for this that allows the development tools to bypass the mutex, but it would be easier to implement, more performant, and more precise if it could ask "is the current owner of the mutex being debugged?"

A few specific responses too:

the dynamic extent of an "in-debugger" property is very special, mostly driven by user clicking caprice :)

That's really the whole point though—to be able to do our best to compensate for that very unpredictability of what the user chooses to do in a debugger. You're right that it's always going to be ugly and messy and imperfect, but it's often less ugly than letting things play out as they otherwise would.

So what if

  • we take a decision due to the debugger state and suddenly the process is proceeded?
  • I do 3 steps then proceed the process will behave differently than if I do 2 steps then proceed?

This is definitely a danger, but my starting point was that there are situations where the fact of having a debugger open in the first place already effectively alters the behavior of the code, even without anyone asking about it specifically. The purpose of...I'm becoming partial to the name isUnderDebug...is to be able to attempt to compensate for this, so you're basically saying "what if the person using isUnderDebug does a bad job?", and I mean, yeah, they might, but that's no reason not to allow them to try.

I think a good starting guideline for using it would be: don't make any decisions that have long-lasting effects based on it. These are good:

Just an idea: would anyone consider #simulated and #isSimulated?

This points out a critical difference between how @isCzech and I are thinking about this: For my use-cases, a process that is suspended in the debugger is isUnderDebug. His use-case seems to only apply to an "active" (or simulation thereof) process. For me, isSimulated is wrong because it implies that something is happening—I might have a case where I don't even use any of the "step..." options, just halt, open an inspector or execute some DoIts, and proceed. So no bytecodes are ever simulated, but I was definitely debugging. I guess isSimulated is basically a subset of isUnderDebug, at least so long as the simulation mechanism is only used by the debugger (which I believe is the case, and...well, I could think of a reason it might not be, but I'd really be stretching for it).

This does point out that it can't really be a state, because actually there are two common debug states: simulated execution during step, and suspended with an open debugger. And simulated execution could even encounter a Semaphore, or be preempted by a higher-priority process, so an otherwise-#ready or #waiting process could also be considered under debug. So it should just be a separate testing method which might answer true in conjunction with any state except #terminated.

@StevenCostiou turns out I haven't had time for a while either, but thank you for chiming in—that's all great information! I know about debug sessions and had the idea of starting from a `DebugSession allInstances`. The problem with this is performance—some of the use-cases are ones where we need an answer in a short, constant time, and allInstances is always slow and gets slower the larger the object memory is. But if we had those debug sessions in a global registry which we could quickly scan, that would almost certainly make enough difference to make it workable. It's not like this is something I expect to run in a tight inner loop, that would definitely be a code smell, but there are places where it might be checked in passing in production code, where it will never return true, and does need to be able to answer false fast if there are no debuggers open at all. I don't think this reference would even need to be weak, as there is generally already a strong reference to the session (which holds a strong reference to the process) by way of the World holding the Morph that is the debugger window itself—right?

And yes—especially having clarified that this is not a Process>>state, but a separate testing method, it could probably be an extension method...or, per my above note about prod code calling it in passing, it might need to be in the kernel, but with an implementation like:

    Process >> isUnderDebug
        ^ Smalltalk globals
            at: #OupsDebuggerSystem
            ifPresent: [ :debuggerSystem | debuggerSystem isProcessUnderDebug: self ]
            ifAbsent: [ false ]

and the guts of the implementation living in that class method.
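A rough sketch of what that class method could look like, using a registry of open debug sessions as discussed above (sessions is a hypothetical registry accessor; interruptedProcess is, as far as I remember, the accessor a DebugSession already has for the process it controls):

    OupsDebuggerSystem class >> isProcessUnderDebug: aProcess
        "Sketch only: scan a (hypothetical) registry of open debug sessions."
        ^ self sessions anySatisfy: [ :session |
            session interruptedProcess == aProcess ]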