pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org
Other
1.21k stars 355 forks source link

Deadlock on Exception with ensure: #10136

Open demarey opened 3 years ago

demarey commented 3 years ago

Bug description Deadlock on Exception with ensure: It freezes because the UI process is waiting on the semaphore that is never signaled. However it should be signaled because it is in an *ensure block.

To Reproduce

s := Semaphore new.
[ [1/0] ensure: [s signal.] ] forkAt: Processor userInterruptPriority.
s wait.

WARNING: image freezes!

Version information:

Ducasse commented 3 years ago

Ouch!

dionisiydk commented 3 years ago

Nothing criminal here :) UI process waits on semaphore and Cmd+. recovers it. It opens two debugger as expected: from the interrupt and from the ZeroDivide error.

The ensure signal will only happen when the block will exit. But the error is not the condition for this. Smalltalk exceptions do not unwind the context immediately allowing such thing like retry, resume and other interesting things (debugger itself is part of it). So during ZeroDivide error the unhandledAction of the exception triggers the debugger. It schedules new UI action to open the debugger window. But because UI process is locked on semaphore nothing happens: no such action is processed.

It's not that easy to improve really. In this scenario the debugger needs to spawn new UI process. But it can lead to multiple UI processes. And the current environment will be by this.

guillep commented 3 years ago

Hi @dionisiydk,

Hot really.

Here, Cmd+. does not recover it because the UI process is suspended on the semaphore, and Cmd+. will only break running/ready processes (it's thought to stop infinite recursions and similar). However, being suspended on a semaphore does not count as running nor ready, and since that semaphore is never signaled, then the process never reaches that state.

Moreover, there is a more deep thing going on here. Reading the code, one would expect that the semaphore will be signaled. If people need to understand how the debugger works to make sense of this problem, then this is not right. Also, is this behaviour different in headless, from the command line?

If the debugger is making programs less reliable and more difficult to reason about locally, then this is not a good behavior. This is clearly a bug to me.

dionisiydk commented 3 years ago

Hi @guillep

Here, Cmd+. does not recover it because the UI process is suspended on the semaphore, and Cmd+. will only break running/ready processes (it's thought to stop infinite recursions and similar). However, being suspended on a semaphore does not count as running nor ready, and since that semaphore is never signaled, then the process never reaches that state.

This is strange. I can't reproduce it: latest image, Mac and cmd+. recovers the lock for me and shows two debuggers.

guillep commented 3 years ago

True, I happen to reproduce it in Pharo 9 only. @demarey @dionisiydk could you confirm?

dionisiydk commented 3 years ago

Moreover, there is a more deep thing going on here. Reading the code, one would expect that the semaphore will be signaled. If people need to understand how the debugger works to make sense of this problem, then this is not right.

It is not how debugger works. It is a semantics of exception handling. The following example shows it better:

ensureWasNotExecuted := true.
[ 
    [1/0] ensure: [ ensureWasNotExecuted := false ] 
] on: ZeroDivide do: [:err | 
    self assert: ensureWasNotExecuted 
]

And the opening of the debugger is similar to how this assertion is executed here.

demarey commented 3 years ago

@Guille I can confirm that the example is interruptible with cmd + . on Pharo 10. Also, I get a strange behaviour: after some executions + (Cmd+), I directly get the debugger on the error, no more lock.

demarey commented 3 years ago

It is not how debugger works. It is a semantics of exception handling. The following example shows it better:

ensureWasNotExecuted := true.
[ 
  [1/0] ensure: [ ensureWasNotExecuted := false ] 
] on: ZeroDivide do: [:err | 
  self assert: ensureWasNotExecuted 
]

What is the expected behaviour? I would expect the ensure block to be executed if an error is raised in [1/0]. Then, I would expect the error to be caught by the on:do:block.

guillep commented 3 years ago

Moreover, there is a more deep thing going on here. Reading the code, one would expect that the semaphore will be signaled. If people need to understand how the debugger works to make sense of this problem, then this is not right.

It is not how debugger works. It is a semantics of exception handling. The following example shows it better:

ensureWasNotExecuted := true.
[ 
  [1/0] ensure: [ ensureWasNotExecuted := false ] 
] on: ZeroDivide do: [:err | 
  self assert: ensureWasNotExecuted 
]

And the opening of the debugger is similar to how this assertion is executed here.

I'm sorry, you talk about semantics without explaining it. Do you mean that the ensure is executed after the assertion? If so, yes, it's like that and I know it's like that. The two following examples show it.

r := OrderedCollection new.
[ 
    [1/0] on: ZeroDivide do: [:err | 
        r add: #exceptionHandler
    ] 
] ensure: [
    r add: #ensure
].
r
r := OrderedCollection new.
[ 
    [1/0] ensure: [ r add: #ensure ] 
] on: ZeroDivide do: [:err | 
    r add: #exceptionHandler
].
r

Still, the debugger adds some strangeness to this. The problem is that when the debugger tries to open, it holds the execution and prevents the ensure: block to be executed. Do we agree on this?

See the following example. An exception (in this case a halt) is thrown within the context of an ensure. The debugger opens, but the inform has not been executed yet. Only when the user presses on continue/proceed the debugger closes and the ensure block is executed.

[ 
    1halt 
] ensure: [
    self inform: 'toto'
].

This, on the command line, works differently, because no debugger is open, so the exception reaches the top, ensure blocks are executed, as expected. Even worse, the behaviour is so different that the image just gets killed in that case on the command line xD. So you have to ensure to wrap the process with an exception handler.

$ vm --headless image eval "s := Semaphore new.
[ [[1/0] ensure: [s signal.]]
        on: Error do: [ ] ] forkAt: Processor userInterruptPriority.
Stdio stdout nextPutAll: 'Finished!'.
s wait.
"
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
NewUndeclaredWarning: UndefinedObject>>DoIt (s is Undeclared)
Finished!a Semaphore()
guillep commented 3 years ago

And please note, I think from one perspective what is happenning is nice. You want the debugger in the context of the exception and before the handlers and ensures are executed (although there is something fishy) otherwise debugging might become impossible (e.g., if your file got closed or your memory cleaned up). But on the other side, when you don't expect a debugger, we have to acknowledge it may be strange...

Also, related to this, the fact that almost all exceptions are considered resumable only makes this more complicated ^^

dionisiydk commented 3 years ago

Hi @guillep

So all you said is correct. What we can do then?

I only see a general issues here about IDE behavior (not the exception semantics):

This, on the command line, works differently, because no debugger is open, so the exception reaches the top, ensure blocks are executed, as expected. Even worse, the behaviour is so different that the image just gets killed in that case on the command line xD. So you have to ensure to wrap the process with an exception handler.

It's all about a deployment strategy (probably we call it now a debugger request strategy). Currently headless mode = no debugger mode. And without debugger we can either kill the image on error or show the warning message. When the image is deployed as a final app both options can be used depending on business needs (assuming the wish to hide a real IDE/debugger). Then even with UI there can be a different behaviour for these examples. And in same time the headless mode can also provide potential "command line debugger". So it will be same as a normal headful image.

dionisiydk commented 3 years ago

Hi @demarey

What is the expected behaviour? I would expect the ensure block to be executed if an error is raised in [1/0]. Then, I would expect the error to be caught by the on:do:block.

Expected behavior is what is currently considering that we are in our language: ensure is after handler because during the handlers the failed block is still live. Smalltalk exceptions are different from other systems. It can confuse people sometimes but it brings a lot of power.

daniels220 commented 1 year ago

Agreed that the behavior is as-expected as regards ensure:. However there is a broader issue that trying to debug a background process seems to fail (deadlocks) when the UI process is waiting for something. I guess this is no longer actively developed, but with the old SeasideTesting framework, trying to place a breakpoint in server-side code will deadlock because the UI process is running the test and is waiting for an HTTP response.

I would propose, as a 95%, "good enough" solution, adding a check when opening the Debugger that spawns a new UI process if the current one is waiting on a Semaphore that is not part of a Delay (such as a socket, mutual-exclusion semaphore, etc). Essentially the notion is that if the UI process will predictably resume soon, we can let it do its thing, but if it is waiting on something in particular, there is a very good chance that that thing is exactly what we're trying to debug. See proposed implementation in #14598.

dionisiydk commented 1 year ago

@daniels220 thanks for getting the attention to it.

We have a general fix for the problem waiting for several year now: #11455.