Closed guillep closed 1 year ago
Broken since at least Pharo 2.0? Works in Squeak 5.2
Another way to reproduce it. newProcess
will create a process with a bottom context doing [self value. Processor activeProcess]
. So in the following
p := [ 1+2 ] newProcess.
p step.
p isTerminated
The stepping operation will just push self
to the stack, we are before the message send of value
, but still the process says it is terminated.
@guillep with @Larcheveque we checked a little bit the issue, and we found a strange comparison in the #isTerminated
method. There is many comparison checking if the suspended context is dead, and the final one checks if the suspended context's startpc
is smaller than the it's pc
. This looks weird, because it should be almost always the case, unless the process as not started.
isTerminated
self isActiveProcess ifTrue: [ ^ false ].
^ suspendedContext isNil or: [
"If the suspendedContext is the bottomContext it is the block in Process>>newProcess.
If so, and the pc is greater than the startpc, the block has already sent and returned
from value and there is nothing more to do."
suspendedContext isBottomContext and: [
suspendedContext isDead
"The pc of the suspendedContext is set to nil in #terminate explicitly.
Leaving this line in for safety."
or: [suspendedContext pc > suspendedContext startpc ] ] ] "HERE"
We tried to remove it, but of course, it breaks many tests because the termination of processes seems not effective and so many processes are still considered alived. We really don't know which correction to apply, because we don't know exactly why this last comparison have been added and is doing the job in the common case.
Last week @Marmat21 and @dupriezt were looking into it and had more insights too. Maybe they can share them here?
To limit bug bankruptcy (see https://www.joelonsoftware.com/2012/07/09/software-inventory/) this issue has been automatically marked as stale because it has not had any activity in 6 months. It will be closed in 1 month if no further activity occurs. If this issue remains important to you, please comment to reactivate the issue. Thank you for your contributions.
Joel on SoftwareImagine, for a moment, that you came upon a bread factory for the first time. At first it just looks like a jumble of incomprehensible machinery with a few people buzzing around. As your eyes adjus…
This should fix the problem. Indeed the condition was weird because it was all wrong and made almost any bottom context terminated. All Process and Extended tests are passing (also fixed comment):
isTerminated
self isActiveProcess ifTrue: [ ^ false ].
^ suspendedContext isNil or: [
"If the suspendedContext is the bottomContext it is the block in BlockClosure>>#newProcess.
If so, and the pc is greater than the startpc, the block has already sent and returned
from value and there is nothing more to do."
suspendedContext isBottomContext and: [
suspendedContext isDead or: [
"The pc of the suspendedContext is set to nil in #terminate explicitly.
Leaving this line in for safety."
suspendedContext closure
ifNil: [suspendedContext methodClass == Process
and: [suspendedContext selector == #doTerminationFromYourself]]
ifNotNil: [suspendedContext pc >= suspendedContext endPC]
] ] ]
The fix is part of my PR #8567 (I did not see this issue originally)
The code examples above are now all working
So I guess we can close this issue?
Yes, this issue has been fixed a while ago and can be closed. Reminder: There's, however, another PR #10644 pending approval, fixing some remaining termination issues.
Thanks. Yes we know about the issue and it is so central that we are always dropping it.
Thanks, Stephane, for your note; I understand your concern, indeed. I can only suggest the same code was implemented in both Squeak and Cuis more than half a year ago and no issues have surfaced since. Although Pharo is 99% similar in this particular area I of course understand it doesn't guarantee something won't pop up. Let me know if and how I can help.
Thanks for your nice words and understanding. Sometimes our execution paths can differ. For example this is really the case on ephemerons because we use a lot of announcements so they stressed the system different and we found and fix many bugs there. So we really appreciate your efforts and feel bad that we did not react better. We are just always running after the next urgent things to be fixed. :(
I can't even begin to imagine how much effort it is to maintain and advance a system like Pharo and how busy you all must be :) From my limited viewpoint, I'm just not happy I left #terminate in its current state with known shortcomings, all of which are addressed in the PR. Thanks again.
Last note: my fear is once some code start "using" those shortcomings it'll be more and more difficult to fix them.
A process that is on the bottom context, but whose context has already started but not terminated says that the process is terminated.
The following code (easily translated to a test) shows the problem. We start a process from a manually made bottom context. That context will stop in a semaphore. While that context is suspended there, the process says it is terminated, and this prevents us from resuming it (because resuming asks if the process is terminated).
However, the process is clearly not terminated! We have some code after the semaphore that should be executed.