Closed dram closed 2 years ago
Have a more close look at this issue, the saving process seems be stalled at SmalltalkImage>>#processShutDownList:
in SmalltalkImage>>#snapshot:andQuit:withExitCode:embedded:
.
And I found something suspicious, there is an extra process Delay>>wait
in TruffleSqueak with 6.0 beta image, which does not exist in OpenSmalltalk VM, i.e.:
BTW, this extra process also does not exist in TruffleSqueak with TruffleSqueak-22.1.0 image.
Thanks for the report and for doing some digging. It seems that EventTicklerProcess terminate
causes the issue. Not sure what has changed recently, but using the following allows me to save the image again:
EventSensor>>#shutDown
InterruptWatcherProcess ifNotNil: [
InterruptWatcherProcess terminate.
InterruptWatcherProcess := nil ].
EventTicklerProcess ifNotNil: [
"EventTicklerProcess terminate".
EventTicklerProcess := nil. ].
inputSemaphore ifNotNil:[Smalltalk unregisterExternalObject: inputSemaphore].
Any idea what could be going on here? /cc @marceltaeumel
Apparently, the semantics of Process>>#terminate
have changed a bit. Maybe @LinqLover has an idea what could be going on.
A few months ago, #terminate
has started to complete all ensure:
contexts on the receiver stack first, even those that were already active before the termination request. Could there be any ensure:
context like this on the stack of the event tickler process?
Also /cc @isCzech
Hi @LinqLover, all, it's been a year since the extended #terminate logic has been introduced :) Before digging deeper, could you please filein and check the latest #terminate version (attached) to see if anything changes? The trunk #terminate still contains some bugs which have been fixed in the latest enclosed version (expected to be merged into the release image).
And yes, the change in the #terminate logic only affects terminating processes suspended/terminated inside #ensure: unwind blocks. Kernel-jar.1447.followUp.cs.zip
Follow up: To be precise, the enclosed changeset Kernel-jar.1447.followUp.cs.zip is meant to be used along with the updated #suspend attached here: RevisedSuspend.2.cs.zip
The new release VM (bundled in the beta image) supports new #suspend primitive 578 fixing incorrect #suspend semantics of the old primitive 88. The attached RevisedSuspend.2.cs.zip enables the image to use the new primitive.
Alternatively, instead of filing-in these two changesets you can try Kernel-jar.1447 from the Inbox to see if anything changes.
Thanks for the details! Could you summarize how primitive 578 is different to primitive 88?
The change affects processes waiting on a Semaphore or Mutex; so far #suspend removed the waiting process from the semaphore or mutex and placed it in a run queue when resumed. That's a bug. Now, #suspend backs up the process's code before the wait so as a result the process returns to the same wait state when resumed.
Here's part of Eliot's comment for the revised #suspend: "...If the given process is not the active process, take it off its corresponding list. If the list was not its run queue assume it was on some condition variable (Semaphore, Mutex) and back up its pc to the send that invoked the wait state the process entered. Hence when the process resumes it will reenter the wait state. ..."
I'm not sure applying these changes will help as I don't understand the root cause. But it's a start :)
I'm not familiar with TruffleSqueak but I've downloaded TruffleSqueakImage-22.1.0, used the latest VM version VMMaker.oscog-mt.3184 and tried two experiments: 1) updated the image via Squeak -> Update Squeak menu to the latest... saving works OK; then filed-in the two changesets attached above... again saving the image works OK 2) only filed-in the two changesets above to verify #terminate and #suspend work correctly... and again saving the image works OK
What was your scenario when saving the image failed and the eventTickler process didn't terminate. Where is the difference from my scenarios?
Hi @isCzech , sorry that I should give more details in the issue description.
This issue can be reproduced using TruffleSqueak VM with recent Squeak 6.0 images, following are detailed steps under Windows:
graalvm-ce-java17-windows-amd64-22.1.0.zip
from https://github.com/graalvm/graalvm-ce-builds/releases and unpacktrufflesqueak-installable-svm-java17-windows-amd64-22.1.0.jar
from https://github.com/hpi-swa/trufflesqueak/releasesgraalvm-ce-java17-22.1.0\bin\gu –local-file install trufflesqueak-installable-svm-java17-windows-amd64-22.1.0.jar
Squeak6.0alpha-21736-64bit-202204190959-Windows-x64.zip
from https://files.squeak.org/ and unpack (latest image is not compatible with TruffleSqueak 22.1.0, so a slightly older one is used here)Squeak6.0alpha-21736-64bit.image
, Squeak6.0alpha-21736-64bit.changes
and SqueakV50.sources
into directory graalvm-ce-java17-22.1.0\languages\smalltalk\resources
graalvm-ce-java17-22.1.0\bin\trufflesqueak
I'll try your attached patches later.
Hi @isCzech ,
I have tried your two changesets, image saving works after applied!
But there is some other problem, for each time I do a saving, there will be an extra process created, following is a screenshot of Process Browser:
Thanks for checking, @dram! TruffleSqueak does not implement primitive 578 yet, so maybe that's why you get to see these additional processes. Let me try and push an implementation.
Hi @dram, I'm a bit confused now: you use Squeak6.0alpha-21736 and say later ones are not compatible. How did you test Squeak beta which is 21757 and later? About the incompatibility: Again, I'm not familiar with TruffleSqueak; I understand you don't use OpenSamlltalkVM but GraalVM - is it possible the GraalVM somehow doesn't work correctly with the latest Squeak images? I mean even the not so latest, causing incorrect behavior? Is there a version number that works and the next one doesn't? That would help :)))
Hi @fniephaus,
TruffleSqueak does not implement primitive 578 yet, so maybe that's why you get to see these additional processes. Let me try and push an implementation.
Worth a try but as long as #suspend uses primitive 88, things should still work as before... A question: You tried using Cuis image - does it work ok? The reason I'm asking is Cuis implemented the latest #terminate a few months ago, so if Cuis image works ok, that means #terminate should work ok and the problem might lie somewhere else. (i.e. the inability to terminate the eventTickler could be a consequence of something else).
I tried implementing primitive 578 (see https://github.com/hpi-swa/trufflesqueak/commit/b926e17dfa332ff24746f936bab0de6b25df02f2) but unfortunately, I still see stale processes when saving the image.
You tried using Cuis image - does it work ok? The reason I'm asking is Cuis implemented the latest #terminate a few months ago, so if Cuis image works ok, that means #terminate should work ok and the problem might lie somewhere else. (i.e. the inability to terminate the eventTickler could be a consequence of something else).
I just opened the Cuis6.0-5171.image
with a TruffleSqueak built from https://github.com/hpi-swa/trufflesqueak/commit/b926e17dfa332ff24746f936bab0de6b25df02f2 and saving works just fine, without any extra processes.
I tried implementing primitive 578 (see b926e17) but unfortunately, I still see stale processes when saving the image.
You tried using Cuis image - does it work ok? The reason I'm asking is Cuis implemented the latest #terminate a few months ago, so if Cuis image works ok, that means #terminate should work ok and the problem might lie somewhere else. (i.e. the inability to terminate the eventTickler could be a consequence of something else).
I just opened the
Cuis6.0-5171.image
with a TruffleSqueak built from b926e17 and saving works just fine, without any extra processes.
Well, I guess it means the extended #terminate logic itself is not the culprit... I'll try to follow @dram 's steps and see :)
Hi again, I can confirm I can reproduce the incorrect behavior (installed things as per steps 1 to 3, then for some reason step 4 didn't work at all - the image didn't start but I used TruffleSqueakImage 22.1.0 instead and updated the terminate method) It's strange that the same Truffle/Squeak image works ok with the OpenSmalltalk VM and behaves incorrectly with the GraalVM, while at the same time the Cuis image with the updated terminate works ok with both VMs!
I've noticed Cuis uses different priorities for the interrupt watcher and the low space watcher while Squeak runs all of them with the same priority, however, changing the priorities doesn't help. But there's another difference - for some reason Cuis has modified the EventSensor implementation, including the evenTickler, which may explain Cuis works flawlessly :) Maybe @jvuletich could shed some light here why (Hi Juan, I hope you don't mind :) )
Just in case you, @isCzech, are not aware: TruffleSqueak is an alternative Smalltalk VM implementation, so it's very likely that something on the VM level isn't working correctly. However, things used to work fine with older Squeak images and with Cuis images. I noticed that newly introduced ProcessTest
crash TruffleSqueak, so I expect there to be at least a VM-level problem. But there's probably also something wrong on the image level as Cuis has demonstrated.
Thanks indeed! I've realized you must have a different VM (not sure how different though and how the latest VM changes map into your VM); nonetheless you're absolutely right there must be something off on the image level if Cuis works fine either way :) (I haven't tested the new Process tests on an old image but yes, they may be very disruptive with the old image)
Hi @dram, I'm a bit confused now: you use Squeak6.0alpha-21736 and say later ones are not compatible. How did you test Squeak beta which is 21757 and later? About the incompatibility: Again, I'm not familiar with TruffleSqueak; I understand you don't use OpenSamlltalkVM but GraalVM - is it possible the GraalVM somehow doesn't work correctly with the latest Squeak images? I mean even the not so latest, causing incorrect behavior? Is there a version number that works and the next one doesn't? That would help :)))
Hi @isCzech, the incompatibility which I mentioned is about issue #156. It is fixed recently, but not released yet. I tested Squeak6.0beta-21772 with TruffleSqueak build from main
branch.
If you need to experiment with latest Squeak image, you can follow instructions in https://github.com/hpi-swa/trufflesqueak/blob/main/docs/development.md
Hi @dram, thanks, I understand now. In the meantime I've come to a tentative conclusion there may be something wrong with the Squeak image other than the new termination logic. Not sure what though; it's possible fixing #terminate bugs may have unmasked some other issues - something Cuis has fixed or never suffered from. See above observations; I hope Juan (@jvuletich) could enlighten us why he changed the EventSensor implementation recently - whether he was possibly fixing some bugs...? In which case Squeak could follow suit :)
Hi @isCzech, when looking around code of Process>>#terminate
, I find following line a bit suspicious:
suspendedContext runUntilErrorOrReturnFrom: suspendedContext
According to the doc of Context>>#runUntilErrorOrReturnFrom:
, the argument of it should be a sender of self, so I try to change this line to following, after that, image saving works without blocking:
suspendedContext runUntilErrorOrReturnFrom: ctxt
There is some similar code in your patched version, i.e.:
top runUntilReturnFrom: top
Similarly, after I change it to following, no more stale processes created after saving:
top runUntilReturnFrom: ctx
I'm not familiar with Squeak's process scheduling system, so not sure if such change is reasonable. Also I'm curious why there is no problem in OpenSmalltalk VM.
According to the doc of Context>>#runUntilErrorOrReturnFrom:, the argument of it should be a sender of self,
Yes, the description "ASSUMES aSender is a sender of self" always bothered me :) I probably used the method beyond it's original intended use; however, it works correctly even when the receiver and 'aSender' are identical.
so I try to change this line to following, after that, image saving works without blocking:
suspendedContext runUntilErrorOrReturnFrom: ctxt
yes, it will cover most cases but I'm attaching a test that will fail with your modification; it is a bug in the original #terminate. It fails to recognize the case when the unwind block is the top context (of the stack). #runUntilReturnFrom: is a trivialized (or less restricted) version of #runUntilErrorOrReturnFrom: designed as a helper method for #unwindTo:; it's purpose is to run a fragment of another context stack. ProcessTest-testTerminateEnsureAsStackTop.zip
However, file-in the attached fix - it works in my case. I've compared Cuis's code and found this difference - thanks @jvuletich !! Nice catch :) EventSensor-eventTickler.zip
I don't understand yet why the same problem won't show with the Squeak VM...
I don't understand yet why the same problem won't show with the Squeak VM...
I wonder if speed of the processing could have anything to do with it?
Anyway, if you let me know whether the fix works in all your scenarios, I'll send the fix to Squeak Inbox. Thanks
Hi @isCzech, I have a test of EventSensor-eventTickler patch, it seems that the stale process problem still exists.
Tests are taken in two environments:
Steps:
EventSensor-eventTickler.st
, Kernel-jar.1447.followUp.cs
and RevisedSuspend.2.cs
And regarding to ProcessTest-testTerminateEnsureAsStackTop, seems that it failed in TruffleSqueak with EventSensor-eventTickler.st
and Kernel-jar.1447.followUp.cs
filed in.
The assertion failure is caused by following line:
self assert: p1 isTerminated & p2 isTerminated & p3 isTerminated.
WRT #eventTickler in Cuis, it was last tweaked in December 2021 by Andrés Valloud. There is a possible weakness in Delay. If a Delay is waiting, but its process is terminated, and then the same delay is sent #wait without checking #beingWaitedOn, the system wil hang. See senders of #wait and #beingWaitedOn in Cuis. In any case, I don't know if this has any relation with the problems when saving Squeak in TruffleSqueak. I'm just answering Jaromir's question about #ventTicker in Cuis.
And regarding to ProcessTest-testTerminateEnsureAsStackTop, seems that it failed in TruffleSqueak with
EventSensor-eventTickler.st
andKernel-jar.1447.followUp.cs
filed in.The assertion failure is caused by following line:
self assert: p1 isTerminated & p2 isTerminated & p3 isTerminated.
yes, sorry, I haven't realized you don't have the full set of patches... It also depends on the combination of the VM+image :)
Hi @isCzech, I have a test of EventSensor-eventTickler patch, it seems that the stale process problem still exists.
Tests are taken in two environments:
- Debian Linux 11, TruffleSqueak (main branch), Squeak6.0beta-21772
- Windows 11, TruffleSqueak 22.1.0, Squeak6.0alpha-21736
Steps:
- launch the clean Squeak image
- file in
EventSensor-eventTickler.st
,Kernel-jar.1447.followUp.cs
andRevisedSuspend.2.cs
- open Process Browser
- Trigger saving from menu bar
Hi @dram, at this point it shouldn't matter whether you apply Kernel-jar.1447.followUp.cs and RevisedSuspend.2.cs because they have nothing to do with the issue. Could you please tell me whether the problem persists when you apply only the EventSensor-eventTickler.st patch? (with the recent Squeak image indeed) ?
Does it mean that the problem disappeared on Windows 10 and possibly some other scenarios? (I have no way to test Win11/Linux/Mac)
It seems, as Juan indicated, it may be a timing issue caused by the eventTickler implementation. I only guess the new terminate made the problem more visible in certain scenarios. It never occurred in Cuis/Squeak though and I may only hypothesize it could be because of the speed of the VM?? I noticed GraalVM is slower... Thanks!
WRT #eventTickler in Cuis, it was last tweaked in December 2021 by Andrés Valloud. There is a possible weakness in Delay. If a Delay is waiting, but its process is terminated, and then the same delay is sent #wait without checking #beingWaitedOn, the system wil hang. See senders of #wait and #beingWaitedOn in Cuis. In any case, I don't know if this has any relation with the problems when saving Squeak in TruffleSqueak. I'm just answering Jaromir's question about #ventTicker in Cuis.
Thank you @jvuletich for your thoughts. I noticed you refactored EventSensor in December and Andres tweaked the eventTickler and I wondered whether you discovered some problem/bug you were fixing or just improving things proactively :)
Here is for sure an interplay between the eventTickler and termination and I guess it's exactly what you say but I didn't go as deep as to fully understand it yet :) Thanks again!
Hi @dram,
I have a test of EventSensor-eventTickler patch, it seems that the stale process problem still exists.
Oops, apologies, I forgot to replace the line top runUntilReturnFrom: ctx
you suggested while testing the eventTickler "fix" so no, the eventTickler "fix" isn't a fix :) I'll keep looking...
Hi @isCzech,
After some digging, I think that the bug revealed by ProcessTest-testTerminateEnsureAsStackTop
is not related to runUntilErrorOrReturnFrom:
, as this test also failed in clean Squeak beta image running under Open Smalltalk VM, I'm using Squeak6.0beta-21829-64bit-202205110711-Windows-x64 to run this test.
And while experimenting, I find that test can be fixed by following change, not sure if it is reasonable:
Based on ProcessTest-testTerminateEnsureAsStackTop
, I constructed a simplified test, which reveals something interesting regarding to TruffleSqueak.
Printing following code in Workspace, in Squeak and Cuis running under OpenSmalltalk VM, both return reached here
.
p := [[] ensure: []] newProcess.
[p suspendedContext isUnwindContext] whileFalse: [p step].
p terminate.
'reached here'
Printing previous code in Squeak6.0alpha-21736 under TruffleSqueak 22.1.0, a nil
is returned, with following message in TruffleSqueak console:
[trufflesqueak] Running Squeak6.0alpha-21736-64bit.image on GraalVM CE (latency mode)...
[trufflesqueak] Image loaded in 1890ms.
[trufflesqueak] Unwind error: sender of CTX [] in [] in BlockClosure>>newProcess @756b58a7 is nil, unwinding towards CTX Compiler>>evaluateCue:ifFail: @2cc04358 with return value: nil
Printing previous code in Cuis6.0-5171 under TruffleSqueak 22.1.0, VM will be crashed, with following error:
[trufflesqueak] Running Cuis6.0-5171.image on GraalVM CE (latency mode)...
[trufflesqueak] Image loaded in 453ms.
[trufflesqueak] Unwind error: sender of CTX [] in Unknown behavior>>newProcess @76f856a8 is nil, unwinding towards CTX Unknown behavior>>evaluateMethod:to:logged:profiled: @7c853486 with return value: nil
ERROR: java.lang.ArrayIndexOutOfBoundsException: Index -178 out of bounds for length 89
org.graalvm.polyglot.PolyglotException: java.lang.ArrayIndexOutOfBoundsException: Index -178 out of bounds for length 89
at de.hpi.swa.trufflesqueak.nodes.ExecuteBytecodeNode.fetchNextBytecodeNode(ExecuteBytecodeNode.java:164)
at de.hpi.swa.trufflesqueak.nodes.ExecuteBytecodeNode.interpretBytecode(ExecuteBytecodeNode.java:101)
at de.hpi.swa.trufflesqueak.nodes.ExecuteBytecodeNode.execute(ExecuteBytecodeNode.java:79)
at de.hpi.swa.trufflesqueak.nodes.ResumeContextRootNode.execute(ResumeContextRootNode.java:44)
at <smalltalk> CTX Unknown behavior>>evaluateMethod:to:logged:profiled: @7c853486(Unknown)
at org.graalvm.sdk/org.graalvm.polyglot.Value.execute(Value.java:839)
at de.hpi.swa.trufflesqueak.launcher.TruffleSqueakLauncher.execute(TruffleSqueakLauncher.java:131)
at de.hpi.swa.trufflesqueak.launcher.TruffleSqueakLauncher.launch(TruffleSqueakLauncher.java:82)
at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:296)
at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:121)
at org.graalvm.launcher.AbstractLanguageLauncher.runLauncher(AbstractLanguageLauncher.java:168)
Hi @dram, the test ProcessTest-testTerminateEnsureAsStackTop requires the new #terminate method, however, the Kernel-jar.1447.followUp.cs version of #terminate is a simplified version requiring the latest OpenSmalltalk VM implementing the new #suspend primitive 578 - this setup should be the default in the new release. Unfortunately I gave you this simplified version before I realized GraalVM doesn't have the new primitive 578 - so the test will fail. That's the point - to show the current #terminate has this bug :) I don't think it has anything to do with the incorrect behavior we're observing here (I hope) and I may have an inkling where the problem is, just bear with me :) It's so darn complicated :D (You should be able to verify the test works if you filein Kernel-jar.1447.followUp.cs and RevisedSuspend.2.cs into the beta image running on the lastest OpenSmalltalk VM; it will FAIL with the TruffleSqueak VM though - that's expected).
If you wanted to play more, try the following version of #terminate that was supposed to work even with the older VM (no guarantee it will in you specific scenario though): Kernel-jar.1447-Terminate.1.cs.zip I hope all this mess will sort out with the new Squeak release. The good news is Cuis works with the latest #terminate and and TruffleSqueak, so the method and the test are ok. There must be an issue with Squeak's implementation (there's a difference in Squeak's Semaphore>>#critical: implementation which is my guess why Cuis works and I'll have to adjust the termination code in Squeak. Thanks for digging :)
Hi @isCzech, according to the test case I mentioned in previous comment https://github.com/hpi-swa/trufflesqueak/issues/159#issuecomment-1138547121, Cuis's #terminate
is also broken with TruffleSqueak, it will crash the VM.
Hi @isCzech, according to the test case I mentioned in previous comment #159 (comment), Cuis's
#terminate
is also broken with TruffleSqueak, it will crash the VM.
True, this [trufflesqueak] Unwind error: ...
in both cases is weird but I can't say what it means; I guess it's related to the TruffleSqueak VM which I know nothing about :)
Hi @dram, I have some preliminary results:
1) I've fixed the test ProcessTest-testTerminateEnsureAsStackTop
, you had a good hunch something was wrong - it was supposed to work with the change you suggested (top runUntilReturnFrom: ctx
); check this:
ProcessTest-testTerminateEnsureAsStackTop-fixed.zip
2) The change you suggested (top runUntilReturnFrom: ctx
) should not change the semantics of termination (but I haven't tested it as thoroughly as the current version and there might be some border situations, e.g. when the unwind context is at the bottom of the stack); what it does is it just finishes executing the unwind context and if you look at the code, finishing it should be harmless; however - it is not very logical to finish what is not supposed to be finished (the process has been terminated and only unwind blocks are supposed to be executed). What's most interesting is it fixes your problem with TruffleSqueak :) BUT I still think there's something off with TruffleSqueak VM because the new terminate works on both Squeak and Cuis (actually it runs on Pharo as well!) so I'm inclined to believe it's TruffleSqueak VM's issue (I guess your results above also point in that direction if I understand correctly. And nice example "reached here", thanks!).
3) For info: Kernel-jar.1447.followUp.cs and RevisedSuspend.2.cs are supposed to work with the latest VM with #suspend primitive 578 (without it #isTerminated needs fixing) and Kernel-jar.1447-Terminate.1.cs (enclosed above) should work with the older VMs and with TruffleSqueak's (In both cases there may be some tests that need fixing, e.g. testAtomicSuspend)
4) I'd be very interested if you find out more - whether it's a VM problem and what's wrong; I traced the computation and can't find anything wrong. It's really surprising that the small change you suggested causing the unwind context finish executing fixes the issue for you! Maybe it has something to do with the fact that the top of the stack in this particular case is just a primitive call. But I have no experience with that :)
Thanks for letting me know. best, Jaromir
Hi @isCzech,
I kind of think that the new #terminate
may rely on some "tricky" feature in OpenSmalltak VM, which is not supported in TruffleSqueak. So there are two approaches to fix this problem:
#terminate
to not rely on that featureAnyway, the first thing to do would be determine what that feature is. The error message and exception stack trace mentioned in https://github.com/hpi-swa/trufflesqueak/issues/159#issuecomment-1138547121 may give some hints.
Hi @dram, why would you think it may be a "tricky" feature in OpenSmalltak VM and not in the TruffleSqueak VM? ;)
One of the differences between the old and new terminate is the old terminate uses the simulation machinery (the whole unwind is simulated using self popTo: suspendedContext bottomContext
). The new terminate on the other hand replaced all simulation with a "regular", "no simulation, no exceptions" code. One of the advantages is you can more easily debug (debugging a simulation is tricky).
I've compared the termination step by step in both the failing case and the "fixed" case with top runUntilReturnFrom: ctx
line replaced: The simulation (debugging) in both cases is identical and in both cases leads to correct termination of the tickler process. Debugging the same thing with the old version of terminate crashed the VM but that's expected with regard to the note above (debugging a simulation...).
Now, from that follows, I hope, that if the simulation (debugger) correctly terminates the event tickler process but the live run fails, the difference is most likely within the TruffleSqueak VM. So I'd say #terminate in TruffleSqueak relies on being run via the simulation machinery (#popTo & company). Once the TruffleSqueak VM trie to terminate using "regular" code, it fails, while at the same time when running the same "regular" code as a simulation (in debugger) it works correctly.
(It's a mystery why changing the one line "fixes" the issue for TruffleSqueak but as discussed in https://github.com/hpi-swa/trufflesqueak/issues/159#issuecomment-1138933169. I say it's not really a fix - it's rather a reminder something doesn't work as expected :) )
What's your thought?
Hi @fniephaus, does the exception stack trace mentioned in https://github.com/hpi-swa/trufflesqueak/issues/159#issuecomment-1138547121 tell anything to you? Would you agree with the conclusion above? Thanks!
Hi @dram and @isCzech, Thanks a lot for digging into this so much. Yes, I agree: it very much looks like there is at least one bug in TruffleSqueak somewhere. However, that "somewhere" could be relatively hard to find because scheduling, in general, is rather complex and has some interesting corner cases.
What would be very helpful are very short doIts that demonstrate problems in TruffleSqueak or a difference in OSVM vs TruffleSqueak behavior. Something like the doIt in https://github.com/hpi-swa/trufflesqueak/issues/159#issuecomment-1138547121 and some of the ProcessTests
are already rather long. A lot of change sets have been posted in this thread and I almost lost track: how do I obtain a Squeak/Smalltalk image that has all the "right" methods and tests that can be expected to land in the next Squeak release?
A lot of change sets have been posted in this thread and I almost lost track: how do I obtain a Squeak/Smalltalk image that has all the "right" methods and tests that can be expected to land in the next Squeak release?
The crucial point that makes the difference for TruffleSqueak image is the replacement @dram discovered in this comment. The suggested change will make TruffleSqueak work but I can't see any logical explanation why - IMO the answer lies within the TruffelSqueak VM. For OpenSmalltalk VM @dram's change has no effect and the original version (not working in TruffleSqueak) is more logical/consistent.
As for the expected release image, I propose including methods in Kernel-jar.1447.followUp.cs and RevisedSuspend.2.cs (attached previously) - but these two will require the latest VM with the 578 suspend primitive. The final decision should be made soon (two weeks timeframe?)
Just to make sure: the issue doesn't appear to be a timing or other issue with the event tickler.
Just to make sure: the issue doesn't appear to be a timing or other issue with the event tickler.
Maybe it's too soon to rule this out... Cuis image with the new terminate seems to work fine with TruffleSqueak VM in terms of saving the image (and terminating the even tickler) but @dram's observation here shows there's an issue with Cuis as well.
I have a hunch what's going on: Process>>#terminate
was changed and the actual termination logic was moved into an ensure block:
[] ensure: [
self suspend.
context := suspendedContext ifNil: [^self].
suspendedContext :=
[context releaseCriticalSection; unwindTo: nil. self suspend] asContext.
self priority: Processor activePriority + 1; resume]
It seems that the ensure block is never executed as part of the top-level unwinding logic: https://github.com/hpi-swa/trufflesqueak/blob/a068eca13ca09ac1de50bc4572dd53333fab8718/src/de.hpi.swa.trufflesqueak/src/de/hpi/swa/trufflesqueak/nodes/context/UnwindContextChainNode.java#L44-L57
That, at least, explains why all three processes of ProcessTest>>#testTerminateEnsureAsStackTop
are never terminated. I'm trying to work out a fix, but this part of the VM is everything but trivial.
Interesting! I wish you were right :) Cuis use the same approach; they just use a slightly different form: they use a method to get the receiver evaluated as an unwind block:
valueEnsured
[] ensure: self
Applying it to the Squeak's #terminate at TruffleSqueak makes no difference though...
Hi @isCzech, while rethinking about runUntilErrorOrReturnFrom:
, I realized that if the whole point of using it instead of value
is to deal with non-local return, why not do something like following:
unwindBlock hasMethodReturn ifTrue: [
... runUntilErrorOrReturnFrom: ...
] ifFalse: [
unwindBlock value
]
After this change, image saving in TruffleSqueak works with no problem.
Hi @dram, nice try ;)
evaluate this though:
[[] valueWithExit] hasMethodReturn
Yes, the whole point of using #runUntilReturn is to deal with non-local returns. They may be nested however. But it's a great experiment, thanks!
Hi @isCzech, thanks for pointing to valueWithExit
, really a complex story. :)
Anyway, following is another try:
Kernel-jar.1447.followUp.cs
and RevisedSuspend.2.cs
into Squeak6.0beta-21829-64bit.image
, and running under Open Smalltalk VMContext>>#unwindTo:safely:
[ctx isNil] whileFalse: [
(ctx tempAt: 2) ifNil: [
ctx tempAt: 2 put: true.
- top := (ctx tempAt: 1) asContextWithSender: ctx. "see the note below"
- top runUntilReturnFrom: top].
+ [(ctx tempAt: 1) value] on: BlockCannotReturn do: []].
ctx := ctx findNextUnwindContextUpTo: aContext]
KernelTests-Processes
groupAll tests passed, so I'm a bit curious that why not use [... value] on: BlockCannotReturn do: []
instead of runUntilReturnFrom:
? In my opinion semantics of these two are similar, and the former is more simple and straightforward.
Hi @dram, this one's even better but nope, try this:
[ [[Processor activeProcess terminate] ensure: [1/0]] on: ZeroDivide do: [] ] fork
You have to run unwind blocks on the original stack to make sure you won't miss exception handlers.
But you've discovered a missing test situation - thanks!
I just remembered that lots of Process
-related tests from Cuis fail or don't even terminate on TruffleSqueak:
https://github.com/hpi-swa/trufflesqueak/blob/b34dc080fc7b81b559faffe6e5df0798ece97606/src/de.hpi.swa.trufflesqueak.test/src/de/hpi/swa/trufflesqueak/test/runCuisTests.st#L12-L28
So I guess there are plenty of tests that could potentially help find the bug(s) in TruffleSqueak.
These are the tests added to complement the new (fixed/extended) termination logic introduced last year. They should work with newer Squeak or Cuis images...
In TruffleSqueak, Squeak 6.0 beta image can not be saved, no error is displayed, neither in Squeak window, nor in the console.