Image crash or errors when opening another developer's Pharo 11 image #17029

ELePors commented 3 months ago

When I open a Pharo 11 image (Pharo-11.0.0+build.726, 64-bit) with Bloc and Alexandrie that was copied from another developer's PC, the image crashes because of SDL in the FFI part of the code. In the traces I find the directory of the SDL library on the other developer's PC: when SDL starts calling the library, it uses this "cached" path, and if that path does not exist on my machine, the image crashes.

A workaround is to recompile SDL2 by starting Pharo headless and saving the image:

PharoConsole.exe --headless "Pharo11-64.image" eval --save "SDL2 compileAll"

I have the same problem with LibGit in Iceberg. My workaround is:

pkg := RPackageOrganizer default packageNamed: 'LibGit-Core' ifAbsent: [nil].
pkg methods do: [ :e | e recompile ]
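The two workarounds can be folded into one script. This is a hypothetical sketch, not part of the original report: it assumes that recompiling a method from source discards the generated uFFI method (and the library path cached in it), as the LibGit workaround suggests. Only 'LibGit-Core' comes from the report; the names of other packages holding ffiCall: methods would have to be added by hand. Unlike the snippet above, it skips packages that are not loaded instead of failing on nil.

```smalltalk
"Hypothetical helper: recompile every method of the listed packages so that
 uFFI regenerates its callouts, with the current library paths, on the next call.
 Package names other than 'LibGit-Core' are up to the user (assumption)."
| recompiled |
recompiled := 0.
#( 'LibGit-Core' ) do: [ :packageName |
    (RPackageOrganizer default packageNamed: packageName ifAbsent: [ nil ])
        ifNotNil: [ :package |
            package methods do: [ :method |
                method recompile.
                recompiled := recompiled + 1 ] ] ].
recompiled
```

The script can be passed to the same headless `eval --save` command used for SDL2, so the recompiled methods are saved with the image.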

FFI should not crash, so I am reporting the issue to avoid these workarounds.

Cheers, Eric.

hernanmd commented 3 months ago

Thanks for sharing the workarounds!

Ducasse commented 2 months ago

Thanks for the report.

ELePors commented 2 months ago

The problem is that FFI stores the library path inside the CompiledMethod: the CompiledMethod of an ffiCall (for example SDL2 class>>modState) holds, in its second literal ("literal2"), a TFExternalFunction whose moduleName is a string containing the library path. Maybe we should create a mechanism to clean up all references to moduleName?
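A possible shape for such a cleanup mechanism, restricted to the SDL2 class side, is sketched below. It is hypothetical and rests on assumptions taken from the description above: the generated method keeps a TFExternalFunction among its literals, the path lives in an instance variable named moduleName (read with instVarNamed: rather than an accessor that may not exist), and the path is absolute. A bare library name such as 'SDL2.dll' would also be reported as missing. Methods that have not been called since they were compiled still hold their original ffiCall: source and are not affected.

```smalltalk
"Hypothetical sketch: find generated uFFI methods whose cached library path
 no longer exists on this machine, then recompile them from source."
| stale |
stale := IdentitySet new.
SDL2 class methods do: [ :method |
    method literals
        select: [ :literal | literal isKindOf: TFExternalFunction ]
        thenDo: [ :function |
            | path |
            path := function instVarNamed: #moduleName.
            "Flag the method when its cached path points to a missing file"
            (path isString and: [ path asFileReference exists not ])
                ifTrue: [ stale add: method ] ] ].
"Recompiling from source drops the generated method; uFFI rebuilds it,
 resolving the library again, on the next call."
stale do: [ :method | method recompile ].
stale size
```

Checking the path before recompiling keeps the cleanup limited to methods that would actually fail, instead of recompiling the whole class as `SDL2 compileAll` does.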

Ducasse commented 2 months ago

Hi Eric, Pablo will have a look.

tesonep commented 2 months ago

Yes! I am checking it

tesonep commented 2 months ago

Hi @ELePors, do you have any insight into how to reproduce it? I am trying, but any clue you have might help.

ELePors commented 2 months ago

Hi !

A simple way to reproduce it: open a Pharo image with the Bloc samples and examples, close them and save the image.

Then copy the image to another PC on which Pharo is not installed in the same directories (on the first PC it is in My Documents\Pharo\vm, on the second in D:\Pharo\vms).

Then just try to open the image.

Eric

From: Pablo Tesone @.> Sent: Thursday, 5 September 2024 11:39 To: pharo-project/pharo @.> Cc: LE PORS Eric @.>; Mention @.> Subject: Re: [pharo-project/pharo] Image crash or errors when openning another developper Pharo 11 image (Issue #17029)

tesonep commented 2 months ago

Hi Eric, Guille and I have an idea that might work. Since we cannot reproduce the problem often enough to be sure it is fixed, would you mind testing it? The idea is to patch FFICalloutAPI>>#function:library: with:

New code:

function: functionSignature library: moduleNameOrLibrary
    | sender ffiMethod ffiMethodSelector |
    sender := self senderContext.
    ffiMethodSelector := self uFFIEnterMethodSelector.
    "Build new method"
    ffiMethod := self newBuilder
        build: [ :builder |
            builder
                signature: functionSignature;
                sender: sender;
                fixedArgumentCount: fixedArgumentCount;
                library: moduleNameOrLibrary ].
    ffiMethod
        selector: sender selector;
        methodClass: sender methodClass.
    "Replace with generated ffi method, but save old one for future use"
    ffiMethod
        propertyAt: #ffiNonCompiledMethod
        put: sender method.
    "For senders search, one need to keep the selector in the properties"
    ffiMethod propertyAt: #ffiMethodSelector put: ffiMethodSelector.
    sender methodClass methodDict at: sender selector put: ffiMethod.
    "Register current method as compiled for ffi"
    FFIMethodRegistry uniqueInstance registerMethod: ffiMethod.
    "Resend"
    sender
        return: (sender receiver withArgs: sender arguments executeMethod: ffiMethod).
    ^ self

ELePors commented 2 months ago

Nope, I still get an error message, in loadSymbol:module:.

The stack:

TFFIBackend>>primLoadSymbol:module:
TFFIBackend>>loadSymbol:module:
ExternalAddress class>>loadSymbol:module:
TFExternalFunction>>validate
TFSameThreadRunner>>invokeFunction:withArguments:
AeCairoImageSurface class(AeCairoSurface class)>>externallyFree:
AeCairoImageSurface class(AeCairoSurface class)>>finalizeResourceData:
FFIExternalResourceExecutor>>finalize
[ 
        anEphemeron value finalize ] in FinalizationRegistry>>finalizeEphemeron: in Block: [ ...
FullBlockClosure(BlockClosure)>>on:do:
[ Processor terminateRealActive ] in [ :ex |
              | onDoCtx handler bottom thisCtx |
              onDoCtx := thisContext.
              thisCtx := onDoCtx home.

              "find the context on stack for which this method's is sender"

              [ onDoCtx sender == thisCtx ] whileFalse: [
                  onDoCtx := onDoCtx sender.
                  onDoCtx ifNil: [ "Can't find our home context. seems like we're already forked
                and handling another exception in new thread. In this case, just pass it through handler."
                      ^ handlerAction cull: ex ] ].

              bottom := [ Processor terminateRealActive ] asContext.
              onDoCtx privSender: bottom.

              handler := [ handlerAction cull: ex ] asContext.
              handler privSender: thisContext sender.

              (Process forContext: handler priority: Processor activePriority) resume.

              "cut the stack of current process"
              thisContext privSender: thisCtx.
              nil ] in FullBlockClosure(BlockClosure)>>on:fork: in Block: [ Processor terminateRealActive ]

module = "P:\PRG\Pharo\images\Pharo 12.0 - tests\120-x64\libcairo-2.dll"
moduleSymbol = #cairo_surface_destroy

I've just tried to start the image with my Pharo Launcher: the 120-x64 folder no longer exists in the "Pharo 12.0 - tests" folder; it is now in Pharo's vms folder.

Eric.

tesonep commented 2 months ago

I am creating two PRs (#17118 and #17117) for the concurrency issue that leaves FFI methods around. I do not think they will fix all the problems. I have also seen some issues in the reinitialization of Bloc when reopening windows; these errors produce segmentation faults and leave locks held. So the PRs will solve only a small part of the problem...