jgehrcke / gipc

gevent-cooperative child processes and inter-process communication
https://gehrcke.de/gipc
MIT License
83 stars 13 forks source link

Support CPython 3.8 on macOS (after change from from fork to spawn) #110

Closed TylerWilley closed 3 years ago

TylerWilley commented 3 years ago

So for fixing unit tests:

Other options:

I considered rewriting gipc to work with spawn instead of fork, however this is significantly more complicated. As our application relies on globals being copied to the forked process, I did not do this. It seems like it could be a dangerous / breaking change. A benefit to spawn is that there is no gevent hub or state to clean up, as it is not copied from the old process.

A possible improvement would be to update gipc to support both fork and spawn and let the user decide.

TylerWilley commented 3 years ago

I should note that I monkey patched the unit tests now.

During the tests the fork used by multiprocess.Process was the builtin, which does not automatically execute reinit.

I have not investigated why the patched fork in gevent works, but the builtin plus a manual reinit in _child does not.

Patching causes the tests to use gevent's fork, which does work properly

TylerWilley commented 3 years ago

I guess as a side effect, re-init allows copied greenlets to continue executing... so I'll see if I can add a unit test for that behavior and revert it.

Edit: This has been fixed in most recent commit. It seems that gevent for whatever reason, will not raise the LoopExit on the destroy'ing thread anymore. So I simply destroy the loop directly, which prevents anything from executing... Then set the old hub to None, which forces the creation of a new hub and loop. This doesn't seem to have any worse side effects than the previous dirty hub destroy as far as I can tell. It also removes the need to have the tests run while monkey patched.

jgehrcke commented 3 years ago

@TylerWilley thank you for all this energy investment here!

(I so much love to see this contribution! :heart:)

I am super positive about all this; and see that you found out and understand quite a lot already. I have very little bandwidth right now and just start asking some questions and raise some points, some of them might appear as ignorant but it's better than waiting N more days.. so please bear with me.

Python3.8 changed the default method for popen from fork to spawn

Only on macOS, right?

I considered rewriting gipc to work with spawn instead of fork, however this is significantly more complicated.

Thanks for sharing, yeah, this is also why back then I simply created https://github.com/jgehrcke/gipc/issues/100 and left it at that.

So I simply destroy the loop directly

:100: -- what we really want to make sure is that after all we call libev's ev_loop_destroy in the child to guarantee (that is how I always understood it) that greenlets inherited from the parent will not run anymore.

If you have not seen it, https://metacpan.org/pod/distribution/EV/libev/ev.pod#ev_loop_fork-(loop) also has a nice piece of documentation.

Next up, I will go through code changes and ask some maybe stupid / ignorant questions from the top of my head : )

jgehrcke commented 3 years ago

For Linux (the primary platform that gipc is targeted at) the key change here actually seems to be to use hub.loop.destroy() instead of hub.destroy(destroy_loop=True). As I described before, there is a crash somewhere in hub.destroy(destroy_loop=True) when running gipc's tests on Linux.

The relevant code section in gevent describes via code comment (thanks @jamadden) for why there is value to use hub.destroy(destroy_loop=True) instead of hub.loop.destroy() directly:

        # Let the frame be cleaned up by causing the run() function to
        # exit. This is the only way to guarantee that the hub itself
        # and the main greenlet, if this was a secondary thread, get
        # cleaned up. Otherwise there are likely to be reference
        # cycles still around. We MUST do this before we destroy the
        # loop; if we destroy the loop and then switch into the hub,
        # things will go VERY, VERY wrong (because we will have destroyed
        # the C datastructures in the middle of the C function that's
        # using them; the best we can hope for is a segfault).
        try:
            self.throw(HubDestroyed(destroy_loop))
        except LoopExit:
            # Expected.
            pass
        except GreenletError:
            # Must be coming from a different thread.
            # Note that python stack frames are likely to leak
            # in this case.
            pass

Now, if we really really understand what is going on, we may want to deviate from this recommendation and resort to using hub.loop.destroy() directly again -- this seemingly unbreaks the tests on Linux.

jgehrcke commented 3 years ago

Oh and just for your own curiosity: https://github.com/jgehrcke/gipc/issues/52

jgehrcke commented 3 years ago

I have released gipc 1.2.0: https://github.com/jgehrcke/gipc/blob/master/CHANGELOG.rst#version-120-jun-3-2021 -- can you maybe base your work on the current state of the master branch, and address this from the perspective of re-enabling compatibility between gipc >= 1.2.0, gevent >=21.x, and CPython >= 3.8 on macOS (platform support matrices are so much fun! : ))?

TylerWilley commented 3 years ago

So, we have multiple development environments, as this runs from a core library that multiple applications use.

One of the benefits to switching to spawn is none of this matters anymore and it's always clean, but as I mentioned, that probably won't work for us :/ (well, it's possible just significant refactoring and testing for the behavior change)

TylerWilley commented 3 years ago

So new wrinkle, when testing spawn() on osx, file descriptors aren't passed because

In order to get a descriptor into that list, you need to call DupFd between these lines

ForkingPickler does the serialization of the process

Now... short of subclassing and injecting a new Popen method, which I suppose we could do, the easiest way to do this is to register _GIPCReader / _GIPCWriter / etc classes with multiprocessing.reduction as done here

An alternative to that would be to switch from os.pipe() to multiprocessing.Pipe(), which is a larger change to gipc... but multiprocessing.Pipe() is already registered with reduction by default.

jgehrcke commented 3 years ago

Thanks @TylerWilley -- so much! Will get back to this soon; maybe still today, potentially next week. :rocket: Genuinely curious: what kind of product do you work on that uses gipc? Can also email me at jgehrcke@googlemail.com if that's better :). Learning of use cases is kind of the deepest satisfaction I can get from all this :).

TylerWilley commented 3 years ago

@jgehrcke Did you have a preference on any of my questions on the PR thus far?

specifically, using reduction as an import to handle passing the file descriptors for the pipes to the subprocess when spawn is used

I added a windows check so this should work for:

Thx :)

jgehrcke commented 3 years ago

Sorry that it took a while to get back to you! I wanted to manually run CI on this here but GH actions is just a little bit complex and challenging. I think I changed things so that when you add a commit here next time that GH will actually run CI (maybe after I manually approve). Let's try that.

Thanks for all the hard work here. I love how the changes after all got simpler and simpler.

Can you also do a clean rebase against current master and squash commits a bit?

TylerWilley commented 3 years ago

I clean rebased and force-pushed... seems that the workflow isn't running automatically though. No rush though :)

jgehrcke commented 3 years ago

Thanks for the update! I approved the CI run. Next time I hope it runs automatically.

I am sure you can discover this for yourself, but it looks like you still need to make the linter happy : )

+ flake8 gipc/
gipc/gipc.py:1079:5: E303 too many blank lines (2)
gipc/gipc.py:1084:5: E303 too many blank lines (2)
gipc/gipc.py:1087:5: E303 too many blank lines (2)
gipc/gipc.py:1092:5: E303 too many blank lines (2)
gipc/gipc.py:1097:5: E303 too many blank lines (2)
Error: Process completed with exit code 1.

(https://github.com/jgehrcke/gipc/pull/110/checks?check_run_id=2846345188#step:8:19)

TylerWilley commented 3 years ago
>           s = gevent.signal(signal.SIGTERM, signals_test_sigterm_handler)
E           TypeError: 'module' object is not callable

ah, that's probably why...

jgehrcke commented 3 years ago

Yo! Thanks, for real. Glad we got this over the finish line. Will look into making a release in the coming days!

Thanks for the nice collaboration.

TylerWilley commented 3 years ago

No problem! It was great working with you, and thanks for all the help getting it perfected :)

jgehrcke commented 3 years ago

This is now released as https://pypi.org/project/gipc/1.3.0/ :)

TylerWilley commented 3 years ago

That's great! Thank you, I will update our dependencies ^_^

On Fri, Jul 23, 2021, 8:51 AM Dr. Jan-Philip Gehrcke < @.***> wrote:

This is now released as https://pypi.org/project/gipc/1.3.0/ :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jgehrcke/gipc/pull/110#issuecomment-885694493, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQMQPDP75JL5S3FWEV55NSDTZF6XJANCNFSM455RYSGA .