kquick / Thespian

Python Actor concurrency library
MIT License

Race conditions with troupes and global actors #75

Closed: erplsf closed this issue 3 years ago

erplsf commented 3 years ago

Greetings,

Here's our use case: we have a troupe that receives messages containing items of different types and dispatches them to the corresponding processors (which can be global actors if they depend on internal state for processing). The dispatcher creates actors based on the messages it receives, and we've started noticing that global actors are sometimes created more than once and then deleted by the ActorSystem without us explicitly requesting it. It doesn't cause issues in our current implementation, but it raises questions: why does it happen, can it happen again later, and what if some of these "ghost" actors receive our messages?

I suspect there's a race condition involved somewhere. What points us in that direction: if we create the global actors once, sequentially, in the main thread, they are later resolved correctly and created only once.

Here's the smallest example I crafted to reproduce this behaviour:

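from thespian.actors import ActorSystem, Actor, ActorTypeDispatcher
from thespian.troupe import troupe
import time

# Troupe members handle messages in parallel, so several of them can
# request the same global actor at nearly the same time.
@troupe(idle_count=1, max_count=10)
class TroupeActor(Actor):
    def receiveMessage(self, message, sender):
        # Expected: at most one actor named "global" ever exists.
        self.createActor(GlobalActor, globalName="global")

class GlobalActor(ActorTypeDispatcher):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        print("in GlobalActor __init__")

    def receiveMsg_ActorExitRequest(self, message, sender):
        print("stopping GlobalActor")

if __name__ == "__main__":
    # Shut down any system left over from a previous run, then start fresh.
    ActorSystem('multiprocTCPBase').shutdown()
    acs = ActorSystem('multiprocTCPBase')
    # Named troupe_addr so it does not shadow the imported troupe decorator.
    troupe_addr = acs.createActor(TroupeActor)
    try:
        while True:
            # Send messages in bursts so multiple troupe members are active.
            for _ in range(100):
                acs.tell(troupe_addr, "message")
            time.sleep(1)
    finally:
        acs.shutdown()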

If you run this example a few times, you'll see output like this:

in GlobalActor __init__
in GlobalActor __init__
in GlobalActor __init__
stopping GlobalActor
stopping GlobalActor

Even though we didn't explicitly request GlobalActor to stop! (I'm assuming that global actors do not receive an ActorExitRequest when the actor that created them dies, according to paragraph 3.1 on this page: https://thespianpy.com/doc/using#outline-container-orgd1bc6f9)

Please advise us whether this is expected behaviour or whether we're abusing Thespian in some way.

kquick commented 3 years ago

Thanks for finding this. It was a race condition in the parallel creation of global actors. The latest master commit should resolve this; if you can confirm this works in your larger scenario I will generate a new release with this updated functionality.
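
For readers unfamiliar with this class of bug, here is a minimal sketch of a check-then-create race in plain Python threads. It is not Thespian's code and says nothing about how the linked commit works; `Registry`, `get_or_create_racy`, `get_or_create_locked`, and the `gate` test hook are all hypothetical names made up for illustration. The barrier only holds every caller inside the race window so the duplicate creation shows up on every run.

```python
import threading


class Registry:
    """Toy name -> instance registry (hypothetical, not Thespian's implementation)."""

    def __init__(self):
        self._instances = {}
        self._lock = threading.Lock()
        self.creations = []  # one entry per time the factory actually ran

    def get_or_create_racy(self, name, factory, gate=None):
        # Check-then-act without a lock: several callers can all see "missing".
        if name not in self._instances:
            if gate is not None:
                gate.wait()  # illustration hook: keep all callers in the race window
            self.creations.append(name)
            self._instances[name] = factory()
        return self._instances[name]

    def get_or_create_locked(self, name, factory):
        # The check and the creation happen under one lock, so only one caller creates.
        with self._lock:
            if name not in self._instances:
                self.creations.append(name)
                self._instances[name] = factory()
            return self._instances[name]


def run_parallel(target, count):
    """Run `target` in `count` threads and wait for all of them to finish."""
    threads = [threading.Thread(target=target) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def demo(workers=4):
    """Return (creations without lock, creations with lock) for one shared name."""
    racy = Registry()
    gate = threading.Barrier(workers)
    run_parallel(lambda: racy.get_or_create_racy("global", object, gate), workers)

    locked = Registry()
    run_parallel(lambda: locked.get_or_create_locked("global", object), workers)
    return len(racy.creations), len(locked.creations)


if __name__ == "__main__":
    print(demo())  # (4, 1)
```

In the unlocked variant every worker creates its own instance of the "global" object and the last write wins, which resembles the repeated `__init__` lines in the report above.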

kquick commented 3 years ago

Specifically commit https://github.com/kquick/Thespian/commit/9aa5e330426e81e85c194baab867bb5cb949a041

erplsf commented 3 years ago

Tested with 9aa5e33 and couldn't reproduce it anymore. Tested with our application with the same result: the problem no longer seems to occur. Thank you.

kquick commented 3 years ago

Released in version 3.10.5
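
If it helps anyone landing here later, this is a rough sketch for checking whether an installed copy is at least the release mentioned above. It assumes the distribution is installed under the name `thespian` and that comparing the leading numeric parts of the version string is good enough; `parse_version`, `has_fix`, and `installed_thespian_has_fix` are hypothetical helpers, not part of Thespian.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (3, 10, 5)  # release named in this thread


def parse_version(text):
    """Turn '3.10.5' into (3, 10, 5), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(version_text):
    """True if the given version string is at or after FIXED_IN."""
    return parse_version(version_text) >= FIXED_IN


def installed_thespian_has_fix():
    """Check the installed distribution; None means it was not found.

    Assumes the distribution name is 'thespian'.
    """
    try:
        return has_fix(version("thespian"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_thespian_has_fix())
```

Note that this simple parser treats a suffixed version such as a pre-release of 3.10.5 the same as 3.10.5 itself.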