Handle possible race condition when using ExecutorEventEmitter

jfhbrook / pyee

A rough port of Node.js's EventEmitter to Python with a few tricks of its own

https://github.com/jfhbrook/public

MIT License

371 stars 39 forks source link

Handle possible race condition when using ExecutorEventEmitter #76

Closed forslund closed 4 years ago

forslund commented 4 years ago

We've had an error report in Mycroft with an exception when removing a once handler.

Been trying to reproduce and I think it's a race condition. We have a sequence which sets up a once listener waits for 5 seconds for the listener to be called. If the handler is called it returns immediately if the 5 seconds passes without the handler being triggered the listener is removed.

Sometimes (quite rarely) the handler gets called, 5 seconds times out and removes the listener. THEN the once wrapper tries to remove the listener.

A try/except block may be a simple solution to remove the exception log

def _wrapper(f):
            def g(*args, **kwargs):
                try:
                    self.remove_listener(event, f)
                except KeyError:
                    # log warning or something
                    pass  # or possibly return

                # f may return a coroutine, so we need to return that
                # result here so that emit can schedule it
                return f(*args, **kwargs)

            self._add_event_handler(event, f, g)

A proper solution would be to have a lock around the critical section(s)

jfhbrook commented 4 years ago

Thanks for the report! I'll try to take a closer look this weekend.

jfhbrook commented 4 years ago

~Which class is this? Is this the BaseEventEmitter or the ExecutorEventEmitter?~ read the topic

jfhbrook commented 4 years ago

(it might not matter, it sounds like the move is to make the base EE thread-safe anyhow)

forslund commented 4 years ago

Yeah might be just as well to handle it in the BaseEmitter.

I'll see if I can get some spare time and clobber together a suggestion...

jfhbrook commented 4 years ago

mulling this over, I think it might not be a locking issue - like it sounds like what's happening is:

you set the once, which adds a handler to remove the listener not after it's fired but after the async handler completes
the event fires at t=0, kicking off an async handler
the async action is incomplete at t=1 and a timeout occurs, triggering a manual handler removal
the async action completes at t=2, triggering an automated handler removal of a handler that's already been removed

so while I could probably add a lock to the BaseEventEmitter around some of the fiddly bits (that's a good idea regardless), I think the answer here is to make the once handlers robust against the handler being removed out-of-band prior to the once's cleanup, and/or have the once remove its own listener prior to the async handler completing. something like that.

forslund commented 4 years ago

Yes that's pretty much what I think is happening as well.

I see two ways of making it more robust,

One: the simple try / except above Two: Making explicit checks that event and f still exist in the event before removing the listener. This will require locking since check + pop cannot be atomic (afaik)

Edit: Since you've got much more insight than me in the architecture here, maybe you have a better idea?

jfhbrook commented 4 years ago

Yeah no, we're of similar minds here. I mulled over making sure the remove runs on the same tick as the trigger, rather than waiting for the callback to complete - I think this is more correct anyway and might ninja it in there, as a digression - but you should still be able to trigger this for a synchronous call.