kquick / Thespian

Python Actor concurrency library
MIT License

Large latency #51

Closed by cl-dev 4 years ago

cl-dev commented 4 years ago

Hi,

My application, which is built entirely on top of Thespian, exhibits large latency when sending messages between actors. From the time an external input is read by one of the actors (from a hardware sensor) to the time an action is taken by the last actor in a chain of 6 actors, the total latency can add up to 2 minutes. I've timed the individual processing of the messages by the actors and that's quick, which leaves the Actor System as the most likely culprit. It seems that roughly 95% of the total processing time is spent inside the Actor System.

Before running the application I start the director, using the TCP Base. The logs don't show any red flags. The hardware is decent (Intel i7, 4 cores), running Win 10, Python 3.7.5 and the latest Thespian.

Is this sluggishness expected, or is there something else I should look into? I tried debugging Thespian itself, but it's very difficult given its async nature.

Many thanks.

kquick commented 4 years ago

This does not sound typical. One thing people often encounter is that messages aren't sent or received while the receiveMessage method is running. You need to exit that to allow message propagation.
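To make the effect concrete, here is a toy model in plain Python of an actor that handles one message at a time. It is only a sketch of the queuing behaviour, not Thespian's actual scheduler, and the function name and timings are made up for illustration.

```python
def mailbox_latencies(messages):
    """Return per-message latency for an actor that handles one message at a time.

    `messages` is a list of (arrival_time, handling_time) pairs in seconds,
    sorted by arrival time. Latency is measured from arrival to completion.
    """
    clock = 0.0  # time at which the actor becomes free again
    latencies = []
    for arrival, handling in messages:
        start = max(clock, arrival)  # wait until the previous handler returns
        clock = start + handling
        latencies.append(clock - arrival)
    return latencies


# One handler blocks for 60 s; two quick messages arrive while it is busy.
blocked = mailbox_latencies([(0.0, 60.0), (1.0, 0.5), (2.0, 0.5)])
# The same two quick messages with nothing blocking ahead of them.
unblocked = mailbox_latencies([(1.0, 0.5), (2.0, 0.5)])

print(blocked)    # [60.0, 59.5, 59.0]
print(unblocked)  # [0.5, 0.5]
```

In this model the quick messages take about a minute each only because they sit behind the blocking handler; their own handling time is unchanged.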

Sorry for brevity and delays, I'm on vacation.

cl-dev commented 4 years ago

Hi Kevin,

Thank you so much for taking the time to reply while on holiday.

"You need to exit that to allow message propagation."

This gave me a hint on what might be causing the problem. Actor 1 reads sensor data from a REST service stream using the requests III library. The interface to the stream is an iterator which blocks while waiting for new data. I didn't quite know how to make this play well with the actor model, so I came up with a hacky solution in which the actor sends itself a Poll message after successfully getting new data from the stream. The downside of this approach is that receiveMessage takes an arbitrary amount of time to complete, sometimes more than 1 min. Since each actor runs in an independent instance of the Python runtime, I thought other actors wouldn't be affected. Maybe that's not the case.

I'll refactor Actor 1 into an ordinary Python application that sends messages to Actor 2 and see if that helps. I'll let you know how it goes.
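A minimal sketch of that refactoring, assuming the blocking stream can be consumed in a plain loop outside any actor. `forward_stream`, `fake_sensor_stream` and the module path `myapp.actors.Actor2` are hypothetical names used only for illustration; the real stream iterator and actor class would take their place.

```python
def forward_stream(stream, send):
    """Iterate over a (possibly blocking) stream and pass each item to `send`.

    Returns the number of items forwarded once the stream ends.
    """
    count = 0
    for item in stream:  # blocking here is fine: no actor is stalled
        send(item)
        count += 1
    return count


def fake_sensor_stream():
    """Placeholder for the real REST stream iterator (made-up data)."""
    for n in range(3):
        yield {"reading": n}


def main():
    # Imported lazily so the helpers above can be used without Thespian.
    from thespian.actors import ActorSystem

    # Assumes the TCP-base actor system mentioned earlier is the one in use.
    asys = ActorSystem("multiprocTCPBase")
    # Hypothetical module path for the second actor in the chain.
    actor2 = asys.createActor("myapp.actors.Actor2")
    forward_stream(fake_sensor_stream(), lambda item: asys.tell(actor2, item))
```

Calling `main()` from a regular script would run the loop; the point of the design is that the only code that ever blocks is outside the actor system.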

Thanks a lot and enjoy your holiday. -- César

kquick commented 4 years ago

If the read from the REST service uses a socket whose file descriptor (FD) you can get, the Thespian "watch" functionality can be used to wake that actor up when there is data for it to process. This is the most efficient method, but it requires that you can get the FD from the iterator.
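A rough sketch of the watch approach, assuming the actor can open the socket itself and that the system base and platform in use support watches (worth checking in the Thespian documentation). The `("start", host, port)` message format, the `SensorWatcher` class and the `read_available` helper are hypothetical.

```python
import socket

try:
    from thespian.actors import Actor, ThespianWatch, WatchMessage
except ImportError:
    # Stand-ins so the sketch can be imported without Thespian installed.
    Actor = object
    WatchMessage = type("WatchMessage", (), {"ready": ()})
    ThespianWatch = lambda filenos: list(filenos)


def read_available(sock, bufsize=4096):
    """Return buffered bytes, b"" if the peer closed, or None if nothing is ready."""
    sock.setblocking(False)
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None


class SensorWatcher(Actor):
    """Forwards socket data to whoever sent the hypothetical "start" message."""

    sock = None

    def receiveMessage(self, message, sender):
        if isinstance(message, tuple) and message[:1] == ("start",):
            _, host, port = message
            self.consumer = sender
            self.sock = socket.create_connection((host, port))
        elif isinstance(message, WatchMessage) and self.sock is not None:
            if self.sock.fileno() in message.ready:
                data = read_available(self.sock)
                if data:
                    self.send(self.consumer, data)
                elif data == b"":  # peer closed the connection: stop watching
                    self.sock.close()
                    self.sock = None
        # Returning a ThespianWatch on each exit asks Thespian to keep
        # watching the descriptor and deliver a WatchMessage when it is ready.
        if self.sock is not None:
            return ThespianWatch([self.sock.fileno()])
        return None
```

Each call to `receiveMessage` returns promptly, so other messages to the actor are not held up while the sensor is idle.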

The Thespian "wakeup" functionality can be used to periodically schedule wakeup messages that would let you pick up the input, if there is a good way to do that polling in a non-blocking mode.
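A minimal sketch of the wakeup approach, assuming the readings end up in some source that can be read without blocking. A `queue.Queue` stands in for that source here, and how it gets filled is left open. The `"start"` message, the `SensorPoller` class, the `drain` helper and the 200 ms interval are all hypothetical.

```python
import queue
from datetime import timedelta

try:
    from thespian.actors import Actor, WakeupMessage
except ImportError:
    # Stand-ins so the sketch can be imported without Thespian installed.
    Actor = object
    WakeupMessage = type("WakeupMessage", (), {})

POLL_INTERVAL = timedelta(milliseconds=200)  # arbitrary illustrative value


def drain(source, limit=100):
    """Take up to `limit` items from a queue.Queue without ever blocking."""
    items = []
    while len(items) < limit:
        try:
            items.append(source.get_nowait())
        except queue.Empty:
            break
    return items


class SensorPoller(Actor):
    """Polls a non-blocking source each time a wakeup message arrives."""

    def receiveMessage(self, message, sender):
        if message == "start":
            self.consumer = sender
            self.readings = queue.Queue()  # placeholder non-blocking source
            self.wakeupAfter(POLL_INTERVAL)
        elif isinstance(message, WakeupMessage):
            for reading in drain(self.readings):
                self.send(self.consumer, reading)
            # Schedule the next poll, then return right away.
            self.wakeupAfter(POLL_INTERVAL)
```

The `limit` argument caps how long a single wakeup can spend forwarding readings, so `receiveMessage` stays short even after a burst of input.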

I haven't looked at requests III, but I plan to when I return.

cl-dev commented 4 years ago

It's fixed. Removing the blocking call from the actor brought the end-to-end latency down to ~1.5 s, which is well within the expected limits.

Thanks a lot for your help.

kquick commented 4 years ago

That's good news, I'm glad you got it working as expected. Please do follow up if you have more issues or suggestions.