kquick / Thespian

Python Actor concurrency library
MIT License

Thespian with openCV #50

Closed rbotafogo closed 2 years ago

rbotafogo commented 4 years ago

Hello,

I'm trying to use Thespian with OpenCV and I would appreciate some advice on the architecture.

I need to develop a video processing application that works with more than 20 cameras simultaneously on a single machine. My idea was to create parallel actors that would each create, let's say, 10 new threads, one to read each camera. So, for 20 cameras there would be 2 parallel actors and 20 threaded actors. After the data is read, each feed goes through object counting and tracking using TensorFlow, which will require more parallel and threaded actors (but this is not really relevant here).

According to the 'in_depth' manual: "There can be bases for implementing Actors as separate processes, or separate threads, or simply sequential execution", but I could not find any base for separate threads.

The alternative seems to be to create a separate process per camera, but I'm afraid this might consume too many processes. Can this be an issue, considering that many more processes might be required for the whole system?

Do you have any advice on how to proceed? Thanks a lot!

kquick commented 4 years ago

The Actor model is a concurrency model alternative to threads and processes. It's certainly possible to mix the two, but that can also lead to complications due to slightly different approaches to concurrency management. You might have better results sticking to just one model, although Thespian does support multi-threaded actors.

Since most conventional OS implementations provide a thread and process model, the Thespian Actor model is implemented "on top" of the underlying OS concurrency implementations; this can be contrasted with implementations like Erlang where the only concurrency functionality provided by the Erlang VM is the Actor model (although the VM itself must make the necessary adaptation to the services provided by the OS). Thespian currently provides two models: the simpleSystem model, which is essentially "cooperative scheduling" where all the actors run in the context of a single process, single thread, and variations of the multiProcess base where each actor is its own process.

The simpleSystem model is useful for testing and debugging, and simple separation of concerns programming, but doesn't provide the concurrency that is usually desired, so one of the multiProcess bases (usually multiprocTCPBase) should be used for enabling concurrency.

As you saw in the "In Depth" documentation, actors could conceivably be implemented as separate threads within the same process. In practical terms, when I've attempted this implementation in the past it has created significant concerns because threads are "leaky" with respect to the intents of the Actor model. Notably, each Actor is intended to run independently and not share any resources with other Actors unless those resources are explicitly passed via a message. The threads model is almost the inverse of this: all threads share common resources (memory, file descriptors, global variables, etc.). Additionally, IO operations are typically blocking and process global; there are ways to work around the blocking aspects of IO with more complicated implementations, but the process global aspects are more difficult to address. I have been unable to determine a good way to implement Actor isolation using threads. As a result, there is no current Thespian implementation that uses thread-based Actors.

My suggestion would be to try to use one Actor per camera, with some additional Actors to coordinate activities and perform higher-level processing (the Actor model often works best with single-concern implementations, so the camera-managing Actors should probably just manage the camera and not perform any other activities). I don't know what OS you are using, but under Linux, threads are essentially processes anyhow, so while there are some differences in resource consumption, they are probably not going to be significant. I have run hundreds of actors on "average" machine configurations without seeing significant difficulties. At least one advantage of this approach is that the Actor model will help keep the code concerns separated, so that if the number of processes does become an issue and you need to try something like threading within the Actors, the code should already be self-contained and easier to adapt to a multi-threaded approach.

rbotafogo commented 4 years ago

Hello Kevin,

Thanks for this detailed explanation. I'm following your suggestion and using one actor per camera. It took me a while to be able to start working on it, but now things are moving along.

I have a couple more questions and would again appreciate your advice:

  1. Actors often need to synchronize their execution. This can be done with a callback function, but this is kind of ugly and makes coding a bit weird. Is there any good way of synchronizing execution in the actor model? Do you have any experience to share?
  2. After creating an actor I sometimes need to initialize some variables. For example, I have an actor that reads the camera. The actor is created, and then I call an 'initialize' method that sets some variables such as the name of the camera, the path of the data, etc. Then, after calling the 'initialize' method, I call the 'run' method to start processing the video file. Many times I get an error from the 'run' method saying that the camera name does not exist, but other times this works fine. So I guess that the 'run' method is being called before the 'initialize' method, even though the messages are sent in this order:
    1. createActor
    2. initialize
    3. run

rbotafogo commented 4 years ago

Oooops.... the last message was sent incomplete... sorry

As I was saying, the order of execution does not seem to be respected. Is this expected behavior? If so, then to initialize an actor, do I have to send the initialization message, wait for a reply from that actor, and only then send the run message? This seems like a lot of work and boilerplate. Do you have any recommendations for that as well?

Thanks for your help.

kquick commented 4 years ago

Hi @rbotafogo,

I think that the answer to both of your concerns is effectively the same: you will need to exchange explicit messages to synchronize/verify actor states.

For the first case, where you need to synchronize actors: each actor runs asynchronously from the others, so if they need to synchronize at various points, the only communication mechanism they have for doing so is to exchange one or more messages. Actor A would run until it needed to synchronize with Actor B, at which point it would send a message to the latter. When Actor B reached the synchronization point, it would exit from its current operation. When Actor A's message arrives (either having already been sent, or when it is sent in the future), Actor B's receiveMessage will be called with that message. It's likely that Actor B will then want to send a confirmation message back to Actor A so that the latter resumes processing.
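The exchange described above can be illustrated with a small library-independent sketch. It does not use Thespian's API: the `run` function is a toy single-threaded dispatcher standing in for the actor system, and `ActorA`, `ActorB`, and the `"sync"` and `"ack"` message names are hypothetical. The point it shows is that each handler returns after sending, and the next step happens only when the matching message is delivered.

```python
from collections import deque


class ActorA:
    def __init__(self):
        self.log = []

    def receive(self, message, sender, send):
        if message == "start":
            self.log.append("phase 1")
            send("B", "sync")  # reached the sync point: notify B and return
        elif message == "ack":
            self.log.append("phase 2")  # resumes only after B has confirmed


class ActorB:
    def __init__(self):
        self.log = []

    def receive(self, message, sender, send):
        if message == "sync":
            self.log.append("synced with " + sender)
            send(sender, "ack")  # confirmation back to the requesting actor


def run(actors, first_message):
    """Toy dispatcher: delivers queued (sender, target, message) tuples."""
    mailbox = deque([first_message])
    trace = []
    while mailbox:
        sender, target, message = mailbox.popleft()
        trace.append((sender, target, message))

        def send(to, msg, me=target):
            mailbox.append((me, to, msg))

        actors[target].receive(message, sender, send)
    return trace


actors = {"A": ActorA(), "B": ActorB()}
trace = run(actors, ("main", "A", "start"))
print(trace)
print(actors["A"].log)
```

In a real actor system the two handlers would be `receiveMessage` methods and the queue would be managed by the framework, but the shape of the handshake is the same.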

With respect to individual messages, they are normally sent in-order, but there are some conditions where the order of messages between Actors might change. This is in part due to the Thespian best-effort delivery operation where it attempts to ensure that the entire system is not halted due to the transmission loss of a single message. The downside to this is that if there is a specific ordering (as with your init and run messages), you will need to explicitly ensure that they are processed in the desired order.

The good news in the latter case is that there is a thespian/initmsgs.py import module that can help you with this. This module has not yet been documented in the primary documentation, but there is a good bit of module-specific documentation in the beginning of that file that should provide enough information to use it (please let me know if it doesn't!). Using this also provides assistance in the case where an actor exits and must be re-initialized to be used properly.
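For readers who want to see what explicitly enforcing the order can look like, here is a hand-rolled sketch that holds back early messages until initialization has been processed. It does not use `thespian/initmsgs.py` and makes no claim about that module's API. `VideoManagerSketch`, its attributes, and the dictionary message format are hypothetical, and the class is plain Python so the buffering logic can be run on its own; in Thespian the same logic would sit in an Actor's `receiveMessage`.

```python
class VideoManagerSketch:
    """Hypothetical actor-like class that defers work until initialized."""

    def __init__(self):
        self.video_name = None
        self.deferred = []  # messages that arrived before 'initialize'
        self.started = []   # records each video that 'run' was started for

    def receiveMessage(self, message, sender):
        kind = message.get("kind")
        if kind == "initialize":
            self.video_name = message["video_name"]
            # Replay everything that arrived too early, in arrival order.
            pending, self.deferred = self.deferred, []
            for early_message, early_sender in pending:
                self.receiveMessage(early_message, early_sender)
        elif self.video_name is None:
            # Not initialized yet: hold the message instead of failing.
            self.deferred.append((message, sender))
        elif kind == "run":
            self.started.append(self.video_name)


manager = VideoManagerSketch()
# Deliver the messages out of order, as in the reported log.
manager.receiveMessage({"kind": "run"}, "main")
manager.receiveMessage({"kind": "initialize", "video_name": "Shopping3"}, "main")
print(manager.started)
```

An alternative with the same effect is for the sender to wait for a reply to the initialization message before sending `run`, which moves the ordering responsibility to the caller.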

Happy to help, and let me know if I can answer any additional questions.

-Kevin

kquick commented 4 years ago

Note: the following was posted to issue #47, but I believe it was intended to be posted as additional information for your questions above. It doesn't change my answer, but I've moved it here from issue #47 and removed it from the latter to maintain appropriate context. Feel free to let me know if this was not your intent.

-----snip-------------------------------------------------------------------------------------

Here is the log of the code I'm executing:

As you can see, I call 'initialize', then 'test', then 'run'.

Then process 12548 gets the 'run' message: "DEBUG:root:15:10:17, 12548, got a Memo: run". This causes the error saying that 'video_name' does not exist.

Then the same process 12548 gets "DEBUG:root:15:10:18, 12548, got a Memo: initialize" and then "DEBUG:root:15:10:18, 12548, got a Memo: test".

Thanks for any help you can provide

====

INFO:root:calling initialize
INFO:root:calling test
INFO:root:calling run
DEBUG:root:15:10:17, 12548, got a Memo: run
WARNING:contagem.decoder.video_manager.VideoManager:Actor contagem.decoder.video_manager.VideoManager @ ActorAddr-(T|:55706) retryable exception on message <contagem.ipc.memo.Memo object at 0x0000024796C139C8>
Traceback (most recent call last):
  File "C:\cygwin64\usr\local\lib\Python37\lib\site-packages\thespian\system\actorManager.py", line 163, in _handleOneMessage
    actor_result = self.actorInst.receiveMessage(msg, envelope.sender)
  File "D:\rbotafogo\desenv\contagem\contagem\ipc\doer.py", line 196, in receiveMessage
    ret = method(*message._args, **message._kwargs)
  File "D:\rbotafogo\desenv\contagem\contagem\decoder\video_manager.py", line 161, in run
    vd = self.hire(self.video_name, 'contagem.decoder.video_decoder.VideoDecoder',
AttributeError: 'VideoManager' object has no attribute 'video_name'
DEBUG:root:15:10:17, 12548, got a Memo: run
ERROR:contagem.decoder.video_manager.VideoManager:Actor contagem.decoder.video_manager.VideoManager @ ActorAddr-(T|:55706) second exception on message <contagem.ipc.memo.Memo object at 0x0000024796C139C8>
Traceback (most recent call last):
  File "C:\cygwin64\usr\local\lib\Python37\lib\site-packages\thespian\system\actorManager.py", line 163, in _handleOneMessage
    actor_result = self.actorInst.receiveMessage(msg, envelope.sender)
  File "D:\rbotafogo\desenv\contagem\contagem\ipc\doer.py", line 196, in receiveMessage
    ret = method(*message._args, **message._kwargs)
  File "D:\rbotafogo\desenv\contagem\contagem\decoder\video_manager.py", line 161, in run
    vd = self.hire(self.video_name, 'contagem.decoder.video_decoder.VideoDecoder',
AttributeError: 'VideoManager' object has no attribute 'video_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\cygwin64\usr\local\lib\Python37\lib\site-packages\thespian\system\actorManager.py", line 178, in _handleOneMessage
    actor_result = self.actorInst.receiveMessage(copy.deepcopy(msg), envelope.sender)
  File "D:\rbotafogo\desenv\contagem\contagem\ipc\doer.py", line 196, in receiveMessage
    ret = method(*message._args, **message._kwargs)
  File "D:\rbotafogo\desenv\contagem\contagem\decoder\video_manager.py", line 161, in run
    vd = self.hire(self.video_name, 'contagem.decoder.video_decoder.VideoDecoder',
AttributeError: 'VideoManager' object has no attribute 'video_name'
Level 1:tensorflow:Registering FakeQuantWithMinMaxArgs (<function _FakeQuantWithMinMaxArgsGradient at 0x0000018B295EF168>) in gradient.
Level 1:tensorflow:Registering FakeQuantWithMinMaxVars (<function _FakeQuantWithMinMaxVarsGradient at 0x0000018B295EF288>) in gradient.
DEBUG:root:15:10:18, 12548, got a Memo: initialize
INFO:root:+++++++++++++++++++++++++++++++
INFO:root:VideoManager initializing with video_name Shopping3
DEBUG:root:15:10:18, 12548, got a Memo: test
INFO:root:------------------------------------------------------
INFO:root:VideoManager initialized with video_name Shopping3

-- Rodrigo Botafogo

rbotafogo commented 4 years ago

Hello Kevin,

Thanks again for your thoughtful answer. I guess I'll just have to get used to coding in this new style. I'll take a look at initmsgs.py.

I'll let you know how development with Thespian evolves.

Rodrigo

kquick commented 3 years ago

Any updates, @rbotafogo?