rsimmonsjr / axiom

Implementation of a highly-scalable and ergonomic actor model for Rust

Why Not Unbounded Channels? #141

Open zicklag opened 4 years ago

zicklag commented 4 years ago

Hey @rsimmonsjr, if you don't mind the discussion, I would like to know your reasons for offering only bounded channels when you could instead let people choose whether or not to use unbounded channels.

My concern with bounded channels, probably driven by not having much experience designing actors, was a simple situation where I have a search tool looking for items and sending notifications when it finds items that match the criteria it is looking for.

I had one actor that was responsible for sending a notification for any items that it received from other actors that were responsible for doing the actual searching. My concern was that if I had a limit on the number of items in the channel going to the notification actor, what would happen if there was a surge in messages from the crawlers and some of the notifications were not sent because of send timeouts and the channel filling up?

My biggest concern is that it could work fine for one user and then it could start failing unexpectedly when one user gets a higher than average load and the actor sends start timing out. How am I supposed to know how many messages an actor may receive if the number of messages is dependent on factors such as workload?

I could see limited channel sizes being used to cause an intentional back-up if you wanted to limit the throughput to an actor, but what if that is not desired?
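
To make the surge scenario concrete, here is a minimal sketch using only the standard library's `std::sync::mpsc::sync_channel` as a stand-in for an actor's bounded inbox. It is not Axiom's channel; `simulate_surge` is a hypothetical helper, and the capacity and surge size are arbitrary numbers chosen for illustration.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes `surge` messages into a channel bounded at `capacity` while nothing
/// drains it, and returns how many were (accepted, rejected).
fn simulate_surge(capacity: usize, surge: usize) -> (usize, usize) {
    // Keep the receiver alive so a rejection means "full", not "disconnected".
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    let mut rejected = 0;
    for item in 0..surge {
        match tx.try_send(item) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    (accepted, rejected)
}

fn main() {
    let (accepted, rejected) = simulate_surge(4, 10);
    println!("accepted {}, rejected {}", accepted, rejected);
}
```

With a capacity of 4 and a surge of 10, the first four sends fit and the remaining six are rejected, which is the "notifications not sent" case described above. Raising the capacity above the surge size makes every send succeed.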

Take, for example, the docs for these two kinds of actor errors:

    /// Used when unable to send to an actor's message channel within the scheduled timeout
    /// configured in the actor system. This could result from the actor's channel being too
    /// small to accommodate the message flow, the lack of thread count to process messages fast
    /// enough to keep up with the flow or something wrong with the actor itself that it is
    /// taking too long to clear the messages.
    SendTimedOut(Aid),

    /// Used when unable to schedule the actor for work in the work channel. This could be a
    /// result of having a work channel that is too small to accommodate the number of actors
    /// being concurrently scheduled, not enough threads to process actors in the channel fast
    /// enough or simply an actor that misbehaves, causing dispatcher threads to take a lot of
    /// time or not finish at all.
    UnableToSchedule,

It seems to me that these errors depend heavily on workload and available resources. But what should my app do if I get these errors? Most likely I would want to retry the message, right?
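
One way an application could handle this is a bounded retry with a growing pause. The sketch below uses the standard library's `SyncSender::try_send` as a stand-in for an actor send that can fail when the inbox is full; it does not use Axiom's API, and `send_with_retry`, the attempt limit, and the pause lengths are all assumptions made for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Tries to send `msg`, retrying up to `max_attempts` times with a doubling
/// pause between attempts. Hands the message back if every attempt fails.
fn send_with_retry<T>(tx: &SyncSender<T>, mut msg: T, max_attempts: u32) -> Result<(), T> {
    let mut pause = Duration::from_millis(1);
    for _ in 0..max_attempts {
        match tx.try_send(msg) {
            Ok(()) => return Ok(()),
            // The receiver is gone, so retrying cannot help.
            Err(TrySendError::Disconnected(returned)) => return Err(returned),
            Err(TrySendError::Full(returned)) => {
                msg = returned;
                thread::sleep(pause);
                pause *= 2;
            }
        }
    }
    Err(msg)
}

fn main() {
    let (tx, rx) = sync_channel::<u32>(1);
    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        for item in rx {
            thread::sleep(Duration::from_millis(2)); // simulate slow work
            seen.push(item);
        }
        seen
    });
    for item in 0..5 {
        if let Err(lost) = send_with_retry(&tx, item, 10) {
            eprintln!("giving up on item {}", lost);
        }
    }
    drop(tx); // closing the channel lets the consumer loop end
    println!("delivered: {:?}", consumer.join().unwrap());
}
```

Retrying only postpones the problem if the receiver is permanently slower than the senders, so the caller still has to decide what to do with a message that comes back as `Err`.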

Maybe I'm just overreacting, but it feels like this is something that could happen unexpectedly in production depending on usage, when maybe the actor system should be resilient enough to take this into account itself. I'm not sure.


Anyway, you don't have to reply if you don't have time. :smile: I want to know whether or not I should offer the option of unbounded channels in maxim, which may or may not have any direct benefit to Axiom, so I understand if this isn't worth the effort.

rsimmonsjr commented 4 years ago

Managing the flow of messages through actors is one of the most fundamental design activities of actor development. If you have an actor that is getting an overloaded inbox, then there is likely something wrong with the message flow and you should redesign. For example, you could use a pool of actors. In Axiom the channel is implemented with SECC, which is bounded, but you can set those bounds very high if you wish. I had given thought to making SECC able to grow, but I haven't gotten to that task. At any rate that would now be such an advantage to actors themselves because all you get is a backlog of traffic rather than a smooth flow.

zicklag commented 4 years ago

> At any rate that would now be such an advantage to actors themselves because all you get is a backlog of traffic rather than a smooth flow.

You mean that would not be such an advantage, right? So the goal is to keep the channel from overloading so that you get low-latency reactions to messages that you send and keep the system responsive as opposed to always lagging behind what is actually happening and not keeping up with the work as it comes in. That makes sense.

> For example you could use a pool of actors.

Cool! That was my other thought and one of the motivations behind the Aid pools PR ( #139 ). I wasn't sure if an actor pool would be considered an anti-pattern or not, but I thought it made sense when you compared it to thread pools.
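
The thread-pool comparison can be sketched with plain threads standing in for actors. This is only an illustration under stated assumptions: `WorkerPool`, `dispatch`, and `shutdown` are hypothetical names, not Axiom's API or the Aid pools from the PR, and the pool is assumed to have at least one worker.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// A hypothetical pool: each worker owns a bounded inbox, and messages are
/// handed out round-robin so no single inbox absorbs the whole surge.
struct WorkerPool {
    inboxes: Vec<SyncSender<u64>>,
    handles: Vec<JoinHandle<u64>>,
    next: usize,
}

impl WorkerPool {
    /// Assumes `workers >= 1`.
    fn new(workers: usize, inbox_capacity: usize) -> Self {
        let mut inboxes = Vec::with_capacity(workers);
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            let (tx, rx) = sync_channel::<u64>(inbox_capacity);
            inboxes.push(tx);
            // Each worker sums what it receives, standing in for real work.
            handles.push(thread::spawn(move || rx.iter().sum::<u64>()));
        }
        WorkerPool { inboxes, handles, next: 0 }
    }

    /// Blocks only while the chosen worker's inbox is full.
    fn dispatch(&mut self, msg: u64) {
        self.inboxes[self.next].send(msg).expect("worker exited early");
        self.next = (self.next + 1) % self.inboxes.len();
    }

    /// Closes every inbox and returns each worker's total.
    fn shutdown(self) -> Vec<u64> {
        drop(self.inboxes);
        self.handles.into_iter().map(|h| h.join().unwrap()).collect()
    }
}

fn main() {
    let mut pool = WorkerPool::new(4, 2);
    for n in 1..=100u64 {
        pool.dispatch(n);
    }
    println!("per-worker totals: {:?}", pool.shutdown());
}
```

Each inbox stays small, yet the pool as a whole can absorb more in-flight work than a single bounded inbox, and a slow worker applies back-pressure to the sender instead of dropping messages.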

I'm thinking it would still be good to be able to have unbounded channels when you choose to as a fail-safe in case there is a surge in load and you are OK with the work getting backed up if it can't be handled fast enough.

If you create an actor pool with an actor for every thread on the system and it is still backing up, then it could just be something that takes time, and you want to queue the work anyway. At that point you are probably saturating the network/disk/CPU. (Maybe? I'm not really that experienced, so I'm guessing.) That seems fine as long as you know that is what might happen and you are intentionally setting it up that way.
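
For contrast, the "queue the work anyway" behavior looks like this with the standard library's unbounded `std::sync::mpsc::channel`. This is a sketch, not maxim's or Axiom's channel, and `backlog_after_surge` is a hypothetical helper.

```rust
use std::sync::mpsc::channel;

/// Sends `surge` messages into an unbounded channel while nothing drains it,
/// then counts the backlog the receiver finds waiting.
fn backlog_after_surge(surge: usize) -> usize {
    let (tx, rx) = channel::<usize>();
    for item in 0..surge {
        // An unbounded send never blocks; it fails only if the receiver is gone.
        tx.send(item).expect("receiver is still alive");
    }
    drop(tx); // close the channel so the count below terminates
    rx.iter().count()
}

fn main() {
    println!("backlog: {}", backlog_after_surge(10_000));
}
```

No send is ever rejected, so nothing is lost during a surge, but the backlog grows with the surge and is limited only by available memory.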


So I'm currently working on a skip-enabled channel based on flume. It is really simple and I think it is going to work. It's got a couple of bugs at the moment related to skipping while resetting, but I think it's simple to fix. It has both bounded and unbounded channels, but it is single-consumer only, which I think is all I need because only one actor should ever listen on the channel. It also has an async receiver for efficiently handling incoming messages. You can see the current code here. The implementation is under 200 lines thanks to flume!
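
For readers unfamiliar with the idea, a skip-enabled queue lets the consumer step past messages it cannot handle yet and come back to them later. The following is a single-threaded toy model built on `VecDeque`, not the flume-based implementation described above and not SECC itself; `SkipQueue` and its method names are assumptions chosen for illustration.

```rust
use std::collections::VecDeque;

/// Toy model of a skip-enabled queue with a single consumer.
struct SkipQueue<T> {
    items: VecDeque<T>,
    /// Index of the message the consumer is looking at; 0 when not skipping.
    cursor: usize,
}

impl<T> SkipQueue<T> {
    fn new() -> Self {
        SkipQueue { items: VecDeque::new(), cursor: 0 }
    }

    fn push(&mut self, item: T) {
        self.items.push_back(item);
    }

    /// Looks at the message under the cursor without removing it.
    fn peek(&self) -> Option<&T> {
        self.items.get(self.cursor)
    }

    /// Removes the message under the cursor; later messages shift down,
    /// so the cursor then points at the next unread message.
    fn pop(&mut self) -> Option<T> {
        self.items.remove(self.cursor)
    }

    /// Steps past the current message, leaving it queued. Returns false
    /// when there is nothing left to skip.
    fn skip(&mut self) -> bool {
        if self.cursor < self.items.len() {
            self.cursor += 1;
            true
        } else {
            false
        }
    }

    /// Returns to the oldest skipped message.
    fn reset_skip(&mut self) {
        self.cursor = 0;
    }
}

fn main() {
    let mut queue = SkipQueue::new();
    for n in [1, 2, 3] {
        queue.push(n);
    }
    queue.skip(); // leave 1 queued for later
    println!("handled out of order: {:?}", queue.pop());
    queue.reset_skip();
    println!("back to the skipped message: {:?}", queue.peek());
}
```

A concurrent version also has to define what happens when new messages arrive or a reset occurs while the cursor is mid-queue, which this model sidesteps by being single-threaded.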

I'm also migrating the whole actor system to use the smol executor, and so far I think it is going very well. I'm getting rid of pretty much all manual threading and migrating to async tasks, which should greatly simplify the code and reduce what needs to be maintained. It will also be easy to update the network cluster support to be completely async, which should improve performance.

Also, even with the extra dependencies on smol and flume, we still have only 109 dependencies compared to Axiom's current 90. (Axiom can probably be brought down to about 80 by deduplicating a couple of dependencies and updating some of them.)

BTW, if you are interested I'd be glad to open PRs to axiom for these changes when they are finished.

zicklag commented 4 years ago

There, I just finished the implementation of a bounded and unbounded async MPSC SECC channel that I should be able to use for the actor messages ( the tests are passing now ):

https://github.com/katharostech/maxim/blob/smol-executor/src/secc.rs