organix / uart

micro actor run-time
MIT License

Recreating an Actor's state at a point in time #1

Open ghost opened 11 years ago

ghost commented 11 years ago

First, apologies for posting here, but I can't find any forums for engaging in general Q&A regarding the pure actor model.

Are you aware of such a venue?

My immediate question, which I can't find an answer for, is this: in the pure actor model, is the behavior of an actor deterministic, given the initial actor state upon creation and a fixed set of input events? (I am assuming that on each rerun of the event stream the events arrive in the same order, although the initial order would have been indeterminate.)

I'm thinking back to CSP, where over time a process can be defined by its state graph or by the event stream over the same period of time.

The reason for asking is that I want to be able to recreate an actor's state at any point in time (to support process restart and timeline replay), and the information I currently have is the stream of events arriving at the actor and the events generated by the actor. If an actor's behavior is deterministic, I can recreate its state at any point by instantiating a new actor in the same initial state and replaying the input event stream.

If an actor's behavior is non-deterministic for the same initial state and event stream, I would need to persist the actor's state after each event has been processed.
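
The replay idea in the question can be sketched as follows. This is a minimal illustration, not code from uart: `counter_behavior` and `replay` are hypothetical names, and the sketch assumes a behavior is a pure function from (state, event) to (new state, emitted events) with no hidden inputs.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Event = Tuple[str, int]
# Assumed shape of a deterministic behavior: no clock, no randomness, no I/O.
Behavior = Callable[[int, Event], Tuple[int, List[Event]]]


def counter_behavior(state: int, event: Event) -> Tuple[int, List[Event]]:
    """Hypothetical actor behavior: keeps a running total."""
    kind, amount = event
    if kind == "add":
        new_state = state + amount
    elif kind == "reset":
        new_state = 0
    else:
        new_state = state  # unknown events leave the state unchanged
    return new_state, [("total", new_state)]


def replay(behavior: Behavior, initial_state: int,
           events: Iterable[Event], upto: Optional[int] = None):
    """Rebuild the state by feeding the first `upto` logged events in order."""
    state = initial_state
    outputs: List[Event] = []
    for event in list(events)[:upto]:
        state, emitted = behavior(state, event)
        outputs.extend(emitted)
    return state, outputs


log = [("add", 5), ("add", 3), ("reset", 0), ("add", 2)]
state_after_two, _ = replay(counter_behavior, 0, log, upto=2)
final_state, outputs = replay(counter_behavior, 0, log)
```

Under these assumptions, replaying the same log from the same initial state always gives the same state and the same emitted events. If the behavior reads anything outside its state and the event, the guarantee is lost and the state itself would have to be persisted.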

Andy

dalnefre commented 11 years ago

I am unaware of an appropriate forum for this question, so you are welcome to raise the issue here. After considering your question, I find it difficult to answer directly. So I will do my best to engage in a conversation, from which both of us will hopefully find opportunities to learn.

I don't believe that the formal Actor Model requires determinism of the kind you describe. For example, I could imagine a low-level actor interface to an internal clock/counter that responds to #read requests by returning a timestamp. This would clearly return different results on replay, regardless of message ordering. Perhaps even more important is the fact that message arrival order is specifically non-deterministic. Therefore your assumption (that you could re-run an event stream to produce the same arrival order) would require explicit constraints imposed by your implementation, contrary to the assumptions of the model.
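
The clock example can be illustrated with a small sketch. The names `ClockActor` and `replay` are hypothetical, and the counter passed in as `now` is a stand-in for a hardware clock that keeps advancing between runs; by default the actor would read `time.monotonic_ns`.

```python
import itertools
import time


class ClockActor:
    """Hypothetical actor that answers "read" requests with a timestamp."""

    def __init__(self, now=time.monotonic_ns):
        # The time source lives outside the message log, so replaying
        # messages cannot reproduce what it returned earlier.
        self._now = now

    def receive(self, message):
        if message == "read":
            return self._now()
        return None  # other messages are ignored in this sketch


def replay(actor, messages):
    """Deliver the same messages in the same order and collect the replies."""
    return [actor.receive(m) for m in messages]


messages = ["read", "read"]
ticks = itertools.count(100)  # stand-in for a clock that never rewinds
clock = ClockActor(now=lambda: next(ticks))

first_run = replay(clock, messages)
second_run = replay(clock, messages)
```

Both runs see an identical message sequence, yet the replies differ because the clock has moved on. Recording the replies, rather than only the requests, is one way to make such an actor replayable.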

From a mechanism perspective, I don't understand why it would be more time/space efficient to persist events rather than persist actor "state". The current "state" of an actor captures the effective history of events it has received, so it seems like actor "state" would require at most the same storage as events, usually much less. In addition, there are some advantages to allowing different behavior when retrying a message. Sometimes this can avoid transient-failure conditions, where a retry is allowed to succeed rather than fail "deterministically". The Ken protocol allows for this possibility, treating a VAT as a single Actor.

One concept to consider is that the actor model does not assume a single global "state" for an actor configuration. Ken is an example of fault-tolerance based on persisting the state of individual (larger grained) actors, with some help from the communication infrastructure.

I hope this is in some way helpful. Thanks for your thoughtful questions.

Dale

ghost commented 11 years ago

Dale

Hi, thank you for your insightful response.

My question was partly motivated out of already having a persisted event stream.

I have a data layer that stores messages as they arrive. The messages consist of deltas to a data model, and the model can be rolled forward and backward through (logical) time by applying or undoing the deltas.

I was not entirely convinced that the CSP event stream approach could be applied to the pure Actor Model; your examples immediately highlight some basic flaws with that approach.
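
A rough sketch of such a delta log, assuming each delta records both the old and the new value of a key so that it can be undone. The function names and the tuple layout are assumptions for illustration, not the actual data layer described here.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks "key was absent" so an insert can be undone

Delta = Tuple[str, Any, Any]  # (key, old_value, new_value)


def make_delta(model: Dict[str, Any], key: str, new_value: Any) -> Delta:
    """Capture the current value so the change is reversible."""
    return (key, model.get(key, _MISSING), new_value)


def apply_delta(model: Dict[str, Any], delta: Delta) -> None:
    key, _old, new = delta
    if new is _MISSING:
        model.pop(key, None)
    else:
        model[key] = new


def undo_delta(model: Dict[str, Any], delta: Delta) -> None:
    key, old, _new = delta
    if old is _MISSING:
        model.pop(key, None)
    else:
        model[key] = old


model: Dict[str, Any] = {}
history: List[Delta] = []
for key, value in [("title", "draft"), ("title", "final"), ("owner", "andy")]:
    delta = make_delta(model, key, value)
    apply_delta(model, delta)
    history.append(delta)

# Roll back the last two deltas (newest first), then roll forward again.
for delta in reversed(history[1:]):
    undo_delta(model, delta)
rolled_back = dict(model)

for delta in history[1:]:
    apply_delta(model, delta)
rolled_forward = dict(model)
```

Undo must run in reverse order of application; otherwise the recorded old values no longer match the model they are restored into.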

I'll explore storing actor state in the message stream.

Andy

dalnefre commented 11 years ago

Andy,

It sounds like you're taking a consistent-global-state viewpoint, using the event stream to mark changes-in-time. One advantage of actors is that there need not be any globally-consistent viewpoint, yet the system as a whole maintains consistency. Each actor is maintained in a consistent state based on the messages it has received. This makes it much easier to distribute and scale the system.

It is perfectly reasonable to have event streams as part of your design, but I'm not sure you want to centralize that in the infrastructure. Instead, you may want to consider publish/subscribe subjects, each "streaming" events independently.
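
As a rough sketch of independent publish/subscribe subjects (the `Subject` class and the subject names are hypothetical, chosen only for illustration), each subject keeps its own subscriber list and streams its own events, with no shared global log:

```python
from typing import Any, Callable, List


class Subject:
    """Hypothetical subject: one independent stream of events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Any) -> None:
        # Copy the list so a subscriber may subscribe others while handling.
        for callback in list(self._subscribers):
            callback(event)


orders = Subject("orders")
payments = Subject("payments")

seen_orders: List[Any] = []
seen_all: List[Any] = []
orders.subscribe(seen_orders.append)
orders.subscribe(seen_all.append)
payments.subscribe(seen_all.append)

orders.publish("order-1")
payments.publish("payment-1")
orders.publish("order-2")
```

A subscriber sees only the subjects it asked for, in the order each subject published. No component holds a total order over all events.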

Finally, I'm not sure I understand what you mean by "storing actor state in the message stream". Actor state is hidden within the actor (a fundamental property of the model) and can only be "observed" through how the actor responds to events.

One example on my blog is the protocol for character streams. Each stream responds to a #read request with a character value and an actor representing the next position in the stream. In many implementations, each actor is distinct and can be re-used to re-read the stream from any position. This is equivalent to the "state" of the stream at any particular point in "time".
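
A minimal sketch of that protocol, using an immutable Python object in place of an actor. `StreamPosition` and its `read` method are hypothetical names; the blog's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamPosition:
    """One position in a character stream; never mutated after creation."""
    text: str
    index: int = 0

    def read(self) -> Tuple[Optional[str], "StreamPosition"]:
        """Return the character here plus the position that follows it."""
        if self.index >= len(self.text):
            return None, self  # end of stream: no character, same position
        return self.text[self.index], StreamPosition(self.text, self.index + 1)


start = StreamPosition("abc")
first, after_first = start.read()
second, after_second = after_first.read()

# Any earlier position can be read again and gives the same answer.
again, _ = after_first.read()
```

Because a read never changes the position it was sent to, holding on to `after_first` is the same as holding the stream's "state" at that point in "time".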

Another example is that some implementations of a simple "dictionary" respond to #bind requests by returning an actor representing the dictionary with the new binding, which is distinct from the dictionary that received the request. This means that each binding creates a new dictionary state, but all previous references are still valid, referring to previous states without such bindings.
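
The dictionary protocol can be sketched the same way. This assumes a simple linked-list representation; `Dictionary`, `bind`, and `lookup` are illustrative names, not an API from uart.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Dictionary:
    """Immutable dictionary: each binding wraps the previous dictionary."""
    key: Any = None
    value: Any = None
    rest: Optional["Dictionary"] = None  # None marks the empty dictionary

    def bind(self, key: Any, value: Any) -> "Dictionary":
        """Return a new dictionary; the receiver is left untouched."""
        return Dictionary(key, value, self)

    def lookup(self, key: Any, default: Any = None) -> Any:
        node = self
        while node.rest is not None:  # stop at the empty dictionary
            if node.key == key:
                return node.value  # newest binding wins
            node = node.rest
        return default


empty = Dictionary()
d1 = empty.bind("x", 1)
d2 = d1.bind("y", 2)
d3 = d2.bind("x", 10)
```

Here `d3.lookup("x")` sees the newest binding, while `d1` still answers as it did before the later bindings were made. Every earlier reference remains a valid historical state.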

You may also find inspiration in Functional Data Structures [1], which can be used as a mechanism for storing actor state, current and historical.

Dale

[1] C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.

ghost commented 11 years ago

Dale

Hi, you have zeroed in on all the key points, most of which I am still wrestling with.

But here is some more detail which I think addresses some of your points.

The persistent event stream is not a single global state; it sits over an online/offline P2P sync data layer.

Different peers may write different messages and may have received a different subset of messages from other peers, creating a unique local dataset.

Eventually the messages will/may be replicated to all peers in a group, but this is not assumed to be the case.

Inconsistency is a first-class concept, and is handled using a distributed implementation of operational transform based on a revision graph (a bit like a git branch/merge graph).

For data this is working well: each peer has its own local view, which is similar to the other group members' views and sometimes (if all messages have been received by all peers in a group) the same.

I now want to inject code into the data stream that can respond to and update the data stream. As each peer has a different dataset, the code may generate different output, but if two peers have the same dataset the code should generate the same output.

My current attempt uses the Actor model, which I think is a good choice for a distributed environment. But it may be that I'm trying to apply the model inappropriately, or the right solution is something else, or I need to change my implementation approach. I'm learning a lot along the way.

In the current implementation, actor state is persisted through the same stream layer but in its own stream; this is still replicated between peers. I think I need to allow actors to have independent state on peers but somehow still have actor instances on different peers generate the same data output given the same data input.

Locally, actor message order can be indeterminate; what I want is determinism in the way data is processed. Taking a different example, if I'm updating a bank balance, I always want the ending balance to be the same regardless of the message order between the actors.
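
One way to get an order-independent ending balance is to restrict messages to commutative updates. The sketch below assumes every message is a signed amount added to the balance; `apply_transfers` is a hypothetical helper, not part of the implementation described here.

```python
from itertools import permutations
from typing import Iterable


def apply_transfers(opening_balance: int, amounts: Iterable[int]) -> int:
    """Apply signed amounts (deposits > 0, withdrawals < 0) in the given order."""
    balance = opening_balance
    for amount in amounts:
        balance += amount
    return balance


transfers = [+100, -30, +5, -20]

# Try every possible arrival order and collect the distinct ending balances.
ending_balances = {
    apply_transfers(50, order) for order in permutations(transfers)
}
```

Addition is commutative and associative, so every arrival order ends on the same balance. An order-sensitive rule, such as rejecting a withdrawal that would overdraw the account, breaks this property and would need a different approach.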

Thank you for the references, they are proving excellent reading.

Andy

ghost commented 10 years ago

Dale

I thought I'd post an update.

After thinking about your responses, I realised I was breaking the actor model by tying my implementation too closely to the persistent data stream.

In the implementation I have now, a new actor and its behaviours may be written to the stream, and any peer reading the stream may instantiate the actor in a local configuration. The configurations on different peers run independently, so actor state will diverge across peers depending on their local copy of the data stream and other local actor state. Actor state may be saved to a local stream to support application suspend/resume but not replay.

Message passing is only supported between actors in the same local configuration. Remote message passing is something I need to revisit, but it will no longer support message replay; it will just focus on delivery.

I'll post some more details about the implementation once the code is stable.

Andy