timonkrebs / MemoizR

Declarative Structured Concurrency for C#
Apache License 2.0

Distributed dynamic graph #31

Open jvmlet opened 8 months ago

jvmlet commented 8 months ago

First of all, thanks for such an interesting library. Question: imagine a distributed system in which trees are born dynamically on several nodes. Tree branches might overlap (a branch of tree A is basically a tree B, or tree C shares a branch of tree D), and trees A, B, C and D compose a distributed graph. Does the API allow plugging in/supporting distributed locking of the evaluation of nodes? Distributed notification of completion events? Basically, I'm after an integration of this library with the MassTransit saga pattern implementation. Is it possible to modify the tree (insert a node in the middle) dynamically? And what about persisting the graph definition?

timonkrebs commented 8 months ago

Dynamically modifying the dependency graph is one of the strengths of MemoizR. There are several ways to do so; I will add examples for more advanced cases of modifying the tree/graph at runtime.
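
For context, a minimal sketch of how the graph rewires itself at runtime, assuming the `MemoFactory`/`CreateSignal`/`CreateMemoizR` API shape from the README (exact signatures may differ):

```csharp
using MemoizR;

var f = new MemoFactory();

var useA = f.CreateSignal(true);
var a = f.CreateSignal(1);
var b = f.CreateSignal(2);

// The dependency set of m is re-tracked on every evaluation:
// while useA is true, m depends only on a; afterwards only on b.
var m = f.CreateMemoizR(async () =>
    (await useA.Get()) ? await a.Get() : await b.Get());

Console.WriteLine(await m.Get()); // 1
await useA.Set(false);            // the graph rewires: m now tracks b
Console.WriteLine(await m.Get()); // 2
```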

Distributed locking/etc. of evaluation of nodes is not yet supported. But that will be the next concern I plan to work on.

Persisting the graph definition is an area where this model has advantages over things like ReactiveX. But there are multiple ways you could do it, and depending on your use case you would need it differently. What are the use cases you would like to handle by persisting the graph definition? There are already ways to persist it, but they require some manual steps at the moment.
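
For illustration, one manual approach (a pattern, not a MemoizR API; `NodeDef` and the operation names are hypothetical): keep the graph definition as plain data, persist that, and rebuild the nodes from it on load:

```csharp
using System;
using System.Text.Json;

// Hypothetical definition record; MemoizR does not ship this type.
record NodeDef(string Id, string[] Inputs, string Operation);

var defs = new[]
{
    new NodeDef("v1", Array.Empty<string>(), "signal"),
    new NodeDef("m1", new[] { "v1" }, "double"),
};

string json = JsonSerializer.Serialize(defs);
// ...later: deserialize and recreate the signals/memos in topological order,
// mapping each Operation name back to a compiled delegate.
```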

timonkrebs commented 8 months ago

I am considering a Bloom Clock based locking strategy.

@jvmlet, could you describe your use case in a little more detail? What would you want to build with it? Do you already have something similar that you could show?

jvmlet commented 8 months ago

Hi @timonkrebs, I'm looking for infrastructure with graph CRUD functionality and the ability to traverse the graph while evaluating each vertex (triggering some external processing) in a distributed manner.

All of these aspects would be pluggable (a rough interface sketch follows below):

  1. Persistence storage API
  2. Caching API
  3. Locking API
  4. Notification API
  5. Traversal strategies, in/out of order: DFS/BFS/custom

I think the saga pattern is the closest fit for all of these.
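
To make the seams concrete, here is a hypothetical shape for those plug-in points (all names are illustrative; none of this exists in MemoizR today):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// 1. Persistence
interface IGraphStore<TNode>
{
    Task SaveAsync(string graphId, IReadOnlyList<TNode> nodes);
    Task<IReadOnlyList<TNode>> LoadAsync(string graphId);
}

// 3. Distributed locking
interface IDistributedLock
{
    Task<IAsyncDisposable> AcquireAsync(string key, CancellationToken ct = default);
}

// 4. Completion notification (e.g. published over a MassTransit bus)
interface ICompletionNotifier<TNode>
{
    Task PublishCompletedAsync(TNode vertex);
}

// 5. Traversal strategy (DFS/BFS/custom)
interface ITraversalStrategy<TNode>
{
    IEnumerable<TNode> Order(TNode root);
}
```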

timonkrebs commented 8 months ago

@jvmlet I think of the strength of this library more in terms of state synchronization. But the core should also be able to support the distributed transaction model that I think you are looking for.

I would like to keep the core as generic as possible so that other aspects can be built on top of it. That is why I named it MemoizR: this core concept allows a lot of aspects to be built on top of it in a pluggable way.

Combined with something like the MassTransit saga pattern implementation, most of the aspects you are looking for should already be possible. But as I mentioned, most require some amount of manual work and are not (yet) supported directly by this library. For example, traversal strategies could be implemented in a pluggable way if you serialize the graph. I am considering providing a serialization API, but this is not yet on any roadmap. If you are interested in that, I would be open to contributions.

One thing I would still be interested in is how you think of locking. Would you like to lock entire subgraphs? Would you need to lock them explicitly? What would be the use case for that?

jvmlet commented 8 months ago

@timonkrebs, given the simple DAG in the attached image:

Imagine that the processing of nodes 0 and 1 is triggered by external events that happen almost at the same time on different nodes. By processing I mean some heavy operations that have dependencies, and these dependencies are modeled by the DAG. The obvious thing to do is to save the last processing time of each vertex together with its other attributes (QoS, etc.). If the traversal started by 1 reaches 2 after 0's traversal did, I want to wait for its result and not go deeper than 2. This is my use case for locking, but my suggestion is to expose some sort of ITraversalContext that defines the API for the traversal strategy, notification, and locking if needed:

```csharp
// Sketch: ILock and ITraversalStrategy are placeholder abstractions.
public delegate bool OnVisit<V, E>(ITraversalContext<V, E> ctx, V node); // return false to stop the traversal

public interface ITraversalContext<V, E>
{
    OnVisit<V, E> OnVisit { get; set; }
    ILock Lock { get; set; }
    ITraversalStrategy Strategy { get; set; }
    // other members
}
```

timonkrebs commented 8 months ago

> The obvious thing to do is to save the last processing time of each vertex together with its other attributes (QoS, etc.).

I do not think that this is obvious; see Lamport timestamps:

> Thus two timestamps or counters may be the same within a distributed system, but in applying the logical clocks algorithm, events that occur will always maintain at least a strict partial ordering.

I think a decentralized locking strategy is the best way to do it. As I mentioned, I am considering a Bloom clock based locking strategy.
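
For reference, a Bloom clock is a probabilistic cousin of a vector clock: a counting Bloom filter whose counters are bumped on every event, giving a compact partial order at the cost of occasional false positives. A minimal sketch (my own illustration, not MemoizR code):

```csharp
using System;

// Minimal Bloom clock sketch (after Ramabaja, "The Bloom Clock", 2019).
// Each event increments k counters chosen by hashing the event id; clock A
// "happened before" B only if every counter of A is <= the one in B.
// Comparisons can yield false positives, never false negatives.
class BloomClock
{
    private readonly int[] counters;
    private readonly int k;

    public BloomClock(int size = 32, int hashes = 3)
    {
        counters = new int[size];
        k = hashes;
    }

    public void Tick(string eventId)
    {
        for (int i = 0; i < k; i++)
            counters[Index(eventId, i)]++;
    }

    // Element-wise max, as with vector clocks, when receiving a remote clock.
    public void Merge(BloomClock other)
    {
        for (int i = 0; i < counters.Length; i++)
            counters[i] = Math.Max(counters[i], other.counters[i]);
    }

    public bool HappenedBefore(BloomClock other)
    {
        bool strictlyLess = false;
        for (int i = 0; i < counters.Length; i++)
        {
            if (counters[i] > other.counters[i]) return false;
            if (counters[i] < other.counters[i]) strictlyLess = true;
        }
        return strictlyLess;
    }

    // NOTE: HashCode.Combine is randomized per process; a real distributed
    // implementation would need a stable hash (e.g. xxHash) instead.
    private int Index(string id, int seed) =>
        (int)((uint)HashCode.Combine(id, seed) % (uint)counters.Length);
}
```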

It could be done much as you describe, by waiting at 2 for both inputs to update. But in the general case there is no guarantee that both 1 and 0 will trigger; it is also valid for only one of them to trigger. Therefore, in the case where both trigger, it would be better for the first evaluation to get canceled if the second evaluation is triggered before the first is finished. There is already an implementation of a debounce and delay operator in place that can be used to only get the latest value after some time. But it would probably be useful to be able to define nodes that have to update together and therefore only trigger after all of them have been triggered. I think something like the Zip operator would be nice.
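
To pin down the Zip semantics I mean, a generic illustration over channels (not a MemoizR operator): the downstream node evaluates only once every input has delivered a fresh value, once per complete pair:

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

static class ZipDemo
{
    // Waits until BOTH inputs have produced a value, then yields the pair;
    // a node subscribed to this stream evaluates once per complete pair.
    public static async IAsyncEnumerable<(TA, TB)> Zip<TA, TB>(
        ChannelReader<TA> left, ChannelReader<TB> right)
    {
        while (await left.WaitToReadAsync() && await right.WaitToReadAsync())
        {
            yield return (await left.ReadAsync(), await right.ReadAsync());
        }
    }
}
```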

I think exposing OnVisit in some sort of ITraversalContext is really interesting. I will consider that.

Thank you for your input. Unfortunately, at the moment I have other responsibilities that are quite time consuming, and I would first like to put out a complete release of the core functionality. But having a clearer picture of how I would like to evolve the library will help with decisions on how to implement the current functionality. I plan to work on this as soon as the core functionality is finished.