Subscription sharding

lbertenasco commented 11 years ago

In several distributed computing scenarios, processing should be delivered to exactly one of several alternative processing engines.

It would be good if subscriptions could be made in a sharding mode, where the host decides which engine from the subscription pool to notify when triggering the Endpoint.

xaviervia commented 11 years ago

For example:

Sending

A special Header is created, the Shard Header, in which the Emitter specifies the number of Shards that should receive this Dispatch.

Binding

An Engine Binds itself remotely as one Shard in a Shard Pool. In the Shard Header it specifies "1".

There is a lot more to solve, but this provides a framework.
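To make the idea concrete, here is a minimal sketch of how a host might interpret such a header. Everything in it is an assumption for illustration: the `shards` key, the dict layout of the dispatch, and the `bind` and `recipients` helpers are hypothetical and not taken from any JSTP specification.

```python
# Hypothetical sketch: the "shards" key and these dict layouts are
# illustrative assumptions, not part of any JSTP specification.

def bind(pool, engine, shards=1):
    """Register an engine in the shard pool with the shard count from its Bind."""
    pool.append({"engine": engine, "shards": shards})


def recipients(pool, dispatch):
    """Return the engines that should receive this dispatch.

    Without a Shard Header every bound engine is notified (the
    "send to all" behaviour); with one, only as many engines as requested.
    """
    wanted = dispatch.get("shards")
    engines = [entry["engine"] for entry in pool]
    if wanted is None:
        return engines
    return engines[:wanted]


pool = []
bind(pool, "engine-a")  # each engine binds itself as one shard
bind(pool, "engine-b")
bind(pool, "engine-c")

dispatch = {"method": "POST", "resource": ["Computation"], "shards": 1}
print(recipients(pool, dispatch))  # ['engine-a']
```

Which engine gets picked is deliberately naive here (the first one bound); choosing among the pool is the open question the rest of this thread is about.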

xaviervia commented 11 years ago

Also: the shard should be able to report its saturation level.

Imagine that you are implementing a distributed computation service for encrypted transaction validation. The actual computation is performed by an N-sized array of CPUs connected to a cluster manager via JSTP. Each node is subscribed as a 1-unit shard to the POST Computation endpoint, which is triggered each time a customer asks for a computation. The cluster manager will distribute the request to an available node, but there are several possible scenarios here:

The Round Robin

In this scenario, all nodes are equivalent and all requests are equivalent too (each takes the same amount of time and resources). The sole purpose of distributing the triggering is to make use of the pool, but not organically so. In this setup, the cluster manager may simply distribute the requests one at a time, iterating from host to host for each request.

This is a huge advance over the simple "send to all" default, but it lacks flexibility.
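A rough sketch of what the cluster manager would do in this scenario, assuming nodes are identified by plain strings. The `RoundRobinManager` class and its `dispatch` method are made-up names for illustration only.

```python
from itertools import cycle


class RoundRobinManager:
    """Hypothetical cluster manager that hands each request to the next node in turn."""

    def __init__(self, nodes):
        # cycle() repeats the node list forever, in order.
        self._next_node = cycle(nodes)

    def dispatch(self, request):
        """Return the (node, request) pair this request is assigned to."""
        return next(self._next_node), request


manager = RoundRobinManager(["node-1", "node-2", "node-3"])
assignments = [manager.dispatch(f"request-{i}")[0] for i in range(5)]
print(assignments)  # ['node-1', 'node-2', 'node-3', 'node-1', 'node-2']
```

Note that the manager never looks at what the nodes are doing, which is exactly the lack of flexibility mentioned above.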

The Availability Driven Distribution

In this scenario, each node can perform one and only one request at a time. Processing a request may take an arbitrarily long time, and nodes may differ in their characteristics, but it is crucial that no node ever handles more than one request at any given time. Nodes can thus Release their subscriptions once a request is received. Once the task is finished, nodes can rebind to the endpoint in the cluster manager.

A variation of this scenario is for the nodes to bind themselves with a different Shard multiple according to their capacity. In this sub-scenario, the subscriber restates its number of available shards in each transaction.
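The release and rebind cycle could look roughly like this on the cluster manager side. This is only a sketch under assumptions: `AvailabilityManager`, `bind`, `release` and `dispatch` are hypothetical names, and the `shards` argument stands in for whatever the node would state in its Shard Header.

```python
class AvailabilityManager:
    """Hypothetical cluster manager that only dispatches to currently bound nodes."""

    def __init__(self):
        self._available = {}  # node -> number of shards it last stated

    def bind(self, node, shards=1):
        """A node (re)binds, stating how many shards it can currently take."""
        if shards > 0:
            self._available[node] = shards
        else:
            self._available.pop(node, None)

    def release(self, node):
        """A node releases its subscription, so it receives nothing more."""
        self._available.pop(node, None)

    def dispatch(self, request):
        """Give the request to the first bound node, or None if all are busy."""
        if not self._available:
            return None
        node = next(iter(self._available))
        # The node releases its subscription as soon as it receives a request;
        # it must bind again (restating its shards) to receive another one.
        self.release(node)
        return node


manager = AvailabilityManager()
manager.bind("node-1")
manager.bind("node-2")

first = manager.dispatch("request-1")   # node-1 takes it and releases
second = manager.dispatch("request-2")  # node-2 takes it and releases
third = manager.dispatch("request-3")   # nobody is bound: None
manager.bind("node-1")                  # node-1 finished its task and rebinds
fourth = manager.dispatch("request-4")
print(first, second, third, fourth)  # node-1 node-2 None node-1
```

In the variation with shard multiples, a node with capacity for several requests would simply rebind right after each request with its remaining count, for example `manager.bind("node-3", shards=2)`.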

Organic load balancing with status reports

In this scenario, cluster managers reverse-subscribe to status reports from the nodes once the nodes subscribe to the cluster manager. This does not have to happen at the Protocol Level, so strictly speaking there is no need to specify it.

Using those status reports, the cluster manager distributes the requests organically so as to maximize the efficiency of each host's resource usage.

While thinking through this final example I came to doubt the wisdom of trying to implement this as part of the protocol specification. I guess a cluster manager application will do just fine.
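As an illustration, a cluster manager application could keep the latest report from each node and pick the least loaded one. This is a sketch under assumptions: the report is taken to be a single load figure between 0.0 (idle) and 1.0 (saturated), and `StatusDrivenManager`, `on_status_report` and `dispatch` are hypothetical names.

```python
class StatusDrivenManager:
    """Hypothetical cluster manager that routes by the last reported load."""

    def __init__(self):
        self._load = {}  # node -> last reported load, 0.0 (idle) to 1.0 (saturated)

    def on_status_report(self, node, load):
        """Called whenever a status report arrives from a node."""
        self._load[node] = load

    def dispatch(self, request):
        """Return the node with the lowest reported load, or None if none reported."""
        if not self._load:
            return None
        return min(self._load, key=self._load.get)


manager = StatusDrivenManager()
manager.on_status_report("node-1", 0.8)
manager.on_status_report("node-2", 0.2)
manager.on_status_report("node-3", 0.5)
print(manager.dispatch("request-1"))  # node-2

manager.on_status_report("node-2", 0.9)  # node-2 is now nearly saturated
print(manager.dispatch("request-2"))  # node-3
```

A real manager would probably weigh more than one figure (CPU, memory, queue length), but the routing decision stays in application code either way.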

xaviervia commented 11 years ago

In fact, Reverse Subscription may simplify this enormously.

xaviervia commented 11 years ago

Subscription Sharding is left for 0.7. Reverse Subscription solves most of this issue either way.

jstp / jstp-rfc

Subscription sharding #3
