The current model for hooks, in the presence of `MongoDriver`, is that hooks are run on every server (actually every `Bosk` instance). This makes them good for control over the state of a replica set, with each replica performing the desired action.
However, if the cluster is controlling something external, we'd often want something different: hook actions that run on one of the servers, rather than all of them. This would effectively act like a load balancer, spreading the actions across the servers in the cluster, as opposed to having all actions run on all servers.
I think to make this work, we might benefit from a new extension point besides `BoskDriver`. That is a pretty good extension point for change ingestion, but less so for change notification. The trouble is that hook execution is currently tied directly to the grafting operations that update the internal in-memory bosk state tree. If we want a hook to run on just one server, our only option is to graft changes to only that server's state tree, which is a terrible idea.
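As a strawman, a notification-side extension point might look something like the sketch below. All of these names are hypothetical and don't exist in bosk today; a plain `String` path is used to keep the sketch dependency-free, where the real thing would presumably deal in `Reference`s.

```java
// Hypothetical sketch only -- none of these names exist in bosk today.
// A notification-side extension point, distinct from BoskDriver
// (which covers the ingestion side).
public interface ChangeNotificationListener {
	/**
	 * Called after an update has been grafted into the local in-memory
	 * state tree, so a ReadContext opened from this callback would
	 * observe the new state. Implementations decide whether this node
	 * should actually run the hook.
	 */
	void changeApplied(String targetPath);
}
```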
Ideally, the semantics would be:
- All servers receive all updates and graft them into their state tree
- Instead of all servers running the hooks, just one server would run the hook
- "Exactly once" might be difficult. We might get away with "at least once, and usually just once"
- The system should be capable of ensuring that every hook runs at least once, even in the presence of node failures
Some initial thoughts:
To get "at least once" semantics, we would need a mechanism to make the hook execution request durable at change submission time. In some way or another, it seems like the BoskDriver will probably be involved. One possibility would stack a Kafka driver on top of the Mongo driver, with the Kafka driver queuing a message indicating that a particular update has occurred; Kafka's message delivery policies (consumer groups) could be configured to establish policies for which servers are notified of which updates. Only those server(s) would run the corresponding hook (and only if/when the change actually arrives!). If there's a failure of a node, or the hook crashes in some way, retries would be possible, again using Kafka's ordinary message delivery mechanisms/policies.
There's a lot to figure out here.
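A minimal sketch of the stacking idea, assuming the standard kafka-clients producer API. The topic name and message contents are placeholders, and only one representative update method is shown; the rest of the `BoskDriver` interface (deletions, conditional updates, flush) would follow the same pattern.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * Sketch only. "Driver" stands in for bosk's BoskDriver, whose actual
 * signature (References, generics, the other update methods) is elided.
 */
interface Driver {
	void submitReplacement(String targetPath, Object newValue);
}

public class KafkaDriver implements Driver {
	private static final String HOOK_TOPIC = "bosk-hook-requests"; // placeholder name

	private final Driver downstream; // the MongoDriver in this design
	private final KafkaProducer<String, String> producer;

	public KafkaDriver(Driver downstream, KafkaProducer<String, String> producer) {
		this.downstream = downstream;
		this.producer = producer;
	}

	@Override
	public void submitReplacement(String targetPath, Object newValue) {
		// Make the hook-execution request durable first. A real message
		// would carry some revision identifier so consumers can correlate
		// it with the corresponding MongoDB change event; here we just
		// send the path. Keying by path lets consumer-group partitioning
		// spread hook work across the servers in the cluster.
		producer.send(new ProducerRecord<>(HOOK_TOPIC, targetPath, targetPath));

		// Then submit the change itself so it gets broadcast via MongoDB.
		downstream.submitReplacement(targetPath, newValue);
	}
}
```

Consumer groups then give the "one server per hook" behavior: each hook-request message is delivered to one consumer in the group, and a message whose offset is never committed gets redelivered after a failure, which is where the at-least-once property would come from.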
Such a mechanism would need access to information at several points (sketched as an interface after this list):

- It would need to be aware of every submitted change. Otherwise, if a change is submitted (say, to `MongoDriver`) and acknowledged, and then all nodes die before the change event arrives, no hook would run, violating the at-least-once requirement.
- It would need to be aware of changes that are actually submitted to the local driver. Otherwise, it might run hooks for changes that did not actually happen. (This is probably tolerable if it happens occasionally, but we don't want tons of hooks running for no reason on, say, conditional updates that didn't actually occur.)
- It would need access to the grafted state tree, because that's what the hook's `ReadContext` requires.
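Expressed as callbacks, those three access points might look like this. Again, all names are hypothetical, and `String` paths stand in for bosk `Reference`s.

```java
// Hypothetical sketch: the three access points above as callbacks.
public interface ClusterHookSupport {
	/** Access point 1: every change submitted anywhere in the cluster.
	 *  This is where the hook-execution request must be made durable. */
	void onChangeSubmitted(String targetPath);

	/** Access point 2: a change actually applied by the local driver,
	 *  so we don't run hooks for conditional updates that never happened. */
	void onChangeApplied(String targetPath);

	/** Access point 3: invoked once the update is grafted into the local
	 *  state tree, so the hook's ReadContext observes the new state. */
	void onReadyToRunHook(String targetPath);
}
```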
Requiring access to every submitted change necessitates a `KafkaDriver` layer on top of `MongoDriver`: if `MongoDriver` were on top, the broadcast would already have happened by the time the change reached `KafkaDriver`, and every node would end up submitting messages to Kafka. But that `KafkaDriver` can't know whether the change was actually applied; for example, it can't tell whether a conditional update's precondition will be satisfied.
Also, the Kafka message and the MongoDB change event would arrive independently. The Kafka message could easily arrive first (in fact, that might be the common case), and the receiving node can't immediately run the hook, because the corresponding state update hasn't been grafted yet; but it can't just wait for the state update, because there might not be one! We could imagine a system that responds to the Kafka message by setting up a mechanism that runs the hook if/when the corresponding update arrives; but such a system also cannot commit the message until either (1) the update arrives and the hook runs, or (2) we determine that there is no update.
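One way to picture that reconciliation, as a sketch. All names are hypothetical; "committing" below means committing the Kafka consumer offset, and both streams are assumed to carry a revision identifier that correlates them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch only: reconciles Kafka hook-request messages with MongoDB change
 * events, which arrive independently and in either order. The opposite
 * ordering (change event arrives before the Kafka message) would require
 * the registry to also remember recently grafted revisions; that's elided.
 */
public final class PendingHookRegistry {
	private record PendingHook(Runnable runHook, Runnable commitOffset) {}

	private final Map<String, PendingHook> pending = new ConcurrentHashMap<>();

	/** Kafka hook-request arrived, possibly before the change event. */
	public void onHookRequest(String revision, Runnable runHook, Runnable commitOffset) {
		pending.put(revision, new PendingHook(runHook, commitOffset));
	}

	/** The corresponding update was grafted into the local state tree:
	 *  now it's safe to run the hook (its ReadContext sees the new state),
	 *  and only then commit the Kafka message. */
	public void onUpdateGrafted(String revision) {
		PendingHook p = pending.remove(revision);
		if (p != null) {
			p.runHook().run();
			p.commitOffset().run();
		}
	}

	/** We determined there is no corresponding update (e.g. a conditional
	 *  update whose precondition failed): commit without running the hook. */
	public void onNoUpdate(String revision) {
		PendingHook p = pending.remove(revision);
		if (p != null) {
			p.commitOffset().run();
		}
	}
}
```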
I wonder if what I need is a new thing that isn't hooks. Maybe a hook-like concept for at-least-once-per-cluster processing, separate from hooks, which retain their per-node semantics. 🤔
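For instance, registration could be kept separate, roughly like the snippet below. `registerClusterHook` is invented here purely to illustrate the distinction; only something like `registerHook` exists today, and the exact signatures are glossed over.

```java
// Hypothetical usage sketch. registerClusterHook does not exist in bosk;
// the name is illustrative only.
bosk.registerHook("replicaControl", scopeRef, hook);         // existing: runs on every node
bosk.registerClusterHook("externalControl", scopeRef, hook); // proposed: runs on one node, at least once per cluster
```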