jet / kafunk

Kafunk: F# Kafka client
https://jet.github.io/kafunk/
Other
160 stars 63 forks source link

Fault tolerance hierarchies #142

Closed eulerfx closed 6 years ago

eulerfx commented 7 years ago

A Kafunk consumer and producer consist of a few independently running processes, which despite running independently have a relationship. Take for example the consumer. It consists of a heartbeat process, as well as a fetch process. An error code returned to the heartbeat process should cause the fetch process to stop and restart. An error code returned to the fetch process indicating a topology change should only cause the fetch process to restart after fetching metadata. These relationships also exist at lower levels. For example, in the TCP channel module, a the TCP receive process mail fail because of a TCP connectivity error or a closed connection. This should cause the TCP channel to attempt re-connection, eventually escalating the error to the Kafka cluster connection.

Async.StartChild starts a process from within an existing process. The processes are linked via shared CancellationToken, however failure of the child process does not cause the paren't process to fail. Likewise, failure of the parent process does not propagate to the child. A proposed extension to this is here.

The proposal consists of types:

/// A node represents a site where computations can occur.
/// It can be linked to other nodes, such that errors on one cause errors in the other.
type Node ~= IVar<unit>

val node : unit -> Node
val startChild : Node -> Async<unit> -> Node
val tryAsync : Node -> Async<'a> -> Async<'a>

/// A process is a node-dependent computation.
type Proc<'a> ~= Node -> Async<'a>

/// Example
let go (a:Async<'a>) = proc {
  // evaluates an async computation in the context
  // of the ambient node
  let! a = a

  // starts a linked async computation
  let! t = Proc.startChildAsTask (async { printfn "hello" })

  let! node = Proc.node
  do! Proc.fail (exn("oh no"))

  return 1 }

Another pattern is that of NodeState<'s> and a node-state-dependent operation currently expressed as Resource. A node state may be a Socket instance and a dependent operation may be a read from a socket Socket -> ArraySegment<byte> -> Async<int>. The result of the operation may indicate that the node must be restarted or failed.

We borrow design principles from the Erlang supervision hierarchy.