kroxylicious / kroxylicious

An open-source network proxy framework for Apache Kafka
https://kroxylicious.io
Apache License 2.0

High availability #54

Open gunnarmorling opened 1 year ago

gunnarmorling commented 1 year ago

(Via @tombentley)

Consider a three-broker cluster (B1, B2, B3), fronted by a two-node reverse proxy (P1, P2), and an example client (C1). Let's use subscripts to denote the broker identities associated with a proxy instance.

```mermaid
graph LR;
    C1-->P1("P1₁₂");
    C1-->P2("P2₃");
    P1-->B1;
    P1-->B2;
    P2-->B3;
```

Because there are fewer proxy nodes than brokers, necessarily one of them is proxying >1 broker (in this case P1 is proxying B1 and B2, which we denote by P1₁₂).

The broker cluster doesn't know anything about the proxies. In particular, if P1 crashes there will not be a leader election for the partitions accessed by clients via P1. From C1's PoV, while it can discover (via a Metadata request to P2) that the leader of topic partition TP0 is B1, its notion of B1 is tied to P1's address. So availability is lost.
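
To make that failure mode concrete, here is a minimal sketch of what C1 observes, written against the standard Kafka Admin client. The proxy hostnames and ports are made up for illustration: the metadata fetched through P2 still names broker 1, but the address attached to that broker id is whatever P1 advertised, which is unreachable while P1 is down.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

import java.util.Map;

public class DescribeViaProxy {
    public static void main(String[] args) throws Exception {
        // Bootstrap via the surviving proxy node, P2 (hostname is illustrative only).
        try (Admin admin = Admin.create(Map.of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "p2.example.com:9092"))) {
            // The metadata still lists broker id 1 as a cluster member and partition
            // leader, but the host:port it carries is the address P1 advertised for B1.
            // With P1 down that address is unreachable even though B1 itself is healthy.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d is advertised at %s:%d%n",
                        node.id(), node.host(), node.port());
            }
        }
    }
}
```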

This means that any naive static assignment of proxies to brokers breaks client HA. This is even true for a 1:1 mapping.

However, for a 1:1 mapping we can ameliorate the situation by making B1's membership of the cluster predicated on P1's availability.

```mermaid
graph LR;
    C1-->P1("P1₁");
    C1-->P2("P2₂");
    C1-->P3("P3₃");
    subgraph domain3;
    P3-->B3;
    end
    subgraph domain2;
    P2-->B2;
    end
    subgraph domain1;
    P1-->B1;
    end
```

If Pn is in the same pod as Bn then failures which affect the whole pod (machine failures, for example) mean that the loss of Pi implies the loss of Bi. This is an improvement, but imperfect, since a failure might affect only the proxy process.

We can do better by having each proxy instance intermediate the broker-quorum communication in addition to the client-broker communication. In this case, if the proxy crashes then the broker will naturally be fenced and a new leader elected using the standard Kafka protocols, even if the broker itself remains up.

Alternatively, we could arrange for each Bi to heartbeat its respective Pi and exit the broker process when heartbeat responses stop.
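
A minimal sketch of that heartbeating idea, assuming a watchdog loop running inside (or with shared fate alongside) the broker JVM. The proxy hostname and health port are made up, and a plain TCP connect stands in for whatever heartbeat exchange would really be used:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ProxyHeartbeatWatchdog {
    public static void main(String[] args) throws InterruptedException {
        int consecutiveFailures = 0;
        while (true) {
            // A plain TCP connect stands in for a real heartbeat exchange with the proxy.
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress("p1.example.com", 9190), 2_000);
                consecutiveFailures = 0; // the proxy answered, reset the counter
            } catch (IOException e) {
                consecutiveFailures++;
            }
            if (consecutiveFailures >= 3) {
                // Taking the broker down causes it to be fenced, so new leaders are
                // elected for its partitions via the ordinary Kafka mechanisms.
                System.exit(1);
            }
            Thread.sleep(1_000);
        }
    }
}
```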

All these fixes require a tight coupling between the broker cluster and the proxy nodes, either in terms of deployment constraints (same pod) or brokers having knowledge of the proxies (broker-quorum intermediation, heartbeating).

But what of N:M topologies, or more decoupled deployments?

The motivating problem assumed a static assignment of broker identities to proxy nodes. Obviously, if P1 crashes, C1 can ask P2 for metadata, and that metadata could associate B1's identity with P2 (i.e. P2₁₂₃).

```mermaid
graph LR;
    C1-. disconnection .->P1;
    C1-->P2("P2₁₂₃");
    P2-->B1;
    P2-->B2;
    P2-->B3;
```

This implies the existence of a clustering mechanism for proxy nodes, so that P2 could react in this way. The Kafka group coordination protocol could be used as the basis for this.
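
One way to sketch such a clustering mechanism on top of the Kafka group coordination protocol: each proxy joins a shared consumer group on a hypothetical internal topic whose partitions stand in for broker identities, so a proxy crash triggers a rebalance that hands its broker identities to the survivors. The topic name, group id, bootstrap address and partition-to-broker mapping below are all assumptions for illustration, not anything Kroxylicious does today.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class ProxyIdentityCoordinator {
    public static void main(String[] args) {
        Map<String, Object> config = Map.of(
                ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "b1.example.com:9092",
                ConsumerConfig.GROUP_ID_CONFIG, "proxy-cluster",
                ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class,
                ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        try (KafkaConsumer<byte[], byte[]> membership = new KafkaConsumer<>(config)) {
            // Hypothetical internal topic with one partition per broker identity:
            // whichever proxy holds partition N fronts broker N.
            membership.subscribe(List.of("proxy-broker-identities"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Start advertising and proxying the broker identities mapped to
                    // the newly assigned partitions, e.g. partition 1 -> broker B1.
                    partitions.forEach(tp -> System.out.println("now fronting broker " + tp.partition()));
                }

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Stop fronting these broker identities; another proxy picks them up.
                    partitions.forEach(tp -> System.out.println("released broker " + tp.partition()));
                }
            });
            // poll() keeps this proxy's group membership (and heartbeats) alive, so a
            // crashed proxy drops out of the group and a rebalance reassigns its identities.
            while (true) {
                membership.poll(Duration.ofSeconds(1));
            }
        }
    }
}
```

The appeal of reusing the group protocol here is that the proxies would get failure detection and reassignment from the Kafka cluster they already depend on, without introducing a separate coordination service.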

robobario commented 2 months ago

triaged, should review and potentially break it down further