onemoredata / bagger

Massive log storage in PostgreSQL
BSD 2-Clause "Simplified" License

Investigate Kafka replication strategy #29

Open einhverfr opened 11 months ago

einhverfr commented 11 months ago

In this replication strategy, each Bagger storage instance gets a Kafka topic with one partition. We use Schaufels to write to these partitions. We then use other Schaufels to write from these partitions to PostgreSQL instances and their clones.
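The layout above could be sketched as follows. This is a minimal illustration, not code from the project: the `bagger.<instance>` naming scheme and the helper names are assumptions made here for clarity.

```python
# Hypothetical sketch of the topic-per-instance layout described above.
# The "bagger.<instance>" naming convention is an assumption, not part
# of the actual Bagger/Schaufel codebase.

def bagger_topic_for(instance: str) -> str:
    """One Kafka topic per Bagger storage instance."""
    return f"bagger.{instance}"

def topic_layout(instances: list[str]) -> dict[str, int]:
    """Map each instance's topic to its partition count. Always 1,
    so write ordering within an instance's stream is preserved."""
    return {bagger_topic_for(i): 1 for i in instances}
```

With one partition per topic, every Schaufel writing into an instance's topic and every Schaufel draining it into PostgreSQL sees a single totally ordered stream.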

Obvious problems include the fact that Kafka only scales to so many topics per cluster, so we would also need to be able to scale this out across multiple inbound Kafka clusters.
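One way to spread instances across several inbound clusters is a deterministic hash assignment, sketched below. The function name and the hashing choice are assumptions for illustration, not anything the issue specifies.

```python
import hashlib

# Hypothetical sketch: once one cluster's topic budget is exhausted,
# storage instances could be assigned deterministically across several
# inbound Kafka clusters. The scheme shown here is an assumption.

def cluster_for(instance: str, clusters: list[str]) -> str:
    """Stable assignment of a storage instance to one inbound cluster.
    Hashing keeps the mapping consistent across producers and consumers
    without any coordination service."""
    digest = hashlib.sha256(instance.encode()).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]
```

A stable mapping matters because both the inbound Schaufels and the PostgreSQL-side Schaufels must agree on which cluster hosts a given instance's topic.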

If this seems feasible, we should write a spec and consider implementing it.

einhverfr commented 11 months ago

So one major challenge here is that consumer groups cannot be used for synchronization. We would probably need to separate the topics by generation, so that there is a clear end point to the data being loaded. In that case, a consumer would need to stop and restart once it reached the end of the now-defunct topic.
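The generation cutover described above might look like the following sketch. Nothing here is real Kafka API; an in-memory list stands in for a partition, and the sealed end offset marks where the defunct generation's data ends.

```python
# Hypothetical sketch of generation-based cutover: a consumer drains the
# old generation's topic up to its fixed end offset, then moves on to the
# next generation. The in-memory "log" stands in for a Kafka partition.

def drain_generation(log: list[str], end_offset: int, start: int = 0) -> tuple[list[str], bool]:
    """Read records from `start` up to the sealed end offset.
    Returns the records consumed and whether this generation is finished."""
    consumed = log[start:end_offset]
    return consumed, start + len(consumed) >= end_offset

def consume_generations(generations: list[list[str]]) -> list[str]:
    """Process each generation to completion before starting the next,
    giving the clear end point the comment above calls for."""
    out: list[str] = []
    for log in generations:
        records, done = drain_generation(log, end_offset=len(log))
        assert done  # old generation fully drained before cutover
        out.extend(records)
    return out
```

The key property is that once a generation's topic is sealed, its end offset is fixed, so "caught up" becomes a well-defined condition rather than a race against ongoing writes.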