tarungka / wire

Wire is a distributed open source stream processing framework with powerful stream and batch-processing capabilities. [UNDER DEVELOPMENT]
GNU General Public License v3.0

Integration of Raft with BadgerDB #54

Closed tarungka closed 1 week ago

tarungka commented 1 week ago

Pull Request: Integration of Raft with BadgerDB for Wire Project

Summary

This PR proposes the integration of the Raft consensus algorithm with BadgerDB. As Wire is a distributed stream processing system, this combination offers robust and scalable coordination for distributed nodes and enhances data persistence performance.

Motivation

Wire requires a reliable mechanism for leader election, fault tolerance, and data replication to handle distributed workloads efficiently. The Raft consensus algorithm is a well-known solution to these problems in distributed systems and is a good fit for managing cluster coordination within Wire. Raft keeps the replicated log consistent across nodes, which makes it well suited to managing distributed state changes in real-time processing.

BadgerDB, an embeddable key-value store optimized for fast storage, complements Raft’s need for persistent storage by providing efficient data storage and retrieval. By integrating BadgerDB with Raft, Wire can enhance both its performance and reliability for stateful stream processing.

Benefits

  1. High Availability and Fault Tolerance: Raft ensures that even in the case of failures (e.g., node crashes or network issues), Wire remains consistent by replicating the system’s state across multiple nodes.
  2. Scalable Coordination: With Raft’s leader-based consensus, Wire can efficiently manage tasks such as node orchestration and distributed job scheduling across clusters.
  3. Optimized Performance: BadgerDB offers high write throughput and low read latency, ideal for real-time stream processing in distributed systems.
  4. Persistence: Raft requires durable log storage for state changes and snapshots, making BadgerDB a suitable choice for fast, persistent storage, even under heavy workloads.

Proposed Changes

  1. Raft Integration:

    • Implement the Raft consensus algorithm for managing distributed system states in Wire.
    • Handle leader election, log replication, and cluster management using Raft.
  2. BadgerDB Integration:

    • Use BadgerDB for persistent storage of Raft logs, snapshots, and metadata.
    • Optimize BadgerDB configuration for high-throughput stream processing workloads in Wire.
  3. API Changes:

    • Expose new APIs to query Raft-related data (e.g., leader status, logs) and access state snapshots.
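As a rough illustration of the storage side of these changes, here is a minimal sketch of a log store that a Raft implementation could write to. Everything in it is an assumption made for illustration: the `LogEntry` type, the `LogStore` interface, and the `memLogStore` backend are hypothetical names, not Wire's code or any library's API. The in-memory map stands in for BadgerDB so the example runs with the standard library only. The point it shows is the key encoding: BadgerDB iterates keys in byte-sorted order, so encoding the log index as a big-endian integer keeps key order equal to index order.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// LogEntry is a hypothetical replicated log record (illustrative fields only).
type LogEntry struct {
	Index uint64
	Term  uint64
	Data  []byte
}

// LogStore is an assumed minimal interface for durable Raft log storage.
type LogStore interface {
	FirstIndex() (uint64, error)
	LastIndex() (uint64, error)
	GetLog(index uint64) (LogEntry, error)
	StoreLog(entry LogEntry) error
	DeleteRange(min, max uint64) error
}

// ErrLogNotFound is returned when no entry exists at the requested index.
var ErrLogNotFound = errors.New("log entry not found")

// logKey encodes an index as a prefixed big-endian key, so byte-wise key
// order matches numeric index order.
func logKey(index uint64) []byte {
	key := make([]byte, 9)
	key[0] = 'l' // hypothetical prefix separating logs from other metadata
	binary.BigEndian.PutUint64(key[1:], index)
	return key
}

// keyIndex decodes the index back out of a key produced by logKey.
func keyIndex(key string) uint64 {
	return binary.BigEndian.Uint64([]byte(key)[1:])
}

// memLogStore is an in-memory stand-in for a sorted key-value backend.
type memLogStore struct {
	entries map[string]LogEntry
}

var _ LogStore = (*memLogStore)(nil)

func newMemLogStore() *memLogStore {
	return &memLogStore{entries: make(map[string]LogEntry)}
}

// sortedKeys returns all keys in byte order, mimicking a sorted iterator.
func (s *memLogStore) sortedKeys() []string {
	keys := make([]string, 0, len(s.entries))
	for k := range s.entries {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// FirstIndex returns the lowest stored index, or 0 if the store is empty.
func (s *memLogStore) FirstIndex() (uint64, error) {
	keys := s.sortedKeys()
	if len(keys) == 0 {
		return 0, nil
	}
	return keyIndex(keys[0]), nil
}

// LastIndex returns the highest stored index, or 0 if the store is empty.
func (s *memLogStore) LastIndex() (uint64, error) {
	keys := s.sortedKeys()
	if len(keys) == 0 {
		return 0, nil
	}
	return keyIndex(keys[len(keys)-1]), nil
}

func (s *memLogStore) GetLog(index uint64) (LogEntry, error) {
	entry, ok := s.entries[string(logKey(index))]
	if !ok {
		return LogEntry{}, ErrLogNotFound
	}
	return entry, nil
}

func (s *memLogStore) StoreLog(entry LogEntry) error {
	s.entries[string(logKey(entry.Index))] = entry
	return nil
}

// DeleteRange removes every entry whose index lies in [min, max].
func (s *memLogStore) DeleteRange(min, max uint64) error {
	for _, k := range s.sortedKeys() {
		if idx := keyIndex(k); idx >= min && idx <= max {
			delete(s.entries, k)
		}
	}
	return nil
}

func main() {
	store := newMemLogStore()
	for _, idx := range []uint64{256, 1, 2} {
		_ = store.StoreLog(LogEntry{Index: idx, Term: 1})
	}
	first, _ := store.FirstIndex()
	last, _ := store.LastIndex()
	fmt.Println("first:", first, "last:", last)
}
```

Index 256 is included on purpose: with a little-endian or decimal-string encoding it would sort before 2, whereas the big-endian key keeps it last. A real BadgerDB-backed version would replace the map with transactions against the database and would need to handle write errors.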

Use Cases

  1. Leader Election: Raft will handle the process of electing a leader in the Wire cluster, ensuring one node is responsible for managing system state and distributing tasks to worker nodes.
  2. Cluster Coordination: Raft will manage adding or removing nodes in the Wire cluster, enabling scalability and high availability.
  3. Data Replication and Persistence: With BadgerDB as the underlying storage engine, Raft will ensure that data is durably persisted across nodes in case of failures, while still offering fast access to critical system state information.
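To make the election and availability claims above concrete, the sketch below works through the majority arithmetic that Raft relies on: a candidate becomes leader only with votes from more than half of the cluster, so a cluster of n nodes keeps working while at most (n - 1) / 2 of them are down. The function names are hypothetical and chosen for illustration; this is not Wire's code.

```go
package main

import "fmt"

// quorumSize returns the smallest number of nodes that forms a majority.
func quorumSize(clusterSize int) int {
	return clusterSize/2 + 1
}

// tolerableFailures returns how many nodes can fail while a majority remains.
func tolerableFailures(clusterSize int) int {
	if clusterSize <= 0 {
		return 0
	}
	return (clusterSize - 1) / 2
}

// wonElection reports whether a candidate holding the given number of votes
// (its own vote included) has reached a majority of the cluster.
func wonElection(votes, clusterSize int) bool {
	return clusterSize > 0 && votes >= quorumSize(clusterSize)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerates=%d\n",
			n, quorumSize(n), tolerableFailures(n))
	}
	fmt.Println("2 of 3 votes wins:", wonElection(2, 3))
	fmt.Println("2 of 4 votes wins:", wonElection(2, 4))
}
```

One consequence for cluster sizing: going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any additional failures, which is why odd cluster sizes are the usual choice.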

tarungka commented 1 week ago

#51 and #47 partially add these features.

Major features relating to the BadgerDB integration, along with exposing additional REST APIs, are yet to be completed.