Apache Kafka and the Rise of Event Streaming Platforms

... in the context of building and running cloud-native systems.

Apache Kafka in a Nutshell

Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds (events). Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

Wikipedia

Remarkable quote:

Events are everywhere: A business is a series of events and the reactions to those events

Jay Kreps, Co-Creator of Apache Kafka

Why should this be an episode?

Cloud-native systems and applications are inherently distributed and thus need to communicate over the network. To decrease coupling and increase availability and throughput, the actors in a distributed system can use asynchronous messaging instead of REST or (g)RPC. But Apache Kafka is more than yet another message broker. It is built on decades of learnings from database research as well as real-world experience at LinkedIn. Its architecture differs substantially from that of typical message brokers, which makes it suitable for building highly scalable and decoupled distributed systems of any kind. Lately, Apache Kafka has evolved into an event streaming platform that is revolutionizing the world of (big) data processing and real-time analytics.

Who is this episode for?

The use cases of Apache Kafka primarily target business and enterprise application developers and architects, e.g. for building modern real-time event streaming systems and for application integration. But Apache Kafka does not run in thin air. Apache Kafka is itself a distributed system and thus requires a skilled operations team to keep the platform (and the business) running. The episode should therefore provide an introduction to Apache Kafka, why it was built, and how its architecture differs from that of typical message brokers. This knowledge explains one of the many reasons why Apache Kafka has become a core building block in many modern distributed systems. We should also provide context on the changing landscape of "Big Data" systems and why there is a shift from traditional data lakes (silos) towards streaming information flows through the organization, enabled by platforms powered by Apache Kafka.

vmware-tanzu / thepodlets

Apache Kafka and the Rise of Event Streaming Platforms #94