so3500 / TIL


2024-01-11 #5


so3500 commented 6 months ago

Apache Kafka documentation

Book

인간력 (Human Power)

함께 자라기: 애자일로 가는 길 (Growing Together: The Road to Agile)

so3500 commented 6 months ago

Quick Start

Step 1: Get Kafka
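For reference, getting Kafka boils down to downloading a release and extracting it. A minimal sketch (the kafka_2.13-3.6.1 file name is an assumption based on the 3.6.1 Connect jar used later in these notes):

tar -xzf kafka_2.13-3.6.1.tgz
cd kafka_2.13-3.6.1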

Step 2: Start the Kafka environment

Kafka can be started using ZooKeeper or KRaft. To get started with either configuration, follow one of the sections below, but not both.

Kafka with KRaft

ZooKeeper vs. KRaft

Kafka's architecture has recently shifted from ZooKeeper to a quorum-based controller that uses a new consensus protocol called Kafka Raft, shortened to KRaft. The shift from ZooKeeper to KRaft has been a major architectural overhaul, resulting in simplified deployment, enhanced scalability, and improved performance. For more, see: https://romanglushach.medium.com/the-evolution-of-kafka-architecture-from-zookeeper-to-kraft-f42d511ba242
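A minimal sketch of starting a single-node broker in KRaft mode, run from the Kafka installation directory (config paths follow the 3.6.x layout): first generate a cluster ID, format the storage directory, then start the server.

KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
bin/kafka-server-start.sh config/kraft/server.properties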

Step 3: Create a topic to store your events

Kafka is a distributed event streaming platform that lets you read, write, and process events (also called records or messages in the documentation) across many machines.

Example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements from IoT devices or medical equipment, and much more. These events are organized and stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.

So before you can write your first events, you must create a topic. All of Kafka's command-line tools have additional options; run a tool without any arguments to display usage information.

bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092

Step 4: Write some events into the topic

A Kafka client communicates with the Kafka brokers via the network for writing (or reading) events. Once received, the brokers will store the events in a durable and fault-tolerant manner for as long as you need, even forever.

Write a few events into your topic:

bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
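Each line typed at the producer prompt becomes a separate event written to the topic, for example:

This is my first event
This is my second event

The producer client can be stopped with Ctrl-C at any time.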

Step 5: Read the events

Open another terminal session and run the console consumer client to read the events you just created:
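For example, assuming the quickstart-events topic from the previous steps and a local broker on port 9092:

bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092

The --from-beginning flag makes the consumer read all events from the start of the topic rather than only new ones.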

Step 6: Import/export your data as streams of events with Kafka Connect

Kafka Connect allows you to continuously ingest data from external systems into Kafka, and vice versa. It is an extensible tool that runs connectors, which implement the custom logic for interacting with an external system. It is thus very easy to integrate existing systems with Kafka. To make this process even easier, there are hundreds of such connectors readily available.

Example: run Kafka Connect with simple connectors that import data from a file to a Kafka topic and export data from a Kafka topic to a file.

First, add connect-file-3.6.1.jar to the plugin.path property in the Connect worker's configuration. For the purpose of this quickstart we'll use a relative path and consider the connectors package as an uber jar, which works when the quickstart commands are run from the installation directory. However, it's worth noting that for production deployments using absolute paths is always preferable. See plugin.path for a detailed description of how to set this config.

Edit the config/connect-standalone.properties file, add or change the plugin.path configuration property to match the following, and save the file:

echo "plugin.path=libs/connect-file-3.6.1.jar" create text file. start two connectors running in standalone mode, which means they run in a single, local, dedicated process. provide three configuration files as parameters

  1. The first is the configuration for the Kafka Connect process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data.
  2. The remaining files each specify a connector to create. These files include a unique connector name, the connector class to instantiate, and any other configuration required by the connector.

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

  1. The source connector reads lines from an input file and produces each to a Kafka topic.
  2. The sink connector reads messages from a Kafka topic and produces each as a line in an output file.
  3. test.txt -> source connector -> topic (connect-test) -> sink connector -> test.sink.txt

The connectors continue to process data, so we can add data to the file and see it move through the pipeline.
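A minimal way to exercise the pipeline, assuming the file and topic names above and a local broker on localhost:9092 (the seed file would normally be created before starting the connectors):

echo -e "foo\nbar" > test.txt
echo "Another line" >> test.txt
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning

Each line added to test.txt should appear both in test.sink.txt and in the output of the console consumer.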

Step 7: Process your events with Kafka Streams

Once your data is stored in Kafka as events, you can process the data with the Kafka Streams client library for Java/Scala. It allows you to implement mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka topics. Kafka Streams combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology to make these applications highly scalable, elastic, fault-tolerant, and distributed. The library supports exactly-once processing, stateful operations and aggregations, windowing, joins, processing based on event-time, and much more.

To give you a first taste, here's how one would implement the popular WordCount algorithm.
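One way to try this from the shell, without writing any code, is the WordCount demo application that ships with Kafka. This is a sketch: it assumes input and output topics named streams-plaintext-input and streams-wordcount-output, which have to be created first as in Step 3.

bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

The demo reads lines from the input topic, splits them into words, counts each word, and continuously writes the updated counts to the output topic.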

[ example ]

  1. count words with a producer and consumer
  2. count words with a Kafka connector

Step 8: Terminate the Kafka environment

  1. Stop the producer and consumer clients with Ctrl-C.
  2. Stop the Kafka broker with Ctrl-C.
  3. Stop the ZooKeeper server with Ctrl-C, if Kafka was started with ZooKeeper rather than KRaft.

If you want to delete any data of your local Kafka environment, including any events you have created along the way, run the command:

rm -rf /tmp/kafka-logs /tmp/zookeeper /tmp/kraft-combined-logs