Closed JonahCalvo closed 10 months ago
@JonahCalvo, This is an exciting feature. Thanks for putting this proposal together.
What is the `workers` property? The Data Prepper pipeline worker reads from the `Buffer`, and it can have multiple threads, which are defined in the pipeline. So it seems that each request to the `Buffer` should happen on one of those threads, not in another "worker" thread.
@JonahCalvo, We can also simplify the buffer name: use `kafka` instead of `kafka_buffer`.
This feature is completed by quite a few PRs.
Hi everyone,
Can somebody give us a working sample configuration of the Kafka buffer to test? I tried the example in the first post, but it is not working. :-(
From my tests and reading the logs:
- The correct name of the buffer plugin is "kafka", not "kafka_buffer".
- The "acknowledgments" flag does not exist.
- The "topic" flag does not exist; the correct flag is "topics", and supposedly you need to define it as an array, but it does not work with `"topics": ["topicname"]`.
Thanks for your help!
Answering myself, the correct way to configure the Kafka buffer is this one:

```yaml
buffer:
  kafka:
    bootstrap_servers:
      - redpanda-0:9092
    topics:
      - name: dns-ip-pipeline-buffer
        group_id: data-prepper-group
```

Regards!
Use-case
Currently, the only buffer available with Data Prepper is the `bounded_blocking` buffer, which stores events in memory. This can lead to data loss if a pipeline crashes or the buffer overflows. A disk-based buffer is required to prevent this data loss. This proposal is to implement a Kafka buffer. Kafka offers robust buffering capabilities by persistently storing data on disk across multiple nodes, ensuring high availability and fault tolerance.
Basic Configuration
The buffer will:
Sample configuration
The configuration will be similar to that of the Kafka source and sink. Notably, only one topic will be provided, and `serde_format` will not be configurable, as the buffer will read and write bytes. Attributes that were previously set for each topic, such as `workers`, will be made attributes of the plugin rather than of the topic.
Detailed Process
`RawByteHandler` interface. This interface will include a `deserializeBytes()` function, which the Kafka buffer will call back to when reading data.
Encryption
The Kafka buffer will offer optional encryption via KMS:
- The `GenerateDataKeyPair` API will be invoked to obtain a data key pair.
- The `Encrypt` API will then encrypt the private key, which will be sent to Kafka alongside the encrypted data.
- The `Decrypt` API will decrypt the private key, which will then decrypt the data.
Metrics
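The envelope flow above can be illustrated locally with standard Java crypto. This is only a sketch of the pattern: the key pair stands in for the KMS `GenerateDataKeyPair` result, and the KMS `Encrypt`/`Decrypt` protection of the private key is noted in comments rather than called, since it requires an AWS client and key:

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;
import javax.crypto.Cipher;

// Local analogue of the KMS envelope-encryption flow described above.
// In the real buffer the pair would come from GenerateDataKeyPair, and the
// private key would be wrapped/unwrapped with the KMS Encrypt/Decrypt APIs.
public class KmsFlowSketch {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair dataKeyPair = gen.generateKeyPair(); // stand-in for GenerateDataKeyPair

        byte[] plaintext = "event bytes".getBytes(StandardCharsets.UTF_8);

        // Producer side: encrypt the record with the public key before writing
        // it to Kafka; the (KMS-encrypted) private key travels alongside it.
        Cipher enc = Cipher.getInstance("RSA");
        enc.init(Cipher.ENCRYPT_MODE, dataKeyPair.getPublic());
        byte[] ciphertext = enc.doFinal(plaintext);

        // Consumer side: after KMS Decrypt recovers the private key, use it
        // to decrypt the record read from Kafka.
        Cipher dec = Cipher.getInstance("RSA");
        dec.init(Cipher.DECRYPT_MODE, dataKeyPair.getPrivate());
        byte[] recovered = dec.doFinal(ciphertext);

        System.out.println(Arrays.equals(plaintext, recovered)); // prints "true"
    }
}
```

Note that raw RSA is size-limited; it is used here only to keep the round trip short, not as a statement about the buffer's actual cipher choices.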
The Kafka buffer will incorporate the standard buffer metrics, as well as the metrics reported by Kafka Source/Sink: