Question – Producing synchronously

lud commented 2 years ago

Hi,

I found here that you are comparing erlkaf:produce/4 to brod:produce_sync/5. But if I understand correctly, erlkaf does not produce synchronously; it returns immediately and sends a delivery report back afterwards.

Please correct me if that is wrong. If it is correct, is there any built-in support for blocking until the message is delivered? Or should I just wait for the delivery report?

Thank you.

silviucpp commented 2 years ago

When you run the test, you need to use {queue_buffering_overflow_strategy, block_calling_process}.

This is how the erlkaf producer works:

It buffers messages for a time window configured via queue_buffering_max_ms (5 ms by default), up to the following limits:

100000 messages max (queue_buffering_max_messages) or
1048576 KB (queue_buffering_max_kbytes)

It then sends them in batches for better throughput.
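To make the buffering rule concrete, here is a toy Python model of it. This is a hypothetical sketch, not erlkaf's (or librdkafka's) implementation: the class LingerQueue and its methods try_add and poll are invented for illustration, and the clock is passed in explicitly so the behavior is deterministic. Messages wait until the oldest one is max_ms old and then leave as one batch; try_add refuses a message once either limit would be exceeded, which is the "queue full" point where an overflow strategy has to decide what happens.

```python
from typing import List, Optional


class LingerQueue:
    """Hypothetical model of time-window buffering with size limits (not erlkaf code)."""

    def __init__(self, max_ms: int = 5, max_messages: int = 100_000,
                 max_kbytes: int = 1_048_576) -> None:
        self.max_ms = max_ms                # plays the role of queue_buffering_max_ms
        self.max_messages = max_messages    # plays the role of queue_buffering_max_messages
        self.max_bytes = max_kbytes * 1024  # plays the role of queue_buffering_max_kbytes
        self.pending: List[bytes] = []
        self.pending_bytes = 0
        self.oldest_ms: Optional[int] = None  # arrival time of the oldest buffered message

    def try_add(self, msg: bytes, now_ms: int) -> bool:
        """Buffer msg, or return False if a limit would be exceeded (queue full)."""
        if (len(self.pending) + 1 > self.max_messages
                or self.pending_bytes + len(msg) > self.max_bytes):
            return False
        if self.oldest_ms is None:
            self.oldest_ms = now_ms
        self.pending.append(msg)
        self.pending_bytes += len(msg)
        return True

    def poll(self, now_ms: int) -> List[bytes]:
        """Return the whole buffer as one batch once the oldest message is max_ms old."""
        if self.oldest_ms is None or now_ms - self.oldest_ms < self.max_ms:
            return []
        batch, self.pending = self.pending, []
        self.pending_bytes, self.oldest_ms = 0, None
        return batch
```

With max_ms=5 and max_messages=3, three messages added at 0, 2 and 4 ms are returned together by poll(5), and a fourth message added before that poll is refused.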

If the queue overflows, there are three ways of handling the situation (queue_buffering_overflow_strategy):

block_calling_process - block the calling process until there is room in the queue
local_disk_queue (default) - queue the events on the local disk and flush them when the memory queue has space again. For example, if the broker goes down you don't lose the messages: they are stored on the local disk and flushed when the broker comes back online.
drop_records - drop the messages if the queue is full
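The three strategies differ only in what happens to a message that arrives while the memory queue is full. The Python model below is a hypothetical sketch of that decision, not erlkaf's code: OverflowingQueue, produce and drain are invented names, a plain list stands in for the on-disk queue, and drain() stands in for the broker accepting messages. Blocking is modeled with a threading.Condition that wakes the caller once drain() frees a slot.

```python
import threading
from collections import deque
from typing import Deque, List

STRATEGIES = ("block_calling_process", "local_disk_queue", "drop_records")


class OverflowingQueue:
    """Hypothetical model of the three overflow strategies (not erlkaf code)."""

    def __init__(self, capacity: int, strategy: str) -> None:
        if strategy not in STRATEGIES:
            raise ValueError(strategy)
        self.capacity = capacity
        self.strategy = strategy
        self.memory: Deque[bytes] = deque()
        self.disk: List[bytes] = []  # stand-in for the local disk queue
        self.dropped = 0
        self.cond = threading.Condition()

    def produce(self, msg: bytes) -> None:
        with self.cond:
            # Only use memory directly if nothing is spilled, to keep FIFO order.
            if len(self.memory) < self.capacity and not self.disk:
                self.memory.append(msg)
            elif self.strategy == "block_calling_process":
                # The caller sleeps here until drain() frees a slot.
                self.cond.wait_for(lambda: len(self.memory) < self.capacity)
                self.memory.append(msg)
            elif self.strategy == "local_disk_queue":
                self.disk.append(msg)
            else:  # drop_records
                self.dropped += 1

    def drain(self, n: int) -> List[bytes]:
        """Simulate the broker taking up to n messages, then refill memory from disk."""
        with self.cond:
            sent = [self.memory.popleft() for _ in range(min(n, len(self.memory)))]
            while self.disk and len(self.memory) < self.capacity:
                self.memory.append(self.disk.pop(0))
            self.cond.notify_all()
            return sent
```

The design choice the model highlights: only block_calling_process pushes back on the producer; the other two keep produce() non-blocking and either spill or lose data.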

Note: The memory buffer queue is shared by all topics and partitions.

Kind regards, Silviu

lud commented 2 years ago

Hi @silviucpp ,

Thank you for your answer.

If I understand correctly, queue_buffering_overflow_strategy applies when the memory queue is full, and block_calling_process makes the calling process wait until there is room in the queue.

But if I want the process to block until Kafka acknowledges the message, that is, until the memory queue has been flushed and the message actually produced, then there is no built-in support for that, and I should use receive to wait for the delivery report, right?

silviucpp commented 2 years ago

No! And in my opinion there is no real-world use case for what you want. Maybe I'm wrong. This approach is very hard to scale.

lud commented 2 years ago

"No, there is no support for that", or "no, I should not use receive"? Both, I guess :)

My use case is that I want to be sure an event has been stored in Kafka before moving on. That event matters, and the producing function will not be called again, so it should crash if it cannot produce to Kafka.

What would you do in that case?

silviucpp commented 2 years ago

You can install the delivery callback for errors only; it will then receive all the messages that failed to be sent. Once you receive such an event, you can store the message somewhere else and resend it.
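The error-only callback pattern can be sketched in Python as follows. This is a hypothetical model, not erlkaf's API: FakeProducer, RetryStore and the (error, message) callback shape are invented for illustration, and the "broker" is just a boolean flag. Successful sends produce no report; a failed send hands the message to the callback, which parks it in a retry store until it can be resent.

```python
from typing import Callable, List, Tuple

# Hypothetical callback shape: (error, message) -> None. Not erlkaf's signature.
ErrorCallback = Callable[[str, bytes], None]


class FakeProducer:
    """Stand-in for an async producer that reports only failed deliveries."""

    def __init__(self, on_error: ErrorCallback, broker_up: bool = True) -> None:
        self.on_error = on_error
        self.broker_up = broker_up
        self.delivered: List[bytes] = []

    def produce(self, msg: bytes) -> None:
        # Fire and forget: success is silent, failure goes to the callback.
        if self.broker_up:
            self.delivered.append(msg)
        else:
            self.on_error("broker_unavailable", msg)


class RetryStore:
    """Keeps failed messages "somewhere else" so they can be resent later."""

    def __init__(self) -> None:
        self.failed: List[Tuple[str, bytes]] = []

    def on_error(self, error: str, msg: bytes) -> None:
        self.failed.append((error, msg))

    def resend(self, producer: FakeProducer) -> None:
        pending, self.failed = self.failed, []
        for _, msg in pending:
            producer.produce(msg)  # anything that fails again lands back in self.failed
```

In a real system the store would need to be durable (a file or database) for this to add any safety over the producer's own queue.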

Otherwise, you need to block your calling process using "receive" to wait for the delivery callback for that specific message. But don't expect high scalability with this kind of approach; you might be better off rethinking your logic.
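Blocking until the delivery report for one specific message arrives is a sync-over-async wrapper: tag each message with an id, register a waiter under that id, and let the callback wake only that waiter. The Python sketch below models the Erlang "receive" approach with a concurrent.futures.Future per message. It is hypothetical throughout: AsyncProducer, SyncWrapper, produce_sync and fail_with are invented names, and a threading.Timer simulates the broker acknowledging after 10 ms.

```python
import itertools
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Optional


class AsyncProducer:
    """Hypothetical async producer: produce() returns at once and a background
    thread later calls report(msg_id, error) for every message."""

    def __init__(self, report: Callable[[int, Optional[str]], None]) -> None:
        self.report = report
        self.fail_with: Optional[str] = None  # set to simulate a failed delivery

    def produce(self, msg_id: int, payload: bytes) -> None:
        threading.Timer(0.01, self.report, args=(msg_id, self.fail_with)).start()


class SyncWrapper:
    """Blocks each caller until the delivery report for its own message arrives."""

    def __init__(self) -> None:
        self.pending: Dict[int, Future] = {}
        self.lock = threading.Lock()
        self.ids = itertools.count()
        self.producer = AsyncProducer(self._on_report)

    def _on_report(self, msg_id: int, error: Optional[str]) -> None:
        with self.lock:
            fut = self.pending.pop(msg_id, None)
        if fut is None:
            return  # the caller already gave up (timed out)
        if error is None:
            fut.set_result(msg_id)
        else:
            fut.set_exception(RuntimeError(error))

    def produce_sync(self, payload: bytes, timeout: float = 5.0) -> int:
        fut: Future = Future()
        with self.lock:
            msg_id = next(self.ids)
            self.pending[msg_id] = fut
        self.producer.produce(msg_id, payload)
        try:
            # Raises a timeout error or the delivery error, so the caller crashes.
            return fut.result(timeout=timeout)
        finally:
            with self.lock:
                self.pending.pop(msg_id, None)
```

Each call holds its caller for a full round trip (at least the buffering window plus the broker acknowledgement), which is why this path scales much worse than fire-and-forget producing.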

In theory, as I told you, if a message cannot be pushed to Kafka, it is stored in memory and pushed once Kafka comes back online. If memory fills up too fast, messages are stored on disk.

Silviu

lud commented 2 years ago

Thank you for the clarifications.

I guess I should just rely on erlkaf then. I did not know that it would keep the queue when disconnected. That's great.

Cheers!

silviucpp / erlkaf, issue #38