silviucpp / erlkaf

Erlang Kafka driver based on librdkafka
MIT License

question: Does erlkaf handle producer failures? #59

Closed: ding-an-sich closed this issue 1 year ago

ding-an-sich commented 1 year ago

Due to the async nature of the librdkafka API, I understand that the only way to guarantee synchronous producing would be to handle the delivery reports and block the producer until the callback confirms delivery.

My question is: are there other retry mechanisms, like some kind of auto-retry, or is implementing such a callback the only way for the user to handle errors?
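For context, this is roughly what such a callback looks like. A minimal sketch, assuming erlkaf's `erlkaf_producer_callbacks` behaviour and its `delivery_report/2` callback; the module name `my_delivery_reports` is made up:

```erlang
-module(my_delivery_reports).
-behaviour(erlkaf_producer_callbacks).

-export([delivery_report/2]).

%% Invoked by erlkaf once per produced message: DeliveryStatus is ok when
%% the broker acknowledged the message and {error, Reason} otherwise.
delivery_report(DeliveryStatus, Message) ->
    io:format("delivery report: ~p~n", [{DeliveryStatus, Message}]),
    ok.
```

The module would be wired in through the producer config, e.g. `{delivery_report_callback, my_delivery_reports}`.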

silviucpp commented 1 year ago

Erlkaf accepts a limited number of pending requests (bounded by queue_buffering_max_messages and queue_buffering_max_kbytes). In case you push more requests than this limit, there are 3 cases:

  1. events are queued into a local persistent queue and are automatically sent to the broker once the number of pending requests drops below the limit.
  2. an error is returned to the calling process.
  3. the calling process is blocked until the number of pending requests drops below the limit.

Which of these happens is decided by your queue_buffering_overflow_strategy config, as shown in the sketch below.
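A minimal configuration sketch of where these settings live; the broker address, client id, and limit values are made-up examples, while the three strategy atoms are the ones erlkaf documents:

```erlang
ProducerConfig = [
    {bootstrap_servers, "localhost:9092"},
    %% overflow handling: local_disk_queue | block_calling_process | drop_records
    {queue_buffering_overflow_strategy, local_disk_queue},
    %% librdkafka caps on the in-memory buffer of unsent messages
    {queue_buffering_max_messages, 100000},
    {queue_buffering_max_kbytes, 1048576}
].

ok = erlkaf:create_producer(my_producer, ProducerConfig).
```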

You can lower the queue_buffering_max_* values, but this will hurt performance.

The only way to lose events is when your client instance is terminated (VM crash, OS shutdown/restart, application stopped, etc.) while there are still pending requests. These pending requests are lost if you don't have a mechanism to mark events as delivered based on the delivery report.
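One way to build such a mechanism is sketched below. The `#erlkaf_msg{}` record and the `delivery_report/2` callback come from erlkaf; the `tracked_producer` module, the `pending_msgs` ETS table, and the assumption that every message has a unique key are mine:

```erlang
-module(tracked_producer).
-behaviour(erlkaf_producer_callbacks).

-include_lib("erlkaf/include/erlkaf.hrl").

-export([init/0, produce/3, delivery_report/2]).

%% Create a public table to track in-flight messages.
init() ->
    pending_msgs = ets:new(pending_msgs, [named_table, public, set]),
    ok.

%% Remember the message before handing it to erlkaf.
produce(Topic, Key, Value) ->
    true = ets:insert(pending_msgs, {Key, Value}),
    erlkaf:produce(my_producer, Topic, Key, Value).

%% Broker acknowledged the message: safe to forget it.
delivery_report(ok, #erlkaf_msg{key = Key}) ->
    ets:delete(pending_msgs, Key),
    ok;
%% Delivery failed: the message is still in the table, so it can be
%% retried or persisted for later redelivery.
delivery_report({error, Reason}, #erlkaf_msg{key = Key}) ->
    logger:error("delivery of ~p failed: ~p", [Key, Reason]),
    ok.
```

Note that an in-memory table like this only marks delivery within the running VM; to survive the crash/shutdown cases described above, the pending set would have to live in persistent storage.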

Even if you shut down the producer gracefully, all pending requests not yet received by the broker are lost, because at the time I wrote erlkaf, librdkafka didn't provide a way to get the pending requests back so that I could queue them into my local persistent queue. https://github.com/silviucpp/erlkaf/blob/master/src/erlkaf_producer.erl#L160

Now that this librdkafka issue is fixed, I will probably fix this erlkaf behaviour one day as well. PRs are welcome.