Closed: qrilka closed this issue 9 years ago
I can see how adding a max_downtime_buffer_size option can help.

Integrating a test for this will not be trivial, but I have something in mind. I'll give you a branch with the fix anyway.
Thanks
The option ekaf_max_downtime_buffer_size has been implemented in https://github.com/helpshift/ekaf/commit/416d3004533c277938c25da12e250328be5e31ad, with tests added. Merged into master and pushed as tag https://github.com/helpshift/ekaf/releases/tag/1.5
You can set the value like so:

```erlang
application:set_env(ekaf, ekaf_max_downtime_buffer_size, 5)
```
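In case it helps, here is a minimal sketch of where that call fits in a startup sequence. The broker address and topic name are placeholders, not values from this thread:

```erlang
%% Minimal sketch: set the cap before starting ekaf so that buffering
%% during a broker outage is bounded. Broker/topic values are placeholders.
application:set_env(ekaf, ekaf_bootstrap_broker, {"localhost", 9092}),
application:set_env(ekaf, ekaf_max_downtime_buffer_size, 5),
{ok, _} = application:ensure_all_started(ekaf),
ekaf:produce_async(<<"some_topic">>, <<"hello">>).
```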
You can subscribe to this event (e.g. for alerting) by adding a callback like:

```erlang
-include("ekaf_definitions.hrl").

application:set_env(ekaf, ?EKAF_CALLBACK_MAX_DOWNTIME_BUFFER_REACHED, {?MODULE, callback})
```
See ekaf_demo.erl and test/ekaf_tests.erl for more on callbacks and ekaf options.
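For reference, the registered callback module might look roughly like this. This is a sketch modelled on the five-argument callback shape used in ekaf_demo.erl; the module name and the logging inside it are illustrative, not part of ekaf's API:

```erlang
-module(my_ekaf_alerts).
-include("ekaf_definitions.hrl").
-export([callback/5]).

%% Sketch of an alerting callback, modelled on the 5-arity callbacks
%% (Event, From, StateName, State, Extra) seen in ekaf_demo.erl.
%% Here we only log a warning when the downtime buffer cap is hit.
callback(?EKAF_CALLBACK_MAX_DOWNTIME_BUFFER_REACHED = Event, _From, _StateName, _State, Extra) ->
    error_logger:warning_msg("ekaf event ~p, extra: ~p~n", [Event, Extra]);
callback(_Event, _From, _StateName, _State, _Extra) ->
    ok.
```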
That should resolve the memory overflow problem, but because of the problems described in https://github.com/helpshift/ekaf/issues/6 we'll disable ekaf in our production.
I see max_buffer_size, but it has a different meaning than just a maximum buffer size (it is rather the maximum number of async messages to buffer). We had a Kafka outage and that resulted in our server crashing with OOM, because messages just kept accumulating in memory. Should we have some workaround in ekaf for this problem? I.e. some "total_buffer_size" limiting the number of messages kept in memory (ignoring further messages that arrive after hitting that limit). Normally Kafka should be very reliable, but in our system it is used only for logging, so it makes sense to keep working even if logging has problems (though a warning about that should be issued, of course). Any opinion on this?
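For what it's worth, the behaviour being asked for amounts to a cap-and-drop rule along these lines. This is a conceptual sketch of the requested semantics only, not ekaf's actual internals:

```erlang
%% Conceptual sketch: cap the in-memory queue at Max messages and
%% silently drop anything beyond it (ideally warning once when capped).
buffer_message(Msg, Queue, Len, Max) when Len < Max ->
    {queue:in(Msg, Queue), Len + 1};
buffer_message(_Msg, Queue, Len, _Max) ->
    {Queue, Len}.
```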