silviucpp / erlkaf

Erlang Kafka driver based on librdkafka
MIT License

Support for simple ungrouped consumers #66

Open jmschles opened 5 months ago

jmschles commented 5 months ago

I'm looking at switching from Elsa to erlkaf. My use case is to consume entire Kafka topics to populate in-memory caches on application boot. This requires two things from my consumers:

  1. They start from the beginning of the topic every time the application starts, rather than from a previously committed offset.
  2. They're not part of a consumer group, or at least not part of a shared or previously used one, so each node in the cluster can maintain its own independent cache.

I think I could accomplish this with randomly generated, unique consumer groups, but that feels like a hack, so I'm hoping there's a better way. I didn't see one when looking through the code, but I might have missed it.

hikui commented 5 months ago

We have similar requirements. Generating a random group ID is a bad idea, especially when you have Kafka monitoring set up: you may see a lot of junk consumer groups with lag in the dashboard.

My current workaround is to receive messages with a Kafka consumer and then broadcast them to the other nodes using Erlang's PG (process groups). However, this gives up some of the benefits of using Kafka.
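The broadcast workaround can be sketched with OTP's `pg` module (OTP 23 or later). This is a minimal illustration under stated assumptions, not hikui's actual code: the module name `cache_broadcast`, the group name `kafka_cache_owners`, and the `{kafka_msg, Msg}` message shape are hypothetical, and the erlkaf consumer callback that would call `broadcast/1` is left out.

```erlang
-module(cache_broadcast).
-export([start/0, join/1, broadcast/1]).

%% Hypothetical group name shared by the cache owner process on every node.
-define(GROUP, kafka_cache_owners).

%% Start the default pg scope (OTP 23+). In a real release this belongs
%% in a supervision tree; here we tolerate it already running.
start() ->
    case pg:start_link() of
        {ok, Pid} -> {ok, Pid};
        {error, {already_started, Pid}} -> {ok, Pid}
    end.

%% Register a local cache owner process; pg removes it automatically
%% when that process exits.
join(CachePid) when is_pid(CachePid) ->
    pg:join(?GROUP, CachePid).

%% Intended to be called from the Kafka consumer's message handler:
%% forwards the payload to every registered cache owner in the cluster
%% and returns how many processes it was sent to. Delivery is
%% fire-and-forget; nothing is acknowledged or retried.
broadcast(Msg) ->
    Members = pg:get_members(?GROUP),
    lists:foreach(fun(Pid) -> Pid ! {kafka_msg, Msg} end, Members),
    length(Members).
```

Because the messages travel over Erlang distribution rather than Kafka, a node that is down or joins late simply misses them and has no offset to resume from, which is one way this approach loses some of Kafka's guarantees.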

silviucpp commented 4 months ago

Hello,

Unfortunately, I'm not planning to implement, in the near future, features that aren't used in the projects I work on where erlkaf is used. I'm happy to merge any PR, though.

A workaround for your issue might be to store the offsets locally in a file instead of on the broker. That way you can delete the file before starting erlkaf, and in theory the consumer will start consuming events from the beginning (provided auto_offset_reset is smallest):

{auto_offset_reset, smallest},
{offset_store_path, <<"/path/to/file">>}

To be honest, I have never used offset_store_path myself.
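A minimal sketch of that workaround, assuming the offset file lives at a path the application controls and is deleted before the consumer is created. `fresh_offsets` and `reset_and_config/1` are hypothetical names. The `{offset_store_method, file}` entry is also an assumption: librdkafka only uses `offset.store.path` when `offset.store.method` is `file` (its default is `broker`), and whether erlkaf exposes that property under this key should be checked against its configuration docs.

```erlang
-module(fresh_offsets).
-export([reset_and_config/1]).

%% Deletes any locally stored offsets, then returns topic config so the
%% consumer should fall back to the earliest offset on this start.
reset_and_config(OffsetPath) when is_binary(OffsetPath) ->
    case file:delete(OffsetPath) of
        ok -> ok;                 % stale offsets removed
        {error, enoent} -> ok     % first start: nothing stored yet
        %% any other error (e.g. eacces) crashes with case_clause, so the
        %% application never silently resumes from an old offset
    end,
    [{auto_offset_reset, smallest},
     %% ASSUMPTION: selects librdkafka's file offset store
     %% (offset.store.method=file); verify erlkaf accepts this key.
     {offset_store_method, file},
     {offset_store_path, OffsetPath}].
```

The returned list would be passed as the topic configuration when creating the consumer. As noted in the next comment, this only resets the starting position; every node still joins the same consumer group, so partitions remain split across nodes.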

jmschles commented 4 months ago

Thanks @silviucpp! If I'm understanding correctly, I think that workaround could solve the offset storage issue, but it wouldn't address the requirement that each node consumes every message in the topic.

I definitely understand where you're coming from about priorities, and appreciate your taking the time to reply!

silviucpp commented 4 months ago

@jmschles you are totally right! I'm glad to merge any contribution to the project. erlkaf is currently used in a solution that handles tens of millions of voice calls every single day. For caching we use KeyDB, which works great for us at that volume. If it helps, I also have an Erlang driver, https://github.com/silviucpp/eredis_pool, which we use with KeyDB active-active replication and multi-master.