nodefluent / kafka-streams

equivalent to kafka-streams :octopus: for nodejs :sparkles::turtle::rocket::sparkles:
https://nodefluent.github.io/kafka-streams/
MIT License

KTable State Persistence #86

Open dm261395 opened 6 years ago

dm261395 commented 6 years ago

I am having a problem using KTables.

I have a KTable reading from a topic, and I need to compare every incoming key-value message with the latest stored message for the same key. I need this state, which holds the latest message for every key in the KTable, to persist when the topic changes and when the application is stopped and restarted.

What I am trying to do is possible in, and a common use case of, the Java Kafka Streams library. However, when I use KTables in node kafka-streams, no internal topics are created in my cluster to maintain the KTable state (the Java library creates internal changelog topics for this).

How can I get Kafka to create the internal topics? Is there something I need to change about my Kafka config when creating the KTables, e.g. to ensure a consistent application ID?

Do I need to pass a KStorage argument to the KTable constructor? How would it be persisted across restarts of my application?

Is the LastState class relevant here?
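
For context, this is roughly the workaround I have in mind in the meantime: keep a plain in-memory map of the latest value per key on a KStream and do the comparison myself. The topic name and message handling below are just placeholders, and of course the map is lost on every restart, which is exactly the problem:

```js
const { KafkaStreams } = require("kafka-streams");
const config = require("./config.json"); // my usual kafka-streams config

const kafkaStreams = new KafkaStreams(config);
const stream = kafkaStreams.getKStream("my-input-topic");

// the "latest value per key" state I want the lib to persist for me
const lastValueByKey = new Map();

stream.forEach(message => {
    const key = message.key.toString();
    const value = message.value.toString();

    const previous = lastValueByKey.get(key);
    if (previous !== undefined) {
        // compare the incoming message with the latest stored one for this key
        // (application-specific comparison logic goes here)
    }

    lastValueByKey.set(key, value);
});

stream.start();
```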

I would appreciate any guidance in solving this and answering some of my questions.

Thanks.

krystianity commented 5 years ago

Hi there @dm261395 and sorry for my late reply.

Indeed the current KTables implementation in this lib has the exact flaws you pointed out.

Both points have historical reasons: we mainly needed a fast and stable streaming solution when I built this lib a while ago. Feel free to open any PRs regarding KTables or windowing.

Regards, Chris

JRGranell commented 5 years ago

Hi @krystianity,

Would the backup through a topic that you describe in your second point be possible with changes to this lib alone, or is there anything in the underlying libs (such as librdkafka) that would need to support it?

Thanks

krystianity commented 5 years ago

Hi @JRGranell, the backup is just based on producing to a topic and re-consuming from it on restarts, so there is no need for changes in the low-level clients. It does require a bit of deeper Kafka knowledge, as it can be tricky to get single instances in a shared deployment to consume entire topics in order to rebuild the KTable state in memory or in a DB. Personally, however, I see this as definitely possible to accomplish in the lib.
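
To make the pattern a bit more concrete, here is a rough sketch. The producer/consumer objects below just stand in for whatever client ends up being used (the consumeFromEarliest helper is a placeholder, not an existing API), and the changelog topic name is made up:

```js
// in-memory KTable state for this instance
const table = new Map();

// 1) every table update is also produced to a compacted changelog topic;
//    with cleanup.policy=compact Kafka keeps the latest value per key
async function updateTable(producer, key, value) {
    table.set(key, value);
    await producer.send("my-ktable-changelog", JSON.stringify(value), null, key);
}

// 2) on startup, the whole changelog topic is consumed from the earliest
//    offset and replayed into the map (or a DB) before the instance starts
//    processing new input; detecting "caught up to the end" is the tricky
//    part mentioned above
async function rebuildTable(consumer) {
    await consumer.consumeFromEarliest("my-ktable-changelog", message => {
        table.set(message.key.toString(), JSON.parse(message.value.toString()));
    });
}
```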

coderroggie commented 5 years ago

@krystianity how many :beers: would it take for you to implement the persistence in this lib? Persistence would allow us to scale up our microservices when using this lib; without it, it seems rough to use in high-volume situations.

:smile:

krystianity commented 5 years ago

Hi @coderroggie, for now I would suggest falling back to node-sinek and using NConsumer (with manual batch mode) and NProducer for distributed high-volume situations.
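
From memory, a node-sinek setup for that looks roughly like the following; the consume() options are written from memory and may not match the current sinek readme exactly, so please double-check there:

```js
const { NConsumer, NProducer } = require("sinek");
const config = require("./config.json"); // broker/client config

const consumer = new NConsumer(["my-input-topic"], config);
const producer = new NProducer(config);

(async () => {
    await producer.connect();
    await consumer.connect();

    // manual batch mode: the callback receives a batch of messages and the
    // offsets are only committed once callback() is invoked
    consumer.consume(async (messages, callback) => {
        for (const message of messages) {
            // process the message and produce the result
            await producer.send("my-output-topic", JSON.stringify(message.value));
        }
        callback(); // signal that the batch is done
    }, true, false, {
        batchSize: 500 // option name from memory, see the sinek docs
    });
})();
```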

I simply don't have the time and focus to build persistence on top of a solid KTable implementation for this lib at the moment, as there are a lot of new challenges I have to tackle with updates to other libs first.