Open hamburml opened 3 months ago
Our Kafka support does not support SnapStart or CRaC. The way Kafka works makes it very hard to snapshot. I would recommend, for safety reasons, to only initialize after the restore.
Thanks for your reply! There is an initial PR, https://github.com/apache/kafka/pull/13619, which tries to handle CRaC, but it looks like there is not much interest in it.
> I would recommend, for safety reasons, to only initialize after the restore.
Exactly, this is what I want. I do not need a snapshot of a working Kafka client; I need a method I can call on the Kafka client so that it reconnects and/or verifies its current connections. This would drop the stale connections (the ones that existed when the snapshot was taken and are now gone) and create new ones. Can you point me to a method that I could call in the afterRestore method?
You cannot use reactive messaging, but you can create a low-level Kafka client in afterRestore, or create a lazy producer and not use it during the snapshot phase (so basically, initialize it during the first HTTP call).
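A minimal sketch of the "lazy producer" idea, assuming nothing beyond the JDK: the client is created on first use and is closed and forgotten before a checkpoint, so the snapshot never contains a live connection and the next call after restore builds a fresh one. `LazyClientHolder` is a hypothetical helper, not part of Quarkus, SmallRye, or the Kafka client; in real code `T` could be a `KafkaProducer`, which is closeable.

```java
import java.util.function.Supplier;

/**
 * Holds a client that is created on first use and dropped before a checkpoint.
 * Hypothetical helper for illustration only; not a Quarkus, SmallRye, or Kafka API.
 */
final class LazyClientHolder<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T client; // null until first use, and null again after beforeCheckpoint()

    LazyClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the current client, creating it via the factory if none exists. */
    synchronized T get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    /** Intended to be called from a checkpoint hook: closes and forgets the client. */
    synchronized void beforeCheckpoint() throws Exception {
        if (client != null) {
            client.close();
            client = null;
        }
    }

    /** True if a client currently exists, e.g. to verify the snapshot is connection-free. */
    synchronized boolean isInitialized() {
        return client != null;
    }
}
```

With the real CRaC API, `beforeCheckpoint()` here would be called from a class that implements `org.crac.Resource` and is registered via `Core.getGlobalContext().register(...)`. No explicit `afterRestore` work is needed in this sketch, because the first `get()` after the restore re-creates the client.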
Hm yeah, but I still want to use this dependency...
If it's only to produce, you can use the lazy feature (@ogunalp it should delay the initialization of the producer, right?).
Indeed, I forgot about the lazy-client flag. It should work for producers. And maybe even for consumers combined with pausable-channels, but I need to check.
Thanks, I'll try it with lazy-client and come back to you.
lazy-client worked! Thanks. The snapshot is now created without a Kafka connection, so there is no exception anymore.
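For reference, the setting might look like the fragment below in `application.properties`. This is a sketch, not the reporter's actual configuration: the channel name `orders` is hypothetical, and the attribute is assumed to be set per channel like other SmallRye Kafka connector attributes.

```properties
# "orders" is a hypothetical outgoing channel name; replace it with your own.
mp.messaging.outgoing.orders.connector=smallrye-kafka
# Delay creation of the Kafka client until the channel is first used.
mp.messaging.outgoing.orders.lazy-client=true
```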
Sorry, closed it. @ozangunalp mentioned a test with pausable-channels.
@hamburml is there a test repository that you can share?
I suspect that injecting a @Channel would work, because the subscription is lazy too. But consuming in an @Incoming channel would still create the client at startup.
@ozangunalp not right now. I'll try to prepare one this evening. My service only writes into a @Channel and it works now without exceptions :) But I cannot prepare it with a working CRaC setup, because I was not able to get a JDK that creates snapshots running on my end. I let AWS Lambda do this for me.
Describe the bug
Copied from https://github.com/quarkusio/quarkus/issues/42286
Maybe this is a better place for it :)
Hi,
We use SnapStart on our Quarkus Lambdas. Some of them use smallrye-messaging to write messages to or receive messages from Kafka. This works as expected, but unfortunately our logs contain warnings that the connection to a Kafka node was lost, due either to an auth error or to firewall blocking.
AFAIK, during the init phase the whole memory of a started Quarkus Lambda is stored, and when the Lambda is reused it is loaded back into memory to skip the init phase. That also means that pooled connections are "stored" but in reality are already closed.
So I thought I simply needed to close all open Kafka connections before the snapshot is created. I did this with an org.crac.Resource and its beforeCheckpoint method. Now the warnings in the log are gone, but it looks like no new connections are initiated, and therefore all messages sent via a channel fail. I also tried KafkaProducer::flush, but that didn't help.
Any ideas?
I found https://github.com/quarkusio/quarkus/issues/31401 which is the same issue but with database connections.